A fundamental primer on exploit development on both Windows and Linux based OS’s.

The classical classes of vulnerablilities:

  • buffer overflow
  • stack overflow
  • heap overflow
  • use after free
  • out of bounds read

Integer Overflow and NetBSD

Considered concrete example in the NetBSD kernel, based on an incorrect coding style that is exposed to integer overflow during input validation.

static int
set_cursor(struct tfb_softc *sc, struct wsdisplay_cursor *p)
{
#define cc (&sc->sc_cursor)
        u_int v, index = 0, count = 0, icount = 0;
        uint8_t r[2], g[2], b[2], image[512], mask[512];
        int error, s;

        v = p->which;
        if (v & WSDISPLAY_CURSOR_DOCMAP) {
                index = p->cmap.index;
                count = p->cmap.count;
                if (index >= 2 || (index + count) > 2)
+++ integer overflow
                        return (EINVAL);
                error = copyin(p->cmap.red, &r[index], count);
                if (error)
                        return error;
                error = copyin(p->cmap.green, &g[index], count);
                if (error)
                        return error;
                error = copyin(p->cmap.blue, &b[index], count);
                if (error)
                        return error;

Note the overflow, about 1/2 way down. Just imagine if index was a really large value that overflowed 32 bits. A more robust way to code the validation check, can be seen in the OpenBSD code:

if (index >= 256 || count > 256 - index)
    return (EINVAL);

While a bit strange at first glance, is rather elegant. If index is big, it bails. If index is big it also bails.

Static Analysis with Clang

Is the process of doing analysis of the program using techniques like data flow analysis, point-to analysis, common mis-uses of API’s, taint analysis and more. Clang, the LLVM compiler frontend for C/C++, has an incredible static analyzer, and can be invoked when kicking off make, like so:

scan-build make

TODO: run sample, and review HTML report.

Fuzzing

Fuzz testing (fuzzing) is a testing technique used to discover bugs in all types of software. It involves inputting massive amounts of random data, called fuzz, to the test subject in an attempt to make it crash.

Common types:

  • Dumb fuzz just randomly mutate input data, and measure crashes.
  • Generative fuzzing domain specific knowledge of target protocol or format (e.g. SSH handshake or PDF headers), fuzz specific fields (negative, max, all 9’s, …) and combinations of them together in the protocol, measuring crashes along the way.
  • Grammar based fuzzing involves building up a context-free grammar (CFG) using the Sequitur (or Nevill-Manning algorithm), based on test cases of a programs input. This could be an ELF binary representation made by readelf for example.
  • Smart fuzzing uses symbolic execution, to guide the control flow through a program intelligently so that all possible branches and permutations are exercised. This gives the fuzzer the ability to measure code coverage at runtime, and to focus in on certain branches that are less exposed.
  • Instrumented fuzzing places probes into the target binary through custom compilation. Using probe measurements allow for guided branch execution. Very effective in practice. The american fuzzy lop or afl takes this approach, and in practice can almost guarentee crash a program that has never been fuzzed.

To take afl for a spin, first install, on my archlinux box a quick pacman -Sy afl-utils afl did the job. The afl-gcc GCC wrapper instruments the produced binary and can optionally add a number of protection mechanisms such as stack cookies and source fortification. The actually fuzz workload is done with afl-fuzz, which given a seed file, brute fuzzes a variety of inputs. Take the following program:

#include <stdio.h>
#include <string.h>
int
main(int argc, char *argv[])
{
        char q[20];
        char s[1000];

        fgets(s, sizeof(s), stdin);
        printf("%s\n", s);
        if (s[0] == 'c') {
                strcpy(q, &s[1]);
                printf("%s\n", q);
        }
}

Hmm there’s a buffer overflow vulnerability, only if a certain branch of logic is exercised. Let’s see if afl can find it. First compile:

$ afl-gcc buggy.c -o buggy -D_FORTIFY_SOURCE=2 -fstack-protector-strong
afl-cc 2.52b by <lcamtuf@google.com>
[+] Instrumented 5 locations (64-bit, non-hardened mode, ratio 100%).

Now we have an instrumented binary, its fuzz time. afl works best with a seed file/s, it will use this as a basis to get up and running, and build upon it:

$ mkdir in && echo "123" > in/fuzztest_input
$ afl-fuzz -i in -o out ./buggy

afl will throw thousands of variations of input at the target binary:

                        american fuzzy lop 2.52b (buggy)

┌─ process timing ─────────────────────────────────────┬─ overall results ─────┐
│        run time : 0 days, 0 hrs, 5 min, 40 sec       │  cycles done : 1580   │
│   last new path : 0 days, 0 hrs, 5 min, 39 sec       │  total paths : 2      │
│ last uniq crash : 0 days, 0 hrs, 5 min, 39 sec       │ uniq crashes : 1      │
│  last uniq hang : none seen yet                      │   uniq hangs : 0      │
├─ cycle progress ────────────────────┬─ map coverage ─┴───────────────────────┤
│  now processing : 0 (0.00%)         │    map density : 0.00% / 0.01%         │
│ paths timed out : 0 (0.00%)         │ count coverage : 1.00 bits/tuple       │
├─ stage progress ────────────────────┼─ findings in depth ────────────────────┤
│  now trying : havoc                 │ favored paths : 2 (100.00%)            │
│ stage execs : 139/256 (54.30%)      │  new edges on : 2 (100.00%)            │
│ total execs : 2.33M                 │ total crashes : 4213 (1 unique)        │
│  exec speed : 6965/sec              │  total tmouts : 0 (0 unique)           │
├─ fuzzing strategy yields ───────────┴───────────────┬─ path geometry ────────┤
│   bit flips : 0/64, 0/62, 0/58                      │    levels : 2          │
│  byte flips : 0/8, 0/6, 0/2                         │   pending : 0          │
│ arithmetics : 0/448, 0/138, 0/35                    │  pend fav : 0          │
│  known ints : 0/33, 0/134, 0/71                     │ own finds : 1          │
│  dictionary : 0/0, 0/0, 0/0                         │  imported : n/a        │
│       havoc : 2/811k, 0/1.51M                       │ stability : 100.00%    │
│        trim : n/a, 0.00%                            ├────────────────────────┘
└─────────────────────────────────────────────────────┘          [cpu000: 52%]

In the output dir:

.
├── crashes
│   ├── id:000000,sig:06,src:000001,op:havoc,rep:16
│   └── README.txt
├── fuzz_bitmap
├── fuzzer_stats
├── hangs
├── plot_data
└── queue
    ├── id:000000,orig:fuzztest_input
    └── id:000001,src:000000,op:havoc,rep:32,+cov

We can see at least one crash has been detected:

# cat crashes/id\:000000\,sig\:06\,src\:000001\,op\:havoc\,rep\:16
c��������p��������
                  ��������p�

It starts with a c character, which we can see from the source code, gets into the branch of execution where the buffer overflow vuln lives. Then the other random 20+ bytes worth of data blow out the 20-byte buffer. The actual bytes themselves dont matter, and the output is better viewed with a hex viewer like xxd:

# xxd -b crashes/id\:000000\,sig\:06\,src\:000001\,op\:havoc\,rep\:16
00000000: 01100011 11110000 11110000 11110000 11100001 11110000  c.....
00000006: 11110000 00010000 11110000 11011101 01110000 11110000  ....p.
0000000c: 00000111 11110001 11110000 11110000 11110000 11110000  ......
00000012: 11110000 11110000 00001100 11110000 11110000 11110000  ......
00000018: 11110000 11110000 11110000 11110000 11011100 01110000  .....p
0000001e: 11110000                                               .

Running the target program with this input, should crash it:

# ./buggy < ./out/crashes/id\:000000\,sig\:06\,src\:000001\,op\:havoc\,rep\:16
c��������p��������
                  ��������p�
*** buffer overflow detected ***: ./buggy terminated
Aborted

Nice!

x86

Given a CPU architecture of your liking, from which there are many (e.g. MIPS, ARM, PowerPC, x64, etc), the key thing to remember is that they are all bounded by the laws of Turing, and more specifically the Von Neumann architecture. These first principles (e.g. instructions are obtained from memory or in otherwords software, hierarchy of memories and registers, an ALU for arithmetic based on binary representations, an instruction pointer that defines the instruction to be executed and so on) underpin all modern computing that takes place. If you like to go deeper, you can’t go past the nand2tetris 12 week program, in which you design and build a simple but fully featured 16-bit microprocessor literally starting from NAND gates.

Given this power, it is essential we work at the chip level using assembly. Higher level abstractions and constructs simply muddy the waters, not giving the fine grained instruction level control required. In other words, grab the assembly manual for the architecture of chip you plan to work with.

Assembly

Lots of resources out there, but these one pagers keep it simple:

Reverse string in EBX

section .text
	global _start

_start:
	mov eax,ebx
	jmp endcount

loop:
	inc ebx

endcount:
	cmp byte[ebx],0
	jne loop

reverse:
	cmp eax,ebx
	jg finish
	mov cl,[eax]
	mov ch,[ebx]
	mov [eax],ch
	mov [ebx],cl
	dec ebx
	inc eax
	jmp reverse

finish:

Debugging with GDB

The GNU debugger, the gold standard when it comes to debugging on Linux. Make sure the peda extension is installed, which add some insanely useful feature to GDB, like better colorisation and visual (still CLI!) display of memory and instructions, and commands like context, context_stack and jmpall to name but a few.

git clone https://github.com/longld/peda.git ~/peda
echo "source ~/peda/peda.py" >> ~/.gdbinit

Don’t forget help peda has always got your back.

GDB Cheatsheet

GDB is great, and supercharged with the peda module.

  • break foo set breakpoint of function foo
  • break *0xf7cd6ef8 set breakpoint on specific instruction
  • break main:66 set break on instruction that relates to source line
  • r run program (once all breaks in place)
  • c continue
  • stepi step next instruction
  • nexti
  • info reg dump out all registers
  • print "%d\n", 0xf7cd6edc printf style output of memory or registers
  • x/i foo to examine each instruction of foo function
  • x/s $esp print esp as string
  • x/xw $esp print esp as 32-bit (word) in hex

Exploitation Concepts

Stack mechanics

x86 for example.

NOP sled

ROP chain

Glossary

Term Meaning
Buffer A contiguous block of memory
DEP Data Execution Prevention, enforces a non-executable stack, putting a stop to most primitive buffer overflow attacks
NOP no op
ROP Return Oriented Programming, a clever technique for chaining existing assembly instructions together
shellcode code thats only goal in life is to facilitate providing an interactive shell
TFP0 Task for PID0, an exploit that focusing on comprimising the init system