Chapter 12. File Format Fuzzing: Automation on UNIX

 

“I’m the commander—see, I don’t need to explain—I do not need to explain why I say things. That’s the interesting thing about being president.”

 
 --George W. Bush, as quoted in Bob Woodward’s Bush at War

File format vulnerabilities can be exploited client-side, as is the case with Web browsers and office suites, or server-side, as is the case with e-mail scanning antivirus gateways. With regard to client-side exploitation, how widely the affected client is used directly determines the severity of the issue. An HTML parsing vulnerability affecting Microsoft Internet Explorer, for example, is among the most coveted file format vulnerabilities as far as severity is concerned. Conversely, client-side vulnerabilities limited to the UNIX platform are not as interesting due to limited exposure.

Nevertheless, in this chapter we introduce two fuzzers, notSPIKEfile and SPIKEfile, that implement mutation-based and generation-based file format fuzzing, respectively. We look at the features and shortcomings of both tools. Then, diving further into the development process, we describe the approach taken when developing the tools and present key code snippets. We also go over some basic UNIX concepts such as interesting and uninteresting signals and zombie processes. Finally, we cover the basic usage of the tools and explain the reasoning behind the choice of programming language.

notSPIKEfile and SPIKEfile

The two tools developed to demonstrate file fuzzing on UNIX are named SPIKEfile and notSPIKEfile. As the names imply, they are based on SPIKE[1] and not based on SPIKE, respectively. The following list summarizes some of the key features provided in the implementations:

  • Integrated minimal debugger capable of detecting handled and unhandled signals and dumping both memory and register state.

  • Fully automatic fuzzing with user-specified delay before killing targeted process.

  • Two distinct databases of fuzz heuristics for targeting binary or ASCII printable data types.

  • Easily extensible fuzz heuristic database seeded with values historically known to cause problems.

What’s Missing?

There are several features that are missing from notSPIKEfile and SPIKEfile that could be useful to someone using the tools. What follows is a quick summary of these missing features:

  • Portability. Because the integrated debugger was created to work on x86 and uses Linux ptrace, it will not work out of the box on other architectures or operating systems. However, adding compatibility would be fairly trivial, provided the availability of other debugging and disassembling libraries.

  • Intelligent load monitoring. Although the user can specify how many processes to fuzz at once, it is currently completely up to the user to determine if system load is too high.
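As a rough idea of what load monitoring could look like, the sketch below asks the system for its one-minute load average before another target is launched. The helper name load_too_high and the threshold parameter are hypothetical; neither tool contains this code, and getloadavg is a BSD/glibc extension rather than part of POSIX.

```c
#include <stdlib.h>

/* Hypothetical helper, not part of notSPIKEfile or SPIKEfile.
 * Returns 1 when the one-minute load average exceeds max_load,
 * and 0 otherwise (including when the load average is unavailable). */
int
load_too_high (double max_load)
{
  double avg[1];

  if (getloadavg (avg, 1) < 1)
    return 0;                   /* cannot tell, so do not throttle */
  return avg[0] > max_load;
}
```

A fuzzer could call such a check before each fork and sleep briefly whenever it returns 1, instead of relying on the user to watch the machine.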

Clearly there were some compromises made during the development process, as demonstrated by the missing features. The author of the tool will utilize an age-old cover-up by stating that the improvements are left as “an exercise for the reader.” Let’s take a look at the development process undertaken during the construction of these tools.

Development Approach

The focus of this chapter is not how to use a fuzzer, but rather how to implement one. This section describes the design and development details of file format fuzzers under Linux. In the realm of file fuzzer development, we can make the safe assumption that our fuzzer will be executing on the same system as the target. Our design includes three distinct functional components.

Exception Detection Engine

This portion of code is responsible for determining when the target has demonstrated some undefined and potentially insecure behavior. How can this be done? There are two basic approaches. The first, which is fairly simple, involves monitoring any signals that are received by the application. This allows the fuzzer to detect, for example, buffer overflows that result in an invalid memory reference. This approach, of course, will not capture behavior such as metacharacter injection or logic flaws. For example, consider an application that passes a partially attacker-supplied value through to the UNIX system function. This would allow an attacker to execute arbitrary programs by using shell metacharacters, but would not cause any type of memory access violation. Although some vulnerabilities allow an attacker to compromise host security without involving memory corruption at all, we decided not to pursue such bugs, because detecting them would cost more development effort than the payoff is worth.

For those wishing to explore logic bugs such as these, a good approach would be to hook C library (LIBC) functions and monitor for fuzzer-supplied values being passed unsafely to calls such as open, creat, system, and so on.

For our design, we decided on simply detecting signals using the system’s ptrace debugging interface. Although it is trivial to determine that a signal has caused an application to terminate by simply waiting for the application to return, it requires a little more work to detect signals that are handled internally by the application. It is because of this that the approach taken for the exception detection engine relies heavily on the system’s debugging interface, in this case, the ptrace system call.
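To make the distinction concrete, here is a minimal sketch of the "trivial" approach that uses only fork and waitpid, without ptrace. The function terminating_signal and the two sample children are hypothetical names invented for this illustration; they are not taken from either fuzzer.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs child_fn in a forked child and returns the number of the signal
 * that terminated it, 0 if it exited normally, or -1 on error. */
int
terminating_signal (void (*child_fn) (void))
{
  int status;
  pid_t pid = fork ();

  if (pid == -1)
    return -1;
  if (pid == 0)
    {
      child_fn ();
      _exit (0);
    }
  if (waitpid (pid, &status, 0) == -1)
    return -1;
  return WIFSIGNALED (status) ? WTERMSIG (status) : 0;
}

/* Sample child that dies from an unhandled SIGSEGV. */
void
child_unhandled (void)
{
  raise (SIGSEGV);
}

static void
swallow (int sig)
{
  (void) sig;                   /* handler that simply ignores the signal */
}

/* Sample child that installs a handler and survives the same signal. */
void
child_handled (void)
{
  signal (SIGSEGV, swallow);
  raise (SIGSEGV);
}
```

Calling terminating_signal (child_unhandled) reports SIGSEGV, whereas terminating_signal (child_handled) reports a clean exit even though the same signal was raised. That blind spot is what a ptrace-based monitor is meant to close.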

Exception Reporting (Exception Detection)

On discovering an exception, a good fuzzer should report useful information about what exactly happened. At the least, the signal that occurred should be reported. Ideally, further details such as the offending instruction, CPU register states, and a stack dump would also be included in the report. Both of the fuzzers implemented in this chapter are capable of producing such detailed reports. To obtain this low-level information, we use the ptrace system call. We must also leverage a library to disassemble instructions if we intend to report them to the user. The library should be capable of converting a buffer of arbitrary data into the string representation of x86 instructions. The library chosen was libdisasm[2] because it works well, exposes a simple interface, and appears to be the preferred choice of the Google search engine, which listed it above all others. To really seal the deal, libdisasm includes examples that we can cut and paste from, making our life that much easier.

Core Fuzzing Engine

This is the heart of the file format fuzzer, as it controls the decisions on what malformed data to use and where to insert it. The code that does this for notSPIKEfile is different from the code for SPIKEfile because SPIKEfile leverages the existing SPIKE code to implement this functionality. Rest assured, however, that they are fairly similar.

As you no doubt have already guessed, SPIKEfile utilizes the same fuzzing engine as SPIKE. The fuzzing engine requires a template, referred to as a SPIKE script, which describes the format of the file. SPIKE then “intelligently” produces variations on that format using combinations of valid and invalid data.

In notSPIKEfile, the fuzzing engine is much more limited. The user must provide the fuzzer with a valid target file. The engine then mutates various parts of that file using a database of fuzz values. These values are split into two types that are referred to as binary and string. Binary values can be of any length and have any value, but are typically used to represent the size of common integer fields. String values are, as the name implies, merely strings. These can contain long or short strings, strings with format specifiers, and all sorts of other exceptional values such as file paths, URLs, and any other type of distinct string you can think of. For a complete list of these types of values, see the SPIKE and notSPIKEfile source, but to get you started, Table 12.1 offers a brief list and a short explanation for several effective fuzz strings. This is nowhere near an exhaustive list, but it should help you get an idea about what types of inputs you need to consider.

Table 12.1. Some Common Fuzz Strings and Their Significance

String | Significance
“A”x10000 | Long string, could cause buffer overflow
“%n%n”x5000 | Long string with percent signs, could cause buffer overflow or trigger format string vulnerability
HTTP:// + “A”x10000 | Valid URL format, could trigger buffer overflow in URL parsing code
“A”x5000 + “@” + “A”x5000 | Valid e-mail address format, could trigger buffer overflow in e-mail address parsing code
0x20000000, 0x40000000, 0x80000000, 0xffffffff | A few of the many integers that might trigger an integer overflow. You can get very creative here. Think of code that does malloc(user_count*sizeof (struct blah)); Also consider code that might increment or decrement integers without checking for overflows or underflows.
“../”x5000 + “AAAA” | Could trigger overflow in path or URL address parsing code
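To see why the integer values in Table 12.1 are dangerous, consider the malloc(user_count*sizeof (struct blah)) pattern on a 32-bit target. The sketch below reproduces the 32-bit arithmetic explicitly; the function names and the 8-byte element size used in the discussion are assumptions made for illustration.

```c
#include <stdint.h>

/* Size computation as a 32-bit program would perform it: the product
 * is truncated to 32 bits, so large counts wrap around. */
uint32_t
alloc_size_32 (uint32_t count, uint32_t elem_size)
{
  return (uint32_t) ((uint64_t) count * elem_size);
}

/* Checked variant: returns 1 and stores the size in *out if it fits,
 * or returns 0 if the multiplication would overflow 32 bits. */
int
alloc_size_checked (uint32_t count, uint32_t elem_size, uint32_t *out)
{
  if (elem_size != 0 && count > UINT32_MAX / elem_size)
    return 0;
  *out = count * elem_size;
  return 1;
}
```

With an 8-byte structure, a count of 0x20000000 gives 0x20000000 * 8 = 0x100000000, which truncates to 0. The program then allocates a zero-length buffer but may still try to fill 0x20000000 elements.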

There is really no limit to the number of fuzz strings you can use. The important thing to remember, though, is that any type of value that might require special parsing or might create an exceptional condition should be well represented. The difference between missing and finding a bug with your fuzzer might be due to something as simple as appending .html to the end of a large fuzz string.
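The mutation step can be sketched in a few lines of C. The two helpers below, repeat_str and splice_fuzz, are hypothetical names and are not taken from the notSPIKEfile source: the first builds strings in the “A”x10000 style of Table 12.1, and the second inserts a fuzz string at a chosen byte offset of a valid file image.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a malloc'd string holding count copies of unit, or NULL. */
char *
repeat_str (const char *unit, size_t count)
{
  size_t ulen = strlen (unit);
  char *out = malloc (ulen * count + 1);
  size_t i;

  if (out == NULL)
    return NULL;
  for (i = 0; i < count; i++)
    memcpy (out + i * ulen, unit, ulen);
  out[ulen * count] = '\0';
  return out;
}

/* Returns a malloc'd copy of in with fuzz inserted at byte offset off,
 * storing the new length in *out_len. Returns NULL if off is past the
 * end of the input or if allocation fails. */
unsigned char *
splice_fuzz (const unsigned char *in, size_t in_len, size_t off,
             const char *fuzz, size_t *out_len)
{
  size_t flen = strlen (fuzz);
  unsigned char *out;

  if (off > in_len || (out = malloc (in_len + flen + 1)) == NULL)
    return NULL;
  memcpy (out, in, off);                        /* bytes before the offset */
  memcpy (out + off, fuzz, flen);               /* the fuzz value */
  memcpy (out + off + flen, in + off, in_len - off);    /* the rest */
  out[in_len + flen] = '\0';
  *out_len = in_len + flen;
  return out;
}
```

A driver would loop over every offset in the requested range and every value in the fuzz database, write each result to a new file, and hand that file to the target. Whether a real fuzzer inserts the value or overwrites existing bytes is a design choice; this sketch inserts.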

In the next section, we explore some of the more interesting and relevant code excerpts from both SPIKEfile and notSPIKEfile.

Meaningful Code Snippets

Much of the core functionality is shared between both of the fuzzing tools being dissected here. For example, let’s begin by highlighting the basic method for forking off and tracing a child process. The following code excerpt is written in the C language and used by both fuzzers:

[...]
    if ( !(pid = fork ()) )
    { /* child */
         ptrace (PTRACE_TRACEME, 0, NULL, NULL);
         execve (argv[0], argv, envp);
    }
    else
    { /* parent */
          c_pid = pid;
monitor:
          waitpid (pid, &status, 0);
          if ( WIFEXITED (status) )
          { /* program exited */
               if ( !quiet )
                    printf ("Process %d exited with code %d\n", pid,
                            WEXITSTATUS (status));
               return(ERR_OK);
          }
          else if ( WIFSIGNALED (status) )
          { /* program ended because of a signal */
               printf ("Process %d terminated by unhandled signal %d\n", pid,
                       WTERMSIG (status));
               return(ERR_OK);
          }
          else if ( WIFSTOPPED (status) )
          { /* program stopped because of a signal */
               if ( !quiet )
                    fprintf (stderr, "Process %d stopped due to signal %d (%s) ",
                             pid, WSTOPSIG (status),
                             F_signum2ascii (WSTOPSIG (status)));
          }
          switch ( WSTOPSIG (status) )
          { /* the following signals are usually all we care about */
               case SIGILL:
               case SIGBUS:
               case SIGSEGV:
               case SIGSYS:
                    printf ("Program got interesting signal...\n");
                    if ( (ptrace (PTRACE_CONT, pid, NULL,
                                  (WSTOPSIG (status) == SIGTRAP)
                                  ? 0 : WSTOPSIG (status))) == -1 )
                    {
                         perror("ptrace");
                    }
                    ptrace(PTRACE_DETACH,pid,NULL,NULL);
                    fclose(fp);
                    return(ERR_CRASH); /* it crashed */
          }
          /* deliver the signal through and keep tracing */
          if ( (ptrace (PTRACE_CONT, pid, NULL,
                        (WSTOPSIG (status) == SIGTRAP)
                        ? 0 : WSTOPSIG (status))) == -1 )
          {
               perror("ptrace");
          }
          goto monitor;
     }
     return(ERR_OK);
}

The main process, or parent, begins by forking off a new process for the target. The new process, or child, uses the ptrace call to indicate that it will be traced by its parent by issuing the PTRACE_TRACEME request. The child then continues on to execute the target knowing that its parent, like any good parent, will watch over it should the desire to do anything inappropriate arise.

As the parent process, the fuzzer is notified of every signal destined for the child because the child used the PTRACE_TRACEME request: the child stops, and the parent’s waitpid call returns. The parent is even notified on any call to the exec family of functions, which stops the child with a SIGTRAP. The parent loop is straightforward yet powerful. The fuzzer loops to receive every signal that is destined for the child. The fuzzer then behaves differently depending on the signal and the status of the child.
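The exec notification is easy to observe in isolation. The following sketch, which assumes Linux and an environment where ptrace is permitted, starts a program under PTRACE_TRACEME and reports the signal behind its first stop. The function name first_stop_signal is hypothetical and does not appear in either tool.

```c
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Starts path under ptrace and returns the signal that caused its
 * first stop, or -1 if the child never stopped. The child is killed
 * while it is still stopped at the exec notification. */
int
first_stop_signal (const char *path)
{
  int status, sig = -1;
  pid_t pid = fork ();

  if (pid == -1)
    return -1;
  if (pid == 0)
    {
      char *const args[] = { (char *) path, NULL };
      char *const env[] = { NULL };

      if (ptrace (PTRACE_TRACEME, 0, NULL, NULL) == -1)
        _exit (127);
      execve (path, args, env);
      _exit (127);              /* only reached if execve failed */
    }
  if (waitpid (pid, &status, 0) == -1)
    return -1;
  if (WIFSTOPPED (status))
    {
      sig = WSTOPSIG (status);
      kill (pid, SIGKILL);
      waitpid (pid, &status, 0);        /* reap, so no zombie remains */
    }
  return sig;
}
```

For an existing executable such as /bin/sh, the first stop is the SIGTRAP generated by the successful exec, which is why the monitoring loop continues the child with signal 0 instead of delivering SIGTRAP to it.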

For example, if the process is stopped, this means that the program has not exited, but it has received a signal and is waiting for the parent’s discretion in allowing it to continue. If the signal is one indicative of a memory corruption issue, the fuzzer passes the signal through to the child, assuming it will kill the process and then report the results. If the signal is harmless in nature or is just plain uninteresting, the fuzzer passes it through to the application without any concern for monitoring.

The fuzzer also checks whether the child process has actually terminated, another noninteresting situation. What if the program crashed, you might ask? Shouldn’t we be very interested? Well, because we are intercepting all of the interesting signals before the application actually terminates, we know that this program terminated due to either natural behavior or an uninteresting signal. This highlights how important it is that you understand which signals are interesting to you. Some people might consider a floating point exception interesting if they are looking for DoS vulnerabilities. Others might be interested only in true memory corruption issues. Still others might look for abort signals, which have come to be an indicator for heap corruption in newer versions of GLIBC. It has been shown that in some circumstances these heap corruption checks can be worked around to execute arbitrary code.[3]

After seeing the code responsible for handling certain signals, there might be some question as to why we handle certain signals differently than others. What follows is an explanation of UNIX signals in the context of vulnerability research.

Usually Interesting UNIX Signals

Table 12.2 provides a list of signals that a vulnerability researcher might consider interesting during fuzzing, along with an explanation of why they are interesting.

Table 12.2. Interesting Signals When Conducting Fuzzing Under UNIX

Interesting Signal Name | Meaning
SIGSEGV | Invalid memory reference. The most common result of successful fuzzing.
SIGILL | Illegal instruction. This is a possible side effect of memory corruption, but relatively rare. It often results from the program counter becoming corrupted and landing in the middle of data or in between instructions.
SIGSYS | Bad system call. This is also a possible side effect of memory corruption, but relatively rare (actually, very rare). This can happen for the same reasons as SIGILL.
SIGBUS | Bus error. Often due to some form of memory corruption. Results from accessing memory incorrectly. More common with RISC machines due to their alignment requirements. On most RISC implementations, unaligned stores and loads will generate SIGBUS.
SIGABRT | Generated by the abort function call. This is often interesting because GLIBC will abort when it detects heap corruption.

Not So Interesting UNIX Signals

In contrast to Table 12.2, Table 12.3 describes signals that commonly occur during fuzzing, but are generally not interesting to a vulnerability researcher.

Table 12.3. Uninteresting Signals When Conducting Fuzzing Under UNIX

Uninteresting Signal Name | Meaning
SIGCHLD | A child process has exited.
SIGKILL, SIGTERM | Process was killed.
SIGFPE | Floating point exception, such as divide by zero.
SIGALRM | A timer expired.
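The two tables translate directly into a small classification routine. The sketch below is one possible encoding; the function name and the include_fpe switch are assumptions made here for illustration. Note that it treats SIGABRT as interesting, as Table 12.2 does, whereas the switch statement in the earlier listing does not check for it.

```c
#include <signal.h>

/* Returns 1 for the signals listed in Table 12.2 and 0 for all others.
 * If include_fpe is nonzero, SIGFPE is also treated as interesting,
 * which may suit someone hunting for denial-of-service bugs. */
int
is_interesting_signal (int sig, int include_fpe)
{
  switch (sig)
    {
    case SIGSEGV:
    case SIGILL:
    case SIGSYS:
    case SIGBUS:
    case SIGABRT:
      return 1;
    case SIGFPE:
      return include_fpe != 0;
    default:
      return 0;
    }
}
```

Keeping this decision in one function makes it easy to change what counts as a crash without touching the monitoring loop.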

Now that we have introduced SIGCHLD, it is an appropriate time to discuss the handling of a common scenario caused by not properly handling this signal. The next section explains what a zombie process is and how to properly handle child processes so that zombie processes are not created.

Zombie Processes

A zombie process is a process that has been forked from a parent process and has completed execution (i.e., exited) but its parent has not retrieved its status by calling either wait or waitpid. When this happens, information about the completed process is retained in the kernel indefinitely, until the parent process requests it. At that time, the information is released and the process is truly complete. The general life span of a process forked from our fuzzer is illustrated in Figure 12.1.


Figure 12.1. The lifetime of a forked process

When writing a fuzzer that spawns children using fork, one has to be sure that the parent collects every process completion using wait or waitpid. If you miss a process completion, you will end up with zombie processes.

Earlier versions of notSPIKEfile had some bugs that over time led to the count of active processes slowly decreasing until the fuzzer reached a deadlocked state. For example, assume the user specified to fuzz the application using eight processes at a time. As time went on, the active processes slowly dwindled down to just one. This was due to two careless errors on the author’s part. The original design relied entirely on the SIGCHLD signal, which is sent to the parent process when a child process has completed. Using this design, however, some SIGCHLD signals were being missed. There were also several places where the author carelessly failed to decrement the active processes count when child processes completed, which led to a gradual slowdown of the fuzzing process. Oops!

Once noticed, these bugs were trivial to address. All wait and waitpid calls that could cause problems when blocking were changed to be nonblocking using the WNOHANG flag. After they return, the status that is returned is checked to see if a process did indeed complete. If so, the active processes count is always decremented.
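A minimal sketch of this nonblocking reaping pattern follows. The names spawn_children and reap_finished, and the way the active count is passed around, are assumptions for illustration; the notSPIKEfile source may be organized differently.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks n children that exit immediately; returns how many were started. */
int
spawn_children (int n)
{
  int started = 0;

  while (n-- > 0)
    {
      pid_t pid = fork ();

      if (pid == 0)
        _exit (0);
      if (pid > 0)
        started++;
    }
  return started;
}

/* Collects every child that has already finished, without blocking,
 * and decrements *active once per completed child. Returns the number
 * of children reaped by this call. */
int
reap_finished (int *active)
{
  int status, reaped = 0;

  while (waitpid (-1, &status, WNOHANG) > 0)
    if (WIFEXITED (status) || WIFSIGNALED (status))
      {
        (*active)--;
        reaped++;
      }
  return reaped;
}
```

Because reap_finished loops until waitpid reports that nothing more is ready, a single call collects several children even if their SIGCHLD signals were coalesced into one, and the count is decremented in exactly one place.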

With SPIKEfile, there is no option to launch multiple applications at once, which greatly simplified the design and implementation. We did not need to worry about missing SIGCHLD signals because there would only be one coming back at a time.

Because SPIKEfile is based on SPIKE, we only needed to add a file or two into it so that it could handle file input and output instead of just network input and output. By taking a quick look at how SPIKE functioned for TCP/IP, it was simple to hack together file support. The following code is the trivial addition of filestuff.c:

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>             /* write, close */
#include "filestuff.h"
#include "spike.h"

extern struct spike *current_spike;

/* Open (or create and truncate) the output file and attach it to the
   current spike. Returns the file descriptor, or -1 on failure. */
int
spike_fileopen (const char *file)
{
  int fd;
  if ((fd =
       open (file, O_CREAT | O_TRUNC | O_WRONLY,
             S_IRWXU | S_IRWXG | S_IRWXO)) == -1)
    perror ("fileopen::open");
  current_spike->proto = 69; /* 69==file,0-68 are reserved by the ISO fuzzing standard */
  return current_spike->fd = fd;
}

/* Write size bytes from inbuffer to the spike's file.
   Returns 1 on success, -1 on a failed or short write. */
int
spike_filewrite (uint32 size, unsigned char *inbuffer)
{
  if (write (current_spike->fd, inbuffer, size) != (ssize_t) size)
    {
      perror ("filewrite::write");
      return -1;
    }
  return 1;
}

/* Close the spike's file if it is open. */
void
spike_close_file ()
{
  if (current_spike->fd != -1)
    {
      close (current_spike->fd);
      current_spike->fd = -1;
    }
}

By adding this file to the Makefile, anyone who has used SPIKE can now use it just as easily with files as an endpoint. If you are interested in seeing exactly what files were added to SPIKE, Table 12.4 provides that information.

Table 12.4. List of Changes Made to SPIKE to Create SPIKEfile

Filename | Purpose
filestuff.c | Contains routines to open and write files.
util.c | Contains a lot of the shared code between notSPIKEfile and SPIKEfile. It contains ptrace wrappers, the main F_execmon function, and some other useful functions.
generic_file_fuzz.c | This is the main SPIKEfile source. It contains the main function.
include/filestuff.h | A header file for filestuff.c.
libdisasm | The library used to disassemble x86 instructions when something crashes.

Usage Notes

If you plan to use SPIKEfile or notSPIKEfile with an application that cannot be launched directly, you need to work around it in some way. Good examples of these types of applications include Adobe Acrobat Reader and RealNetworks RealPlayer.

Normally, with these applications, the program you actually run is a shell script wrapper. The shell script sets up the environment so that the true binary can run properly. This is done mainly so that the applications can include copies of their own shared libraries. Including their own copies of common shared libraries allows the product to be more portable. Although this is done as a courtesy to the user to make things easier, for our purposes, it makes things a little more annoying. For example, if we specify the shell script as the file to fuzz, we will not be attached to the actual binary as it runs; we will be attached to an instance of the shell. This will make our signal catching worthless. Here is an example of how to get around this for Acrobat Reader and RealPlayer. The idea is similar for all other applications like them.

Adobe Acrobat

For Acrobat, you simply run acroread with the -DEBUG option first. This will drop you to a shell with the correct environment set to directly invoke the real acroread binary, which is often in $PREFIX/Adobe/Acrobat7.0/Reader/intellinux/bin/acroread. Although this isn’t documented and there is no usage function, we were able to determine this information by simply reading the acroread shell script. You can now fuzz this binary with no problems. This method was used along with notSPIKEfile to discover the Adobe Acrobat Reader UnixAppOpenFilePerform Buffer Overflow Vulnerability.[4]

RealNetworks RealPlayer

As shown by the following output, the realplay command is actually a shell script. We also see that the real binary application is called realplay.bin.

user@host RealPlayer $ file realplay realplay.bin
realplay:     Bourne shell script text executable
realplay.bin: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), for GNU/Linux 2.2.5, dynamically linked (uses shared libs), stripped

For RealPlayer, you only need to set the shell environment variable HELIX_PATH to the path of the RealPlayer installation. You can then directly fuzz the realplay.bin binary that ships with RealPlayer. This information was determined by simply reading the realplay shell script file. Using this method along with notSPIKEfile, the RealNetworks RealPlayer/HelixPlayer RealPix Format String Vulnerability[5] was discovered.

Case Study: RealPlayer RealPix Format String Vulnerability

We now describe how notSPIKEfile was used to discover a vulnerability in RealPlayer that was patched in September 2005. The first step is to arbitrarily choose a file format that RealPlayer supports. Of course in this case, the RealPix format was chosen. After a great deal of Googling, several sample RealPix files were gathered and used as base files for notSPIKEfile fuzzing. One such stripped-down example follows:[6]

<imfl>
    <head title="RealPix(tm) Sample Effects"
      author="Jay Slagle"
      copyright="(c)1998 RealNetworks, Inc."
      timeformat="dd:hh:mm:ss.xyz"
      duration="46"
      bitrate="12000"
      width="256"
      height="256"
      url="http://www.real.com"
      aspect="true"/>
</imfl>

This is a bare-bones example of a RealPix file. If you load this in RealPlayer, nothing will display, as it only contains a header. We will be running this through notSPIKEfile to test the header parsing code for bugs. We use the following command to start the fuzzing.

user@host $ export HELIX_PATH=/opt/RealPlayer/
user@host $ ./notSPIKEfile -t 3 -d 1 -m 3 -r 0- -S -s SIGKILL -o FUZZY-sample1.rp \
    sample1.rp "/opt/RealPlayer/realplay.bin %FILENAME%"
[...]
user@host $

We tell the tool to let each invocation of RealPlayer last three seconds using the -t option. We also tell it to wait one second between the time it kills an idle process and the time it starts a new one using the -d option. We specify the launching of three concurrent instances of RealPlayer using the -m option, and tell the tool to fuzz the whole file starting at byte zero using the -r option. We also specify string fuzzing mode using -S and specify the SIGKILL signal to terminate idle processes using the -s option. Finally, we tell the tool a format for fuzzed file names using the -o option, specify the filename of our sample file, sample1.rp, and tell the tool how to execute RealPlayer so that it parses our file. We are ready to go! The output from notSPIKEfile suggests we have found some sort of a vulnerability by reporting several crashes.
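The %FILENAME% token in the command string stands in for the name of each fuzzed file. How notSPIKEfile expands it is not shown in this chapter, so the helper below, with the hypothetical name expand_filename, is only a sketch of one straightforward way to do it.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a malloc'd copy of tmpl with the first "%FILENAME%" replaced
 * by filename, or NULL if the placeholder is missing or malloc fails. */
char *
expand_filename (const char *tmpl, const char *filename)
{
  static const char tag[] = "%FILENAME%";
  const char *hit = strstr (tmpl, tag);
  size_t head, taglen = sizeof tag - 1;
  char *out;

  if (hit == NULL)
    return NULL;
  head = (size_t) (hit - tmpl);
  out = malloc (strlen (tmpl) - taglen + strlen (filename) + 1);
  if (out == NULL)
    return NULL;
  memcpy (out, tmpl, head);             /* text before the placeholder */
  strcpy (out + head, filename);        /* the fuzzed file's name */
  strcat (out, hit + taglen);           /* text after the placeholder */
  return out;
}
```

The expanded string would still have to be split into an argv array before it is passed to execve, a step this sketch leaves out.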

We list the files in the current directory and see that the file FUZZY-sample1.rp-0x28ab156b-dump.txt has been created. When we view this file, we see a verbose report created to summarize the process state at the time of the crash. We also see the name of the file that caused the crash. In this case, it has been saved as 12288-FUZZY-sample1.rp. This file can be used to re-create the crash. When viewing the file, we get a good tip as to what the issue might be. The file’s contents are as follows:

<imfl>
    <head title="RealPix(tm) Sample Effects"
      author="Jay Slagle"
      copyright="(c)1998 RealNetworks, Inc."
      timeformat="%n%n%n%n%n%n%n%n%n%n%n%ndd:hh:mm:ss.xyz"
      duration="46"
      bitrate="12000"
      width="256"
      height="256"
      url="http://www.real.com"
      aspect="true"/>
</imfl>

Due to the presence of the %n format specifiers, we immediately suspect a format string vulnerability as the cause of the crash. Our suspicions are confirmed when we launch the RealPlayer binary in GDB.

user@host ~/notSPIKEfile $ gdb -q /opt/RealPlayer/realplay.bin
Using host libthread_db library "/lib/tls/libthread_db.so.1".
(gdb) r 12288-FUZZY-sample1.rp
Starting program: /opt/RealPlayer/realplay.bin 12288-FUZZY-sample1.rp

Program received signal SIGSEGV, Segmentation fault.
0xb7e53e67 in vfprintf () from /lib/tls/libc.so.6
(gdb) x/i $pc
0xb7e53e67 <vfprintf+13719>:    mov    %ecx,(%eax)

We have indeed discovered a format string vulnerability in RealPlayer, in the handling of the timeformat option. It is now left as an exercise for you to create an exploit for the vulnerability.
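For readers unfamiliar with %n, the short sketch below shows its legitimate behavior with a fixed format string: the printf family stores the number of characters produced so far through the matching pointer argument. The function name chars_before_n is made up for this example, and the code assumes a C library that still honors %n, as glibc does for constant format strings.

```c
#include <stdio.h>

/* Formats a fixed string and returns the value that %n stored:
 * the number of characters produced before the %n was reached. */
int
chars_before_n (void)
{
  char buf[32];
  int count = -1;

  snprintf (buf, sizeof buf, "AAAA%nBBBB", &count);
  return count;
}
```

Here the function returns 4. When the format string comes from a file instead, as with the timeformat attribute, no pointer argument was ever supplied, so the library takes whatever value occupies the argument slot and writes through it. That is consistent with the faulting mov %ecx,(%eax) store inside vfprintf shown in the GDB session.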

Language

These tools were written in C for several logical reasons. First, like salt and pepper, C and Linux always have worked and always will work well with one another. Every modern Linux distribution has a C compiler and probably always will due to the fact that the Linux kernel is written in C. There are no special libraries required to expose the ptrace interface to our application, and we have all of the freedom to perform the tasks that we need.

Another reason the tools were written in C is that SPIKE is written in C. Because at least one of the tools uses the SPIKE code extensively, and because these two tools ideally needed to share some code, such as the exception handling and reporting, it would have been foolish to implement the common functionality in two different languages.

Summary

With reliable tools to fuzz file formats, client-side vulnerability discovery becomes just a matter of choosing a target and being patient. Whether you choose to write your own fuzzer or extend someone else’s, the time invested is well worth it.
