Chapter 11. File Format Fuzzing

	“If this were a dictatorship, it’d be a heck of a lot easier, just so long as I’m the dictator.”
	--George W. Bush, Washington, DC, December 19, 2000

File format fuzzing is a specialized fuzzing method with specifically defined targets. These targets are usually client-side applications. Examples include media players, Web browsers, and office productivity suites. However, targets can also be servers, such as antivirus gateway scanners, spam filters, and even regular e-mail servers. The end goal of file format fuzzing is to find an exploitable flaw in the way that an application parses a certain type of file.

An impressive number of client-side file format parsing vulnerabilities were uncovered in 2005 and 2006. Many were found by nefarious parties, as a number of 0day exploits were discovered in the wild before the typical vulnerability disclosure process could take place. The eEye security research group does an excellent job detailing such exposures in their Zero-Day Tracker.^[1] There are a number of factors indicating that the majority of these discoveries were uncovered through file format fuzzing. This class of bugs is far from extinct, making file format fuzzing a very interesting and “hot” topic.

In this chapter, we present various methods of approaching file fuzzing and discuss the different ways certain targets will accept input. Finally, we demonstrate some common vulnerabilities a file fuzzer will encounter and suggest ways of detecting such vulnerabilities in practice. The first step, of course, is to choose a suitable target.

Targets

Just like traditional types of fuzzing, many different types of vulnerabilities can be found with file format fuzzing. There are also many different types of exploitation scenarios. For example, some situations will require an attacker to send a malicious file to a user and have him or her open it manually. Other situations will only require a user browsing to an attacker-controlled Web page. Finally, some situations can be triggered by simply sending a malicious e-mail through a mail server or antivirus gateway. This last scenario was the case with the Microsoft Exchange TNEF vulnerability mentioned in Table 11.1 along with other file format vulnerability examples.

Table 11.1. Common Types of Vulnerable Applications and Examples of Previously Discovered File Format Vulnerabilities

Application Category	Vulnerability Name	Advisory
Office productivity suites	Microsoft HLINK.DLL Hyperlink Object Library Buffer Overflow Vulnerability	http://www.tippingpoint.com/security/advisories/TSRT-06-10.html
Antivirus scanners	Kaspersky Anti-Virus Engine CHM File Parser Buffer Overflow Vulnerability	http://www.idefense.com/intelligence/vulnerabilities/display.php?id=318
Media players	Winamp m3u Parsing Stack Overflow Vulnerability	http://www.idefense.com/intelligence/vulnerabilities/display.php?id=377
Web browsers	Vulnerability in Vector Markup Language Could Allow Remote Code Execution	http://www.microsoft.com/technet/security/Bulletin/MS06-055.mspx
Archiving utilities	WinZip MIME Parsing Buffer Overflow Vulnerability	http://www.idefense.com/intelligence/vulnerabilities/display.php?id=76
E-mail servers	Microsoft Exchange TNEF Decoding Vulnerability	http://www.microsoft.com/technet/security/Bulletin/MS06-003.mspx

You will find that most targets will fit into one of these categories. Some applications fit into several categories by way of their secondary functions. For example, many antivirus scanners will also include libraries to decompress files, allowing them to act as archiving utilities. There are also some content scanners that claim to analyze image files for pornographic content. These programs can also be considered as image viewers!^[2] It is not uncommon for applications to share common libraries, in which case a single vulnerability can affect multiple applications. Consider, for example, the vulnerability detailed in Microsoft Security Bulletin MS06-055, which affects both Internet Explorer and Outlook.

Methods

File format fuzzing is different from other types of fuzzing in that it is typically performed entirely on one host. When conducting Web application or network protocol fuzzing, you will most likely have at least two systems, a target system and a system on which your fuzzer will run. The increased performance achieved by being able to fuzz on a single machine makes file format fuzzing a particularly attractive approach for vulnerability discovery.

With network-based fuzzing, it is often evident when an interesting condition has occurred in the target application. In many cases, the server will shut down or crash outright and no longer be reachable. With file format fuzzing, mainly when fuzzing client-side applications, the fuzzer will be continually restarting and killing the target application so a crash might not be recognizable to the fuzzer without proper monitoring. This is an area where file format fuzzing is more complex than network fuzzing. With file format fuzzing, the fuzzer will generally have to monitor the target application for exceptions with each execution. This is generally accomplished by using a debugging library to dynamically monitor handled and unhandled exceptions in the target application, logging the results for later review. At the 50,000-foot view, a typical file fuzzer will follow these steps:

Prepare a test case, either via mutation or generation (more on this later).
Launch the target application and instruct it to load the test case.
Monitor the target application for faults, typically with a debugger.
In the event a fault is uncovered, log the finding. Alternatively, if after some period of time no fault is uncovered, manually kill the target application.
Repeat.
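Steps 2 through 4 can be sketched for a POSIX system as follows. This is a minimal sketch rather than the implementation of any particular fuzzer: the names run_case and enum outcome, the 50 ms polling interval, and the choice to treat any signal-induced termination as a fault are all assumptions made for illustration. Preparing the test case and logging the result are left to the caller.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical result codes for one fuzz iteration. */
enum outcome { OUTCOME_CLEAN, OUTCOME_FAULT, OUTCOME_HANG, OUTCOME_ERROR };

/* Launch argv[0] with the given arguments (the test case path would be one
 * of them), wait up to timeout_secs, and classify how the run ended. */
enum outcome run_case(char *const argv[], int timeout_secs)
{
    int status = 0;
    int waited_ms = 0;
    pid_t pid;

    assert(argv != NULL && argv[0] != NULL);

    pid = fork();
    if (pid < 0)
        return OUTCOME_ERROR;
    if (pid == 0) {
        execv(argv[0], argv);   /* only returns on failure */
        _exit(127);
    }

    for (;;) {
        struct timespec pause = { 0, 50 * 1000 * 1000 };   /* 50 ms */
        pid_t done = waitpid(pid, &status, WNOHANG);

        if (done == pid)
            break;                      /* target ended on its own */
        if (done < 0)
            return OUTCOME_ERROR;
        if (waited_ms >= timeout_secs * 1000) {
            kill(pid, SIGKILL);         /* no fault seen in time: kill it */
            waitpid(pid, &status, 0);
            return OUTCOME_HANG;
        }
        nanosleep(&pause, NULL);
        waited_ms += 50;
    }

    /* Assumption: death by signal is what we call a fault here. */
    return WIFSIGNALED(status) ? OUTCOME_FAULT : OUTCOME_CLEAN;
}
```

A driver would call run_case once per generated file and save the file whenever the result is OUTCOME_FAULT. Note that this only sees unhandled faults; exceptions that the target handles internally still require a debugger.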

File format fuzzing can be implemented via both generation and mutation methods. Although both methods have been very effective in our experience, the mutation or “brute force” method is definitely the simpler of the two to implement. The generation method or “intelligent brute force” fuzzing, although more time consuming to implement, will uncover vulnerabilities that would otherwise not be found using the more primitive brute force approach.

Brute Force or Mutation-Based Fuzzing

With the brute force fuzzing method, you need to first collect several different samples of your target file type. The more different files you can find, the more thorough your test will be. The fuzzer then acts on these files, creating mutations of them and sending them through the target application's parser. These mutations can take any form, depending on the method you choose for your fuzzer. One method you can use is to replace data byte for byte. For example, progress through the file and replace each byte with 0xff. You could also do this for multiple bytes, such as for two- and four-byte ranges. You can also insert data into the file as opposed to just overwriting bytes. This is a useful method when testing string values. However, when inserting data into the file, be aware that you might be upsetting offsets within the file. This can severely disrupt code coverage, as some parsers will quickly detect an invalid file and exit.
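The two mutation styles described here, overwriting and inserting, might look like the following in C. This is a sketch under assumptions: the function names mutate_overwrite and mutate_insert and their return conventions are invented for illustration and operate on a sample file already loaded into memory.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Overwrite `width` bytes starting at `offset` with `value` (for example
 * 0xff). Returns 0 on success, -1 if the range does not fit in the buffer. */
int mutate_overwrite(unsigned char *buf, size_t len,
                     size_t offset, size_t width, unsigned char value)
{
    assert(buf != NULL);
    if (width == 0 || offset >= len || width > len - offset)
        return -1;
    memset(buf + offset, value, width);
    return 0;
}

/* Copy `src` into `dst` with `ins` spliced in at `offset`. Everything after
 * the insertion point shifts by `ins_len` bytes, which is exactly what
 * upsets offsets stored elsewhere in the file.
 * Returns the new length, or 0 if `dst` is too small or `offset` is bad. */
size_t mutate_insert(const unsigned char *src, size_t len, size_t offset,
                     const unsigned char *ins, size_t ins_len,
                     unsigned char *dst, size_t dst_cap)
{
    assert(src != NULL && ins != NULL && dst != NULL);
    if (offset > len || ins_len > dst_cap || len > dst_cap - ins_len)
        return 0;
    memcpy(dst, src, offset);
    memcpy(dst + offset, ins, ins_len);
    memcpy(dst + offset + ins_len, src + offset, len - offset);
    return len + ins_len;
}
```

Calling mutate_overwrite with a width of 1, 2, or 4 at each successive offset yields the byte, two-byte, and four-byte passes mentioned in the text.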

Checksums can also foil brute force fuzzers. Because any byte change will invalidate the checksum, it is quite likely that the parsing application will gracefully exit, providing an error message before a potentially vulnerable piece of code can ever be reached. The solution for these problems is to either switch to intelligent fuzzing, which is discussed in the next section, or as an alternative approach, disable the checks within the target software. Disabling the software checks is not a trivial task and generally requires the efforts of a reverse engineer.
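To make the checksum problem concrete, consider a purely hypothetical format whose final byte is the sum of all preceding bytes modulo 256. No real format is implied, and the helper names below are invented. The sketch shows that a single mutated byte fails validation, and that a fuzzer which understands the format can repair the checksum after mutating.

```c
#include <assert.h>
#include <stddef.h>

/* Sum of `len` bytes modulo 256 (the hypothetical checksum algorithm). */
static unsigned char sum8(const unsigned char *buf, size_t len)
{
    unsigned char sum = 0;
    size_t i;

    for (i = 0; i < len; i++)
        sum = (unsigned char)(sum + buf[i]);
    return sum;
}

/* What a parser for this format would do before touching the body:
 * returns 1 if the trailing checksum byte matches, 0 otherwise. */
int checksum_ok(const unsigned char *file, size_t len)
{
    assert(file != NULL);
    if (len < 1)
        return 0;
    return sum8(file, len - 1) == file[len - 1];
}

/* What a format-aware fuzzer would do after mutating the body:
 * recompute the trailing checksum so the parser keeps going. */
void checksum_fix(unsigned char *file, size_t len)
{
    assert(file != NULL);
    if (len >= 1)
        file[len - 1] = sum8(file, len - 1);
}
```

Real formats typically use stronger algorithms such as CRC32, but the consequence for a blind mutation fuzzer is the same.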

Why is this method simple to use once it is implemented? That’s easy. The end user doesn’t need to have any knowledge of the file format and how it works. Provided they can find a few sample files using a popular search engine, or by searching their local system, they are essentially done with their research until the fuzzer finds something interesting.

There are a few drawbacks to this fuzzing approach. First, it is a very inefficient approach and can therefore take some time to complete fuzzing on a single file. Take, for example, a basic Microsoft Word document. Even a blank document will be approximately 20KB in size. To fuzz each byte once would require creating and launching 20,480 separate files. Assuming 2 seconds per file, it would take more than 11 hours to complete and that’s only for trying a single byte value. What about the other 254 possibilities? This issue can be sidestepped somewhat through the usage of a multithreaded fuzzer, but it does illustrate the inefficiency of pure mutation fuzzing. Another way to streamline this fuzzing approach is to solely concentrate on areas of the file that are more likely to yield desired results, such as file and field headers.
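The arithmetic behind that estimate is simple enough to capture in a helper. The function name fuzz_hours is an assumption made for illustration, and the sketch assumes one target launch per mutated byte with no parallelism.

```c
#include <assert.h>

/* Hours needed to try `values_per_byte` replacement values at each of
 * `file_bytes` positions, at `secs_per_case` seconds per launch. */
double fuzz_hours(unsigned long file_bytes, unsigned values_per_byte,
                  double secs_per_case)
{
    assert(secs_per_case >= 0.0);
    return (double)file_bytes * values_per_byte * secs_per_case / 3600.0;
}
```

With the figures from the text, fuzz_hours(20480, 1, 2.0) is about 11.4 hours. Trying all 255 replacement values, fuzz_hours(20480, 255, 2.0), comes to roughly 2,901 hours, or about 121 days for a single blank document.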

The primary drawback to brute force fuzzing is the fact that there will almost always be a large piece of functionality that will be missed, unless you have somehow managed to gather a sample file set containing each and every possible feature. Most file formats are very complex and contain a multitude of permutations. When measuring code coverage, you will find that throwing a few sample files at an application will not exercise the application as thoroughly as if the user truly understands the file format and has manually prepared some of the information about the file type. This thoroughness issue is addressed with the generation approach to file fuzzing, which we have termed intelligent brute force fuzzing.

Intelligent Brute Force or Generation-Based Fuzzing

With intelligent brute force fuzzing, you must first put some effort into actually researching the file specifications. An intelligent fuzzer is still a fuzzing engine, and thus is still conducting a brute force attack. However, it will rely on configuration files from the user, making the process more intelligent. These files usually contain metadata describing the language of the file types. Think of these templates as lists of data structures, their positions relative to each other, and their possible values. On an implementation level, these can be represented in many different formats.
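One of the many possible representations is a plain array of field descriptors that the fuzzer serializes in order, substituting a fuzz value for one field at a time. The sketch below is an assumption-laden illustration: struct field, enum field_kind, and render are invented names, and the layout used in the usage note (magic, version, length, payload) is a made-up format, not a real one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum field_kind { FIELD_U16_LE, FIELD_U32_LE, FIELD_BYTES };

struct field {
    const char *name;            /* label used when logging a test case */
    enum field_kind kind;
    uint32_t value;              /* default for integer fields */
    const unsigned char *bytes;  /* default for FIELD_BYTES */
    size_t len;                  /* length of `bytes` */
};

/* Serialize the template in order. If `fuzz_index` names an integer field,
 * its default is replaced by `fuzz_value`; FIELD_BYTES fields are always
 * emitted unchanged in this sketch.
 * Returns the number of bytes written, or 0 if `out` is too small. */
size_t render(const struct field *tmpl, size_t nfields,
              size_t fuzz_index, uint32_t fuzz_value,
              unsigned char *out, size_t cap)
{
    size_t pos = 0;
    size_t i, b;

    assert(tmpl != NULL && out != NULL);
    for (i = 0; i < nfields; i++) {
        const struct field *f = &tmpl[i];
        uint32_t v = (i == fuzz_index) ? fuzz_value : f->value;
        size_t need = (f->kind == FIELD_U16_LE) ? 2
                    : (f->kind == FIELD_U32_LE) ? 4
                    : f->len;

        if (need > cap - pos)
            return 0;
        if (f->kind == FIELD_BYTES) {
            if (f->len > 0)
                memcpy(out + pos, f->bytes, f->len);
        } else {
            for (b = 0; b < need; b++)          /* little-endian encode */
                out[pos + b] = (unsigned char)(v >> (8 * b));
        }
        pos += need;
    }
    return pos;
}
```

A four-entry template of a 2-byte magic, a 16-bit version, a 32-bit length, and a 4-byte payload renders to 12 bytes. Rendering it once per interesting value of the length field (0, 0x7FFFFFFF, 0xFFFFFFFF, and so on) produces test cases that remain structurally valid everywhere except the field under test.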

If a file format without any public documentation is chosen for testing, you, as the researcher, will have to conduct further research on the format specification before building a template. This might require reverse engineering on your part, but always start with your good friend Google to see if someone else has done the work for you. Several Web sites, such as Wotsit’s Format,^[3] serve as an excellent archive of official and unofficial file format documentation. An alternate but complementary approach involves comparing samples of the file type to reveal some patterns and profile some of the data types being used. Remember that the effectiveness of an intelligent fuzz is directly related to your understanding of the file format and your ability to describe it in a generic way to the fuzzer you are using. We show a sample implementation of an intelligent brute force fuzzer later on in the book when building SPIKEfile in Chapter 12, “File Format Fuzzing: Automation on UNIX.”

Once a target and method have been determined, the next step is to research appropriate input vectors for the chosen target.

Inputs

With a target application selected, the next step is to enumerate the supported file types and extensions as well as the different vectors for getting those files parsed. Available format specifications should also be collected and reviewed. Even in cases where you only intend to perform a simple brute force test, it is still useful to have knowledge of the file formats you have as possible candidates. Focusing on the more complex file types can be lucrative, as implementing a proper parser will be more difficult, and therefore the chances of discovering a vulnerability arguably increase.

Let’s consider an example and see how we might gather inputs. WinRAR^[4] is a popular archive utility that is freely available. An easy way to tell what files WinRAR will handle is to simply browse the WinRAR Web site. On the main WinRAR page, you will find a list of supported file types. These include zip, rar, tar, gz, ace, uue, and several others.

Now that you have a list of the file types that WinRAR will handle, you must pick a target. Sometimes, the best way to pick a target is to look up information about each file type, and go with the one that is most complex. The assumption here is that complexity often leads to coding mistakes. For example, a file type that uses a number of length-tagged values and user-supplied offsets might be more appealing than a simpler file type that is based on static offsets and static length fields. Of course, there are plenty of exceptions to this rule, as you will find once you get some fuzzing under your belt. Ideally, the fuzzer will eventually target every possible file type, so the first one chosen is not necessarily important. However, it is always a nice payoff to find interesting behavior in your first set of fuzz tests for a particular application.

Vulnerabilities

When parsing malformed files, a poorly coded application can be susceptible to a number of different classes of vulnerabilities. This section discusses some of these vulnerability classifications:

DoS (crash or hang)
Integer handling problems
Simple stack/heap overflows
Logic errors
Format strings
Race conditions

Denial of Service

Although DoS issues are not very interesting in client-side applications, you need to keep in mind that we can also target server applications that must remain available for security and productivity purposes. This includes, of course, e-mail servers and content filters. In our experience, some of the most common causes of DoS issues in file parsing code have been out-of-bounds reads, infinite loops, and NULL pointer dereferences.

A common error leading to infinite loops is trusting offset values in files that specify the locations of other blocks within the file. If the application does not make sure this offset is forward in relation to the current block, an infinite loop can occur causing the application to repeatedly process the same block or blocks ad infinitum. There have been several instances of this type of problem in ClamAV in the past.^[5]
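The forward-progress check described here can be sketched against a hypothetical block layout in which each block starts with a 32-bit little-endian offset of the next block and an offset of zero ends the chain. The layout and the name walk_blocks are assumptions for illustration and are not taken from ClamAV or any real parser.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint32_t read_u32_le(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Follow the chain of "next block" offsets starting at offset 0.
 * Returns the number of blocks visited, or -1 if an offset points outside
 * the file or does not move forward (which would otherwise loop forever). */
int walk_blocks(const unsigned char *file, size_t len)
{
    size_t cur = 0;
    int count = 0;

    assert(file != NULL);
    for (;;) {
        uint32_t next;

        if (len < 4 || cur > len - 4)
            return -1;              /* block header would be out of bounds */
        next = read_u32_le(file + cur);
        count++;
        if (next == 0)
            return count;           /* end of chain */
        if (next <= cur)
            return -1;              /* backward or self reference */
        cur = next;
    }
}
```

Dropping the `next <= cur` test reproduces the bug: a block whose offset points at itself, or at an earlier block, keeps the loop spinning indefinitely.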

Integer Handling Problems

Integer overflows and “signedness” issues are very common in binary file parsing. Some of the most common issues we have seen resemble the following pseudo-code:

[...]
[1] size            = read32_from_file();
[2] allocation_size = size+1;
[3] buffer          = malloc(allocation_size);
[4] for (ix = 0; ix < size; ix++)
[5]      buffer[ix] = read8_from_file();
[...]

This example demonstrates a typical integer overflow that results in memory corruption. If the file specifies the maximum unsigned 32-bit integer (0xFFFFFFFF) for the value size, then on line [2] allocation_size is assigned zero due to an integer wrap. On line [3], the code will result in a memory allocation call with a size of zero. The pointer buffer at this stage will point to an underallocated memory chunk. On lines [4] and [5], the application loops and copies a large amount of data, bounded by the original value for size, into the allocated buffer, resulting in memory corruption.

This particular situation will not always be exploitable. Its exploitability is dependent on how the application uses the heap. Simply overwriting memory on the heap is not always enough to gain control of the application. Some operation must occur causing the overwritten heap data to be used. In some cases, integer overflows like these will cause a non-heap-related crash before heap memory is used.
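The wrap on line [2] and one possible guard against it can be sketched as follows. The function names are hypothetical, and bounding size by the number of bytes left in the file is an assumption about what a reasonable parser would do, not something the pseudo-code above specifies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The arithmetic from line [2]: unsigned 32-bit addition wraps silently,
 * so a size of 0xFFFFFFFF yields an allocation size of 0. */
uint32_t naive_allocation_size(uint32_t size)
{
    return (uint32_t)(size + 1u);
}

/* Guarded version: reject sizes that would wrap, or that exceed what the
 * file could possibly contain, before doing the addition.
 * Returns 0 and stores the allocation size in *out, or -1 on rejection. */
int checked_allocation_size(uint32_t size, size_t bytes_remaining,
                            size_t *out)
{
    assert(out != NULL);
    if (size == UINT32_MAX || size > bytes_remaining)
        return -1;
    *out = (size_t)size + 1;
    return 0;
}
```

From the fuzzer's side, this is why boundary values such as 0xFFFFFFFF, 0xFFFFFFFE, and 0x7FFFFFFF are worth placing in every length field.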

This is, of course, just one example of how integers can be used incorrectly while parsing binary data. We have seen integers misused in many different ways, including the often simpler signed to unsigned comparison error. The following code snippet demonstrates the logic behind this type of vulnerability:

[0] #define MAX_ITEMS 512

[...]

[1] char buff[MAX_ITEMS];
[2] int size;

[...]

[3] size = read32_from_file();
[4] if (size > MAX_ITEMS)
[5]    { printf("Too many items\n"); return -1; }
[6] readx_from_file(size,buff);

[...]

/* readx_from_file: read 'size' bytes from file into buff */
[7] void readx_from_file(unsigned int size, char *buff)
{
[...]
}

This code will allow a stack-based overflow to occur if the value size is a negative number. This is because in the comparison at [4], both size (as defined on [2]) and MAX_ITEMS (as defined on [0]) are treated as signed numbers and, for example, -1 is less than 512. Later on, when size is used for copy boundaries in the function at [7], it is treated as unsigned. The value -1, for example, is now interpreted as 4294967295. Of course, the exploitability of this is not guaranteed, but in many cases, depending on how the readx_from_file function is implemented, this will be exploitable by targeting variables and saved registers on the stack.
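The mismatch between the signed comparison and the unsigned copy length can be isolated in a few lines. The function names here are hypothetical, and the sketch assumes a 32-bit size value as in the snippet above.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_ITEMS 512

/* The check from line [4]: returns 1 when the flawed test lets size through.
 * Both operands are signed, so any negative size passes. */
int flawed_check_passes(int32_t size)
{
    return !(size > MAX_ITEMS);
}

/* What the callee at [7] sees after the implicit signed-to-unsigned
 * conversion of its first argument. */
uint32_t as_copy_length(int32_t size)
{
    return (uint32_t)size;
}

/* A corrected check: reject negative values as well as oversized ones. */
int size_is_valid(int32_t size)
{
    return size >= 0 && size <= MAX_ITEMS;
}

/* Convert only after validation, so the unsigned length is always in range. */
uint32_t checked_copy_length(int32_t size)
{
    assert(size_is_valid(size));
    return (uint32_t)size;
}
```

For a size of -1, flawed_check_passes returns 1 while as_copy_length returns 4294967295, which is the copy length the vulnerable code would hand to readx_from_file.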

Simple Stack and Heap Overflows

The issues here are well understood and have been seen many times in the past. A typical scenario goes like this: A fixed size buffer is allocated, whether it be on the stack or on the heap. Later, no bounds checking is performed when copying in oversized data from the file. In some cases, there is some attempt at bounds checking, but it is done incorrectly. When the copy occurs, memory is corrupted, often leading to arbitrary code execution. For more details regarding these vulnerability classes, “The Shellcoder’s Handbook: Discovering and Exploiting Security Holes”^[6] serves as an excellent reference.

Logic Errors

Depending on the design of the file format, exploitable logic errors might be possible. Although we have not personally discovered any logic errors during file format vulnerability research, a perfect example of this class of vulnerabilities is the Microsoft WMF vulnerability addressed in MS06-001.^[7] The vulnerability was not due to a typical overflow. In fact, it did not require any type of memory corruption, yet it allowed an attacker to directly execute user-supplied position-independent code.

Format Strings

Although format string vulnerabilities are mostly extinct, especially in open source software, they are worth mentioning. When we say mostly extinct, we say that because not all programmers can be as security aware as the folks at US-CERT, who recommend that to secure your software, you should Not Use the "%n" Format String Specifier.^[8]

But seriously, in our personal experiences, we have actually found several format string-related issues while file fuzzing. Some were discovered in Adobe^[9] and RealNetworks^[10] products. A lot of the fun in exploiting format string issues comes from being able to use the vulnerability to leak memory to aid exploitation. Unfortunately, with client-side attacks using malformed files, you rarely are afforded this opportunity.

Race Conditions

Although people don’t typically think of file format vulnerabilities as occurring due to race conditions, a few have, and there are probably many more to come. The main targets for this type of vulnerability are complex multithreaded applications. We hate to single out one product in particular, but Microsoft Internet Explorer is the first application that comes to mind here. Vulnerabilities caused by Internet Explorer using uninitialized memory and using memory that is in use by another thread will probably continue to be discovered.

Detection

When fuzzing file formats, you will typically be spawning many instances of the target application. Some will hang indefinitely and have to be killed, some will crash, and some will exit cleanly on their own. The challenge lies in determining when a handled or unhandled exception has occurred and when that exception is exploitable. A fuzzer can draw on several sources of information to learn more about a process:

Event logs. Event logs are used by Microsoft Windows operating systems and can be accessed using the Event Viewer application. They are not terribly useful for our purposes, as it is difficult to correlate an event log entry with a specific process when we are launching hundreds during a fuzz session.
Debuggers. The best way to identify unhandled and handled exceptions is to attach a debugger to the target application prior to fuzzing. Error handling will prevent obvious signs of many errors caused by fuzzing but these can generally be detected using a debugger. There are more advanced techniques than applying a debugger for fault detection, some of which are touched on in Chapter 24, “Intelligent Fault Detection.”
Return codes. Capturing and testing the return code of the application, although not always as accurate or informative as using a debugger, can be a very quick and dirty way to determine why an application ended. Under UNIX at least, it is possible to determine which signal caused an application to terminate via the return code.
Debugging API. Instead of leveraging a third-party debugger, it is often feasible and effective to implement some rudimentary debugging features into the fuzzer itself. For example, all we will be interested in knowing about a process is the reason it terminated, what the current instruction was, the register state, and possibly the values in a few memory regions like at the stack pointer or with respect to some register. This is often trivial to implement, and is invaluable in terms of time saving when analyzing crashes for exploitability. In later chapters, we explore this option and present a simple and reusable debugger creation framework on the Microsoft Windows platform named PyDbg, part of the PaiMei^[11] reverse engineering framework.
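As an illustration of the return code approach on UNIX, the wait status returned by waitpid can be decoded with the standard W* macros. The struct and function names below are assumptions made for this sketch, and stand_in_target_status merely simulates a target by forking a child that raises a signal itself; a real fuzzer would exec the application under test instead.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

struct exit_info {
    int signaled;   /* 1 if a signal ended the process, 0 otherwise */
    int code;       /* signal number if signaled, else the exit code */
};

/* Decode a raw wait status into "killed by signal N" or "exited with N". */
struct exit_info decode_status(int status)
{
    struct exit_info info = { 0, -1 };

    if (WIFSIGNALED(status)) {
        info.signaled = 1;
        info.code = WTERMSIG(status);
    } else if (WIFEXITED(status)) {
        info.code = WEXITSTATUS(status);
    }
    return info;
}

/* Assumption: these are the signals this sketch treats as crashes. */
int is_fault_signal(int signo)
{
    return signo == SIGSEGV || signo == SIGBUS || signo == SIGILL ||
           signo == SIGFPE || signo == SIGABRT;
}

/* Stand-in for a fuzzed target: the child raises `signo`, or exits with
 * code 3 when `signo` is 0. Stores the raw wait status in *status.
 * Returns 0 on success, -1 if fork or waitpid failed. */
int stand_in_target_status(int signo, int *status)
{
    pid_t pid;

    assert(status != NULL);
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (signo != 0) {
            signal(signo, SIG_DFL);   /* make sure the default action runs */
            raise(signo);
        }
        _exit(3);
    }
    return waitpid(pid, status, 0) == pid ? 0 : -1;
}
```

This tells the fuzzer which signal ended the process but nothing about where it happened. The faulting instruction and register state still require a debugger or debugging API.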

Once a given test case has been determined to cause a fault, make sure to save the information that was gathered by whatever fault monitoring method you choose in addition to the actual file that triggered the crash. Saving test case metadata in your records is important as well. For example, if the fuzzer was fuzzing the 8th variable field in the file and it was using the 42nd fuzz value, the file might be named file-8-42. In some cases, we might want to drop a core file and save that as well. This can be done if the fuzzer is catching signals using a debugging API. Specific implementation details regarding this can be found in the next two chapters.
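A naming scheme like file-8-42 is easy to generate at the moment the test case is written out. The helper name case_name and its return convention are assumptions for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build "<prefix>-<field index>-<fuzz value index>" into `out`.
 * Returns 0 on success, -1 if the name does not fit in `cap` bytes. */
int case_name(char *out, size_t cap, const char *prefix,
              unsigned field_index, unsigned value_index)
{
    int n;

    assert(out != NULL && prefix != NULL && strlen(prefix) > 0);
    n = snprintf(out, cap, "%s-%u-%u", prefix, field_index, value_index);
    return (n < 0 || (size_t)n >= cap) ? -1 : 0;
}
```

Calling case_name(name, sizeof name, "file", 8, 42) produces file-8-42, so the offending field and fuzz value can be recovered from the file name alone when reviewing crashes later.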

Summary

Although file format fuzzing is a narrowly defined fuzzing method, there are plenty of targets and numerous attack vectors. We have discussed not only the more traditional client-side file format vulnerabilities, but even some “true remote” scenarios, such as antivirus gateways and mail servers. As more and more emphasis is placed on preventing and detecting network-based attacks over TCP/IP, file format exploits still remain as a valuable weapon to penetrate internal network segments.

Now that you know what type of vulnerabilities you are looking for, what applications you are targeting, and exactly how you are trying to fuzz them, it is time to explore some implementations of file format fuzzing tools.
