Chapter 9

Format Strings

Introduction

Early in the summer of 2000, the security world was abruptly made aware of a significant new type of security vulnerability in software. This subclass of vulnerabilities, known as format string bugs, was made public when an exploit for the Washington University FTP daemon (WU-FTPD) was posted to the Bugtraq mailing list on June 23, 2000. The exploit allowed remote attackers to gain root access on hosts running WU-FTPD without authentication if anonymous FTP was enabled (it was, by default, on many systems). This was a very high-profile vulnerability because WU-FTPD is in wide use on the Internet.

As serious as it was, the fact that tens of thousands of hosts on the Internet were instantly vulnerable to complete remote compromise was not the primary reason that this exploit was such a great shock to the security community. The real concern was the nature of the exploit and its implications for software everywhere. This was a completely new method of exploiting programming bugs previously thought to be benign. This was the first demonstration that format string bugs were exploitable.

A format string vulnerability occurs when programmers pass externally supplied data to a printf function as, or as part of, the format string argument. In the case of WU-FTPD, the argument to the SITE EXEC FTP command was passed directly to a printf function when the command was issued to the server.

There could not have been a more effective proof of concept; attackers could immediately and automatically obtain superuser privileges on victim hosts.

Until the exploit was public, format string bugs were considered by most to be bad programming form—just inelegant shortcuts taken by programmers in a rush—nothing to be overly concerned about. Up until that point, the worst that had occurred was a crash, resulting in a denial of service. The security world soon learned differently. Countless UNIX systems have been compromised due to these bugs.

As previously mentioned, format string vulnerabilities were first made public in June of 2000. The WU-FTPD exploit was written by an individual known as tf8, and was dated October 15, 1999. Assuming that through this vulnerability it was discovered that format string bug conditions could be exploited, hackers had more than eight months to seek out and write exploits for format string bugs in other software. This is a conservative guess, based on the assumption that the WU-FTPD vulnerability was the first format string bug to be exploited. There is no reason to believe that is the case; the comments in the exploit do not suggest that the author discovered this new method of exploitation.

Shortly after knowledge of format string vulnerabilities was public, exploits for several programs became publicly available. As of this writing, there are dozens of public exploits for format string vulnerabilities, plus an unknown number of unpublished ones.

As for their official classification, format string vulnerabilities do not really deserve their own category among other general software flaws such as race conditions and buffer overflows. Format string vulnerabilities really fall under the umbrella of input validation bugs: the basic problem is that programmers fail to prevent untrusted externally supplied data from being included in the format string argument.

Notes from the Underground …

Format String Vulnerabilities versus Buffer Overflows

On the surface, format string and buffer overflow exploits often look similar. It is not hard to see why some may group them together in the same category. Although in both cases attackers may overwrite return addresses or function pointers and use shellcode to exploit them, buffer overflows and format string vulnerabilities are fundamentally different problems.

In a buffer overflow vulnerability, the software flaw is that a sensitive routine such as a memory copy relies on an externally controllable source for the bounds of data being operated on. For example, many buffer overflow conditions are the result of C library string copy operations. In the C programming language, strings are NULL terminated byte arrays of variable length. The strcpy() (string copy) libc function copies bytes from a source string to a destination buffer until a terminating NULL is encountered in the source string. If the source string is externally supplied and greater in size than the destination buffer, the strcpy() function will write to memory neighboring the data buffer until the copy is complete. Exploitation of a buffer overflow is based on the attacker being able to overwrite critical values with custom data during operations such as a string copy.

In format string vulnerabilities, the problem is that externally supplied data is being included in the format string argument. This can be considered a failure to validate input and really has nothing to do with data boundary errors. Hackers exploit format string vulnerabilities to write specific values to specific locations in memory. In buffer overflows, the attacker cannot choose where memory is overwritten.

Another source of confusion is that buffer overflows and format string vulnerabilities can both exist due to the use of the sprintf() function. To understand the difference, it is important to understand what the sprintf function actually does. sprintf() allows a programmer to create a string using printf() style formatting and write it into a buffer. Buffer overflows occur when the string that is created is somehow larger than the buffer it is being written to. This is often the result of the use of the %s format specifier, which embeds a NULL terminated string of variable length in the formatted string. If the variable corresponding to the %s token is externally supplied and it is not truncated, it can cause the formatted string to overwrite memory outside of the destination buffer when it is written. The format string vulnerabilities due to the misuse of sprintf() are due to the same error as any other format string bug: externally supplied data being interpreted as part of the format string argument.

This chapter will introduce you to format string vulnerabilities, why they exist, and how they can be exploited by attackers. We will look at a real-world format string vulnerability, and walk through the process of exploiting it as a remote attacker trying to break into a host.

Understanding Format String Vulnerabilities

To understand format string vulnerabilities, it is necessary to understand what the printf functions are and how they function internally.

Computer programmers often require the ability for their programs to create character strings at runtime. These strings may include variables of a variety of types, the exact number and order of which are not necessarily known to the programmer during development. The widespread need for flexible string creation and formatting routines naturally led to the development of the printf family of functions. The printf functions create and output strings formatted at runtime. They are part of the standard C library. Additionally, the printf functionality is implemented in other languages (such as Perl).

These functions allow a programmer to create a string based on a format string and a variable number of arguments. The format string can be considered a blueprint containing the basic structure of the string and tokens that tell the printf function what kinds of variable data go where, and how they should be formatted. The printf tokens are also known as format specifiers; the two terms are used interchangeably in this chapter.

Tools & Traps …

The printf Functions

This is a list of the standard printf functions included in the standard C library. Each of these can lead to an exploitable format string vulnerability if misused.

printf() This function allows a formatted string to be created and written to the standard out I/O stream.

fprintf() This function allows a formatted string to be created and written to a libc FILE I/O stream.

sprintf() This function allows a formatted string to be created and written to a location in memory. Misuse of this function often leads to buffer overflow conditions.

snprintf() This function allows a formatted string to be created and written to a location in memory, with a maximum string size. In the context of buffer overflows, it is known as a secure replacement for sprintf().

The standard C library also includes the vprintf(), vfprintf(), vsprintf(), and vsnprintf() functions. These perform the same functions as their counterparts listed previously but accept varargs (variable arguments) structures as their arguments.

The concept behind printf functions is best demonstrated with a small example:

[Code listing: a call to printf() with a format string and one integer argument]

In this code example, the programmer is calling printf with two arguments, a format string and a variable that is to be embedded in the string when that instance of printf executes.

"this is the skeleton of the string, %i"

This format string argument consists of static text and a token (%i), indicating variable data. In this example, the value of this integer variable will be included, in Base10 character representation, after the comma in the string output when the function is called.

The following program output demonstrates this (the value of the integer variable is 10):

[dma@victim server]$ ./format_example

this is the skeleton of the string, 10

Because the function does not know how many arguments it will receive, they are read from the process stack as the format string is processed based on the data type of each token. In the previous example, a single token representing an integer variable was embedded in the format string. The function expects a variable corresponding to this token to be passed to the printf function as the second argument. On the Intel architecture (at least), arguments to functions are pushed onto the stack before the stack frame is created. When the function references its arguments on these platforms, it references data on the stack beneath the stack frame.

NOTE

In this chapter, we use the term beneath to describe data that was placed on the stack before the data we are suggesting is above. On the Intel architecture, the stack grows down. On this and other architectures with stacks that grow down, the address of the top of the stack decreases numerically as the stack grows. On these systems, data that is described as beneath the other data on the stack has a numerically higher address than data above it.

The fact that numerically higher memory addresses may be lower in the stack can cause confusion. Be aware that a location in the stack described as above another means that it is closer to the top of the stack than the other location.

In our example, an argument was passed to the printf function corresponding to the %i token—the integer variable. The Base10 character representation of the value of this variable (10) was output where the token was placed in the format string.

When creating the string that is to be output, the printf function will retrieve whatever value of integer data type size is at the right location in the stack and use that as the variable corresponding to the token in the format string. The printf function will then convert the binary value to a character representation based on the format specifier and include it as part of the formatted output string. As will be demonstrated, this occurs regardless of whether the programmer has actually passed a second argument to the printf function or not. If no parameters corresponding to the format string tokens were passed, data belonging to the calling function(s) will be treated as the arguments, because that is what is next on the stack.

Let’s go back to our example, pretending that we had later decided to print only a static string but forgot to remove the format specifier. The call to printf now looks like this:

printf("this is the skeleton of the string, %i");

/* note: no argument. only a format string. */

When this function executes, it does not know that there has not been a variable passed corresponding to the %i token. When creating the string, the function will read an integer from the area of the stack where a variable would be had it been passed by the programmer, the 4 bytes beneath the stack frame. Provided that the virtual memory where the argument should be can be dereferenced, the program will not crash and whatever bytes happened to be at that location will be interpreted as, and output as, an integer.

The following program output demonstrates this:

[dma@victim]$ ./format_example

this is the skeleton of the string, -1073742936

Recall that no variable was passed as an integer argument corresponding to the %i format specifier; however, an integer was included in the output string. The function simply reads bytes that make up an integer from the stack as though they were passed to the function by the programmer. In this example, the bytes in memory happened to represent the number -1073742936 as a signed int data type in Base10.

If users can force their own data to be part of the format string, they cause the affected printf function to treat whatever happens to be on the stack as legitimate variables associated with format specifiers that they supply.

As we will see, the ability for an external source to control the internal function of a printf function can lead to some serious potential security vulnerabilities. If a program exists that contains such a bug and returns the formatted string to the user (after accepting format string input), attackers can read possibly sensitive memory contents. Memory can also be written to through malicious format strings by using the obscure format specifier %n. The purpose of the %n token is to allow programmers to obtain the number of characters output at predetermined points during string formatting. How attackers can exploit format string vulnerabilities will be explained in detail as we work toward developing a functional format string exploit.

Why and Where Do Format String Vulnerabilities Exist?

Format string vulnerabilities are the result of programmers allowing externally supplied, unsanitized data in the format string argument. These are some of the most commonly seen programming mistakes resulting in exploitable format string vulnerabilities.

The first is where a printf function is called with no separate format string argument, simply a single string argument. For example:

printf(argv[1]);

In this example, argv[1], the second element of the argument array (normally the first command-line argument), is passed to printf() as the format string. If format specifiers have been included in the argument, they will be acted upon by the printf function:

[dma@victim]$ ./format_example %i

-1073742936

This mistake is usually made by newer programmers and is due to unfamiliarity with the C library string processing functions. Sometimes it is due to the programmer’s laziness in neglecting to include a format string argument for the string (i.e., %s). Such carelessness is often the underlying cause of many different types of security vulnerabilities in software.

The use of wrappers for printf() style functions, often for logging and error reporting functions, is very common. When developing, programmers may forget that an error message function calls printf() (or another printf function) at some point with the variable arguments it has been passed. They may simply become accustomed to calling it as though it prints a single string:

error_warn(errmsg);

The vulnerability that we are going to exploit in this chapter is due to an error similar to this.

One of the most common causes of format string vulnerabilities is improper calling of the syslog() function on UNIX systems. syslog() is the programming interface for the system log daemon. Programmers can use syslog() to write error messages of various priorities to the system log files. As its string arguments, syslog() accepts a format string and a variable number of arguments corresponding to the format specifiers. (The first argument to syslog() is the syslog priority level.) Many programmers who use syslog() forget or are unaware that a format string separate from externally supplied log data must be passed. Many format string vulnerabilities are due to code that resembles this:

syslog(LOG_AUTH, errmsg);

If errmsg contains externally supplied data (such as the username of a failed login attempt), this condition can likely be exploited as a typical format string vulnerability.

How Can They Be Fixed?

Like most security vulnerabilities due to insecure programming, the best solution to format string vulnerabilities is prevention. Programmers need to be aware that these bugs are serious and can be exploited by attackers. Unfortunately, a global awakening to security issues is not likely any time soon.

For administrators and users concerned about the software they run on their system, a good policy should keep the system reasonably secure. Ensure that all setuid binaries that are not needed have their permissions removed, and all unnecessary services are blocked or disabled.

Mike Frantzen published a workaround that could be used by administrators and programmers to prevent any possible format string vulnerabilities from being exploitable. His solution involves attempting to count the number of arguments passed to a printf() function compared to % tokens in the format string. This workaround is implemented as FormatGuard in Immunix, a distribution of Linux designed to be secure at the application level.

Mike Frantzen’s Bugtraq post is archived at www.securityfocus.com/archive/1/72118. FormatGuard can be found at www.immunix.org/formatguard.html.

How Format String Vulnerabilities Are Exploited

There are three basic goals an attacker can accomplish by exploiting format string vulnerabilities. First, the attacker can cause a process to fail due to an invalid memory access. This can result in a denial of service. Second, attackers can read process memory if the formatted string is output. Finally, memory can be overwritten by attackers—possibly leading to execution of instructions.

Damage & Defense …

Using Format Strings to Exploit Buffer Overflows

User-supplied format specifiers can also be used to aid in exploiting buffer overflow conditions. In some situations, an sprintf() condition exists that would be exploitable if it were not for length limitations placed on the source strings prior to them being passed to the insecure function. Due to these restrictions, it may not be possible for an attacker to supply an oversized string as the format string or the value for a %s in an sprintf call.

If user-supplied data can be embedded in the format string argument of sprintf(), the size of the string being created can be inflated by using padded format specifiers. For example, if the attacker can have %100i included in the format string argument for sprintf, the output string may end up more than 100 bytes larger than it should be. The padded format specifier may create a large enough string to overflow the destination buffer. This may render the limits placed on the data by the programmer useless in protecting against overflows and allow for the exploitation of this condition by an attacker to execute arbitrary code.

We will not discuss this method of exploitation further. Although it involves using format specifiers to overwrite memory, the format specifier simply is being used to enlarge the string so that a typical stack overflow condition can occur. This chapter is for exploitation using only format specifiers, without relying on another vulnerability due to a separate programmatic flaw such as buffer overflows. Additionally, the described situation could also be exploited as a regular format string vulnerability using only format specifiers to write to memory.

Denial of Service

The simplest way that a format string vulnerability can be exploited is to cause a denial of service through forcing the process to crash. It is relatively easy to cause a program to crash with malicious format specifiers.

Certain format specifiers require valid memory addresses as corresponding variables. One of them is %n, which we just discussed and which we will explain in further detail soon. Another is %s, which requires a pointer to a NULL terminated string. If an attacker supplies a malicious format string containing either of these format specifiers, and no valid memory address exists where the corresponding variable should be, the process will fail attempting to dereference whatever is in the stack. This may cause a denial of service and does not require any complicated exploit method.

In fact, there were a handful of known problems caused by format strings that existed before anyone understood that format strings were exploitable. For example, it was known that it was possible to crash the BitchX IRC client by passing %s%s%s%s as one of the arguments for certain IRC commands. However, as far as we know, no one realized this was further exploitable until the WU-FTPD exploit came to light.

There is not much more to crashing processes using format strings. There are much more interesting and useful things an attacker can do with format string vulnerabilities.

Reading Memory

If the output of the format string function is available, attackers can also exploit these vulnerabilities to read process memory. This is a serious problem and can lead to disclosure of sensitive information. For example, if a program accepts authentication information from clients and does not clear it immediately after use, format string vulnerabilities can be used to read it. The easiest way for an attacker to read memory through a format string vulnerability is to have the function output memory as variables corresponding to format specifiers. These variables are read from the stack based on the format specifiers included in the format string. For example, a 4-byte value can be retrieved for each instance of %x. The limitation of this approach is that only data on the stack can be read.

It is also possible for attackers to read from arbitrary locations in memory by using the %s format specifier. As described earlier, the %s format specifier corresponds to a NULL terminated string of characters. This string is passed by reference. An attacker can read memory in any location by supplying a %s format specifier and a corresponding address variable to the vulnerable program. The address where the attacker would like reading to begin must also be placed in the stack in the same manner that the address corresponding to any %n variables would be embedded. The presence of a %s format specifier would cause the format string function to read in bytes starting at the address supplied by the attacker until a NULL byte is encountered.

The ability to read memory is very useful to attackers and can be used in conjunction with other methods of exploitation. How to do this will be described in detail and will be used in the exploit we are developing toward the end of this chapter.

Writing to Memory

Previously, we touched on the %n format specifier. This formerly obscure token exists for the purpose of indicating how large a formatted string is at runtime. The variable corresponding to %n is an address. When the %n token is encountered during printf processing, the number (as an integer data type) of characters that make up the formatted output string is written to the address argument corresponding to the format specifier.

The existence of such a format specifier has serious security implications: it can allow for writes to memory. This is the key to exploiting format string vulnerabilities to accomplish goals such as executing shellcode.

Single Write Method

The first method that we will talk about involves using only the value of a single %n write to elevate privileges.

In some programs, critical values such as a user’s userid or groupid are stored in process memory for purposes of lowering privileges. Format string vulnerabilities can be exploited by attackers to corrupt these variables.

An example of a program with such a vulnerability is the Screen utility. Screen is a popular UNIX utility that allows multiple processes to use a single pseudoterminal. When installed setuid root, Screen stores the privileges of the invoking user in a variable. When a window is created, the Screen parent process lowers privileges to the value stored in that variable for the child processes (the user shell, etc.).

Versions of Screen prior to and including 3.9.5 contained a format string vulnerability when outputting the user-definable visual bell string. This string, defined in the user’s .screenrc configuration file, is output to the user’s terminal as the interpretation of the ASCII beep character. When output, user-supplied data from the configuration file is passed to a printf function as part of the format string argument.

Due to the design of Screen, this particular format string vulnerability could be exploited with a single %n write. No shellcode or construction of addresses was required. The idea behind exploiting Screen is to overwrite the saved userid with one of the attacker’s choice, such as 0 (root’s userid).

To exploit this vulnerability, an attacker had to place the address of the saved userid in memory reachable as an argument by the affected printf function. The attacker must then create a string that places a %n at the location where a corresponding address has been placed in the stack. The attacker can offset the target address by 2 bytes and use the most significant bits of the %n value to zero-out the userid. The next time a new window is created by the attacker, the Screen parent process would set the privileges of the child to the value that has replaced the saved userid.

By exploiting the format string vulnerability in Screen, it was possible for local attackers to elevate to root privileges. The vulnerability in Screen is a good example of how some programs can be trivially exploited through format string vulnerabilities. The method described is largely platform independent as well.

Multiple Writes Method

Now we move on to using multiple writes to locations in memory. This is slightly more complicated but has more interesting results. Through format string vulnerabilities it is often possible to replace almost any value in memory with whatever the attacker likes. To explain this method, it is important to understand the %n parameter and what gets written to memory when it is encountered in a format string.

To recap, the purpose of the %n format specifier is to store the number of characters output so far in the formatted string. An attacker can force this value to be large, but often not large enough to be a valid memory address (for example, a pointer to shellcode). For this reason, it is not possible to replace such a value with a single %n write. To get around this, attackers can use successive writes to construct the desired word byte by byte. By using this technique, a hacker can overwrite almost any value with arbitrary bytes. This is how arbitrary code is executed.

How Format String Exploits Work

Let’s now investigate how format string vulnerabilities can be exploited to overwrite values such as memory addresses with whatever the attacker likes. It is through this method that hackers can force vulnerable programs to execute shellcode.

Recall that when the %n parameter is processed, an integer is written to a location in memory. The address of the value to be overwritten must be in the stack where the printf function expects a variable corresponding to a %n format specifier to be. An attacker must somehow get an address into the stack and then write to it by placing %n at the right location in their malicious format string. Sometimes this is possible through various local variables or other program-specific conditions where user-controllable data ends up in the stack.

There is usually an easier and more consistently available way for an attacker to specify their target address. In most vulnerable programs, the user-supplied format string passed to a printf function exists in a local variable on the stack itself. Provided that there is not too much data stored as local variables, the format string is usually not too far away from the stack frame belonging to the affected printf function call. Attackers can force the function to use an address of their choosing if they include it in their format string and place a %n token at the right location.

Attackers have the ability to control where the printf function reads the address variable corresponding to %n. By using other format specifiers, such as %x or %p, the stack can be traversed or “eaten” by the printf function until it reaches the address embedded in the stack by the attacker. Provided that user data making up the format string variable isn’t truncated, attackers can cause printf to read in as much of the stack as is required, until printf() reads as variables addresses they have placed in the stack. At those points they can place %n specifiers that will cause data to be written to the supplied addresses.

NOTE

There cannot be any NULL bytes in the address if it is in the format string (except as the terminating byte), as the string is a NULL terminated array just like any other in C. This does not mean that addresses containing NULL bytes can never be used—addresses can often be placed in the stack in places other than the format string itself. In these cases it may be possible for attackers to write to addresses containing NULL bytes.

For example, an attacker who wishes to use an address stored 32 bytes away from where a printf() function reads its first variable can use 8 %x format specifiers. The %x token outputs the value, in Base16 character representation, of a 4-byte word on 32-bit Intel systems. For each instance of %x in the format string, the printf function reads 4 bytes deeper into the stack for the corresponding variable. Attackers can use other format specifiers to push printf() into reading their data as variables corresponding to the %n specifier.

Once an address is read by printf() as the variable corresponding to a %n token, the number of characters output in the formatted string at that point will be stored there as an integer. This value will overwrite whatever exists at the address (assuming it is a valid address and writeable memory).

Constructing Values

An attacker can manipulate the value of the integer that is written to the target address. Hackers can use the padding functionality of printf to expand the number of characters to be output in the formatted string.

[Code listing: a call to printf() that formats the integer 10 with the padded format specifier %10i]

In the preceding example, the %10i token in the format string is an integer format specifier containing a padding value. The padding value tells the printf() function to use 10 characters when representing the integer in the formatted string.

[dma@victim server]$ ./test

start:         10 end

The decimal representation of the number 10 does not require 10 characters, so by default the extra ones are spaces. This feature of printf() can be used by attackers to inflate the value written as %n without having to create an excessively long format string. Although it is possible to write larger numbers, the values attackers wish to write are often much larger than can be created using padded format specifiers.

By using multiple writes through multiple %n tokens, attackers can use the least significant bytes of the integer values being written to write each byte comprising the target value separately. This will allow for the construction of a word such as an address using the relatively low numerical values of %n. To accomplish this, attackers must supply a separate address for each write: the first is the target address itself, and each successive address is one byte higher than the one before it.

By using four %n writes and supplying four addresses, the low-order bits of the integers being written are used to write each byte value in the target word (see Figure 9.1).

image

Figure 9.1 Address Being Constructed Using Four Writes

On some platforms (such as RISC systems), writes to memory addresses not aligned on a 2-byte boundary are not permitted. This problem can be solved in many cases by using short integer writes using the %hn format specifier.

Constructing custom values using successive writes is the most serious method of exploitation, as it allows for attackers to gain complete control over the process. This can be accomplished by overwriting pointers to instructions with pointers to attacker-supplied shellcode. If an attacker exploits a vulnerability this way, the flow of program execution can be modified such that the shellcode is executed by the process.

What to Overwrite

With the ability to construct any value at almost any location in memory, the question is now “what should be overwritten?” Given that nearly any address can be used, the hacker has many options. The attacker can overwrite function return addresses, which is the same thing done when stack-based buffer overflows are exploited. By overwriting the current function return address, shellcode can be executed when the function returns. Unlike overflows, attackers are not limited to return addresses, though.

Overwriting Return Addresses

Most stack-based buffer overflow vulnerabilities involve the attacker replacing the function return address with a pointer to other instructions. When the function that has been corrupted finishes and attempts to return to the calling block of code, it instead jumps to wherever the replacement return address points. The reason that attackers exploiting stack overflows overwrite return addresses is because that is usually all that can be overwritten. The attacker does not get a choice of where their data ends up, as it is usually copied over data neighboring the affected buffer. Format string vulnerabilities differ in that the write occurs at the location specified by the address corresponding to the %n specifier. An attacker exploiting a format string vulnerability can overwrite a function return address by explicitly addressing one of the target addresses. When the function returns, it will return to the address constructed by the attacker’s %n writes.

There are two possible problems that attackers face when overwriting function return addresses. The first is situations where a function simply does not return. This is common in format string vulnerabilities because many of them involve printing error output. The program may simply output an error message (with the externally supplied data passed as the format string argument) and call exit() to terminate the program. In these conditions, overwriting a return address for anything other than the printf function itself will not work. The second problem is that overwriting return addresses can be caught by anti-buffer-overflow mechanisms such as StackGuard.

Overwriting Global Offset Table Entries and Other Function Pointers

The global offset table (GOT) is the section of an ELF program that contains pointers to library functions used by the program. Attackers can overwrite GOT entries with pointers to shellcode that will execute when the library functions are called.

Not all binaries being exploited are of the ELF format. This leaves general function pointers, which are easy targets for programs that use them. Function pointers are variables that the programmer creates and must be present in the program for an attacker to exploit them. In addition to this, the function must be called by reference using the function pointer for the attacker’s shellcode to execute.

Examining a Vulnerable Program

We’ll now decide on a program to use to demonstrate the exploitation of a format string vulnerability. The vulnerability should be remotely exploitable. Penetration of computer systems by attackers from across the Internet without any sort of credentials beforehand best demonstrates the seriousness of format string vulnerabilities. The vulnerability should be real in a program with a well-known or respected author, to demonstrate that vulnerabilities can and do exist in software we may trust to be well written. Our example should also have several properties that allow us to explore the different aspects of exploiting format string vulnerabilities, such as outputting the formatted string.

The program we will use as our example is called Rwhoisd. Rwhoisd, or the RWHOIS daemon, is an implementation of the RWHOIS service. The research and development branch of Network Solutions, Inc. currently maintains the rwhoisd RWHOIS server, and it is published under the GNU General Public License.

A classic remotely exploitable format string vulnerability exists in versions 1.5.7.1 of rwhoisd and earlier. The format string vulnerability allows for unauthenticated clients who can connect to the service to execute arbitrary code. The vulnerability was first made public through a post to the Bugtraq mailing list (the message is archived at www.securityfocus.com/archive/1/222756).

To understand the format string vulnerability that was present in rwhoisd, we must look at its source code. The version we are examining is version 1.5.7.1. At the time of writing, it is available for download at the Web site www.rwhois.net/ftp.

Notes from the Underground …

Some High Profile Format String Vulnerabilities

Besides the WU-FTPD SITE EXEC format string vulnerability, there have been several others worth mentioning. Some of these have been used in worms and mass-hacking utilities and have directly resulted in thousands of hosts being compromised.

IRIX Telnetd Client-supplied data included in the format string argument for syslog() allowed for remote attackers to execute arbitrary code without authenticating. This vulnerability was discovered by the Last Stage of Delirium. (See www.securityfocus.com/bid/1572.)

Linux rpc.statd This format string vulnerability was due to the misuse of syslog() as well and could also be exploited to gain root privileges remotely. It was discovered by Daniel Jacobowitz and published on July 16, 2000 in a post to Bugtraq. (See www.securityfocus.com/bid/1480.)

Cfingerd Another format string vulnerability due to syslog() discovered by Megyer Laszlo. Successful exploitation can result in remote attackers gaining control of the underlying host. (See www.securityfocus.com/bid/2576.)

Multiple Vendor LibC Locale Implementation Jouko Pynnönen and Core SDI independently discovered a format string vulnerability in the C library implementations shipped with several UNIX systems. The vulnerability allowed for attackers to gain elevated privileges locally by exploiting setuid programs. (See www.securityfocus.com/bid/1634.)

Multiple CDE Vendor rpc.ttdbserverd ISS X-Force discovered a vulnerability related to the misuse of syslog() in versions of the ToolTalk database server daemon shipped with several operating systems that include CDE. This vulnerability allows for remote, unauthenticated attackers to execute arbitrary code on the victim host. (See www.securityfocus.com/bid/3382.)

The vulnerability is triggered when the server outputs an error message in response to an invalid argument to the -soa command.

Error messages are created and output using a standard function called print_error(). This function is called throughout the server source code to handle reporting of error conditions to the client or user. It accepts an integer argument to specify the error type as well as a format string and a variable number of arguments.

The source code to this function is in the common/client_msgs.c source file (path is relative to the directory created when the 1.5.7.1 source tarball is unarchived).

image

image

The bolded line is where the arguments passed to this function are passed to vprintf(). The format string vulnerability is not in this particular function, but in the use of it. The print_error() function relies on the calling function to pass it a valid format string and any associated variables.

This function is listed here because it is a good example of the kind of situation that leads to exploitable format string vulnerabilities. Many programs have functions very similar to print_error(). It is a wrapper for printing error messages in the style of syslog(), with an error code and printf()-style variable arguments. The problem, though, as discussed in the beginning of the chapter, is that programmers may forget that a format string argument must be passed.

We will now look at what happens when a client connects to the service and attempts to pass format string data to the vprintf() function through the print_error() wrapper.

To those of you who have downloaded the source code, the offending section of code is in the server/soa.c source file. The function in which the offending code exists is called soa_parse_args(). The surrounding code has been stripped for brevity. The vulnerable call exists on line 53 (it is in bold in this listing):

image

In this instance of print_error(), the variable argv[i] is passed as the format string argument to print_error(). The string will eventually be passed to the vprintf() function (as previously pointed out). To a source code auditor, this looks suspiciously exploitable. The proper way to call this function would be:

print_error(INVALID_AUTH_AREA, "%s", argv[i]);

In this example, argv[i] is passed to the print_error() function as a variable corresponding to the %s (string) token in the format string. The way that this function is called eliminates the possibility of any maliciously placed format specifiers in argv[i] from being interpreted/acted upon by the vprintf() called by print_error(). The string argv[i] is the argument to the -soa directive passed to the server by the client.

To summarize, when a client connects to the rwhoisd server and issues a -soa command, an error message is output via print_error() if the arguments are invalid. The path of execution leading up to this looks like this:

1. Server receives -soa argument, and calls soa_directive() to handle the command.

2. soa_directive() passes the client command to soa_parse_args(), which interprets the arguments to the directive.

3. soa_parse_args() detects an error and passes an error code and the command string to the print_error() function as the format string argument.

4. print_error() passes the format string containing data from the client to the vprintf() function (highlighted in the previous section).

It is clear now that remote clients can have data passed to vprintf() as the format string variable. This data is the argument to the -soa directive. By connecting to the service and supplying a malicious format string, attackers can write to memory belonging to the server process.

Testing with a Random Format String

Having located a possible format string vulnerability in the source code, we can now attempt to demonstrate that it is exploitable through supplying malicious input and observing the server reaction.

Programs with suspected format string vulnerabilities can be forced to exhibit some form of behavior that indicates their presence. If the vulnerable program outputs the formatted string, their existence is obvious. If the vulnerable program does not output the formatted string, the behavior of the program in response to certain format specifiers can suggest the presence of a format string vulnerability.

If the process crashes when %n%n is input, it’s likely that a memory access violation occurred when attempting to write to invalid addresses read from the stack. It is possible to identify vulnerable programs by supplying these format specifiers to a program that does not output the formatted string. If the process crashes, or if the program does not return any output at all and appears to terminate, it is likely that there is a format string vulnerability.

Back to our example, the formatted string is returned to the client as part of the server error response. This makes the job of an attacker looking for a way into the host simple. The following example demonstrates the output of rwhoisd that is indicative of a format string bug:

image

In this example, connecting to the service and transmitting a format specifier in the data suspected to be included as a format string variable caused -1073743563 to be included in the server output where the literal %i should be. The negative number is the signed-integer interpretation of the 4 bytes on the stack where the printf function expected a variable. This is confirmation that there is a format string vulnerability in rwhoisd.

Having identified a format string vulnerability both in the program source code and through program behavior, we should set about exploiting it. This particular vulnerability is exploitable by a remote client from across a network. It does not require any authentication and it is likely that it can be exploited by attackers to gain access to the underlying host.

In cases such as this, where a program outputs a formatted string, it is possible to read the contents of the stack to aid in successful exploitation. Complete words of memory can be retrieved in the following manner:

image

image

In this example, the client retrieved one, two, three, and four words from the stack. They have been formatted in a way that can be parsed automatically by an exploit. A well-written exploit can use this output to reconstruct the stack layout in the server process. The exploit can read memory from the stack until the format string itself is located, and then calculate automatically the location where the %n writes should begin in the format string.

image

In this example, the client has caused the printf function to search the stack for variables where the format string is stored. The 010%p characters (in bold) are the beginning of the client-supplied string, containing the very format specifiers being processed. If the attacker were to embed an address in their format string at the beginning of their string, and use a %n token where the %c specifiers are, the address in the format string would be the one written to.

Tools & Traps …

More Stack with Less Format String

It may be the case that the format string in the stack cannot be reached by the printf function when it is reading in variables. This may occur for several reasons, one of which is truncation of the format string. If the format string is truncated to a maximum length at some point in the program’s execution before it is sent to the printf function, the number of format specifiers that can be used is limited. There are a few ways to get past this obstacle when writing an exploit.

The idea behind getting past this hurdle and reaching the embedded address is to have the printf function read more memory with less format string. There are a number of ways to accomplish this:

Using Larger Data Types The first and most obvious method is to use format specifiers associated with larger data types, one of which is %lli, corresponding to the long long integer type. On 32-bit Intel architecture, a printf function will read 8 bytes from the stack for every instance of this format specifier embedded in a format string. It is also possible to use floating-point format specifiers for the double and long double types, though the stack data may cause floating point operations to fail, resulting in the process crashing.

Using Output Length Arguments Some versions of libc support the * token in format specifiers. This token tells the printf function to obtain the number of characters that will be output for this specifier from the stack as a function argument. For each *, the function will eat another 4 bytes. The output value read from the stack can be overridden by including a number next to the actual format specifier. For example:

    The format specifier %*******10i will result in an integer represented using 10 characters. Despite this, the printf function will eat 32 bytes when it encounters this format specifier.

    The first use of this method is credited to an individual known as lorian.

Accessing Arguments Directly It is also possible to have the printf function reference specific parameters directly. This can be accomplished by using format specifiers of the form %x$n, where x is the number of the argument (in order). This technique is possible only on platforms with C libraries that support direct access to arguments.

Having exhausted these tricks and still being unable to reach an address in the format string, the attacker should examine the process to determine if there is anywhere else in a reachable region of the stack where addresses can be placed. Remember that it is not required that the address be embedded in the format string, just that it is convenient since it is often nearby in the stack. Data supplied by the attacker as input other than the format string may be reachable. In the Screen vulnerability, it was possible to access a variable that was constructed using the HOME environment variable. This string was closer in the stack than anything else externally supplied and could barely be reached.

Writing a Format String Exploit

Now we move on to actually exploiting a format string vulnerability. The goal of the attacker, in the case of a program such as rwhoisd, is to force it to execute instructions that are attacker-supplied. These instructions should grant access to the attacker on the underlying host.

The exploit will be written for rwhoisd version 1.5.7.1, compiled on an i386 Linux system. This is the program we looked at earlier. As previously mentioned, to execute shellcode, the exploit must overwrite a value that is referenced by the process at some point as the address of instructions to be executed. In the exploit we are developing, we will be overwriting a function return address with a pointer to shellcode. The shellcode will exec() /bin/sh and provide shell access to the client.

The first thing that the exploit code must do is connect to the service and attempt to locate the format string in the stack. The exploit code does this by connecting to the service and supplying format strings that incrementally return words from the stack to the exploit. The function in the exploit that does this is called brute_force(). This function sends format string specifiers that cause increasing amounts of stack memory to be output by the server. The exploit then compares each word in the stack output to 0x62626262 (the ASCII characters bbbb), which was placed at the beginning of the format string. There is a chance that the alignment may be off; this exploit does not take that possibility into account.

image

image

The stack output is parsed easily by the exploit due to the use of the %010p format specifier by the exploit. The %010p formats each word as an 8-character hex representation preceded by 0x. Each of these string representations of words can be passed to a C library function such as strtoul and returned as a binary (unsigned with strtoul()) integer data type.

The goal of this exploit is to execute arbitrary code. To do this, we must overwrite some value that will be used to reference instructions to be executed. One such value that can be overwritten is a function return address. As discussed earlier, stack based buffer overflows usually overwrite these values because the return address happens to exist on the stack and gets overwritten in an overflow condition. We will replace a function return address simply because it’s convenient.

Our goal is to overwrite the return address stored when print_error() is called. In the binary version used to write this proof of concept, the address of this return address on the stack when we can overwrite it is 0xbffff8c8. This address will serve as our target.

Once the exploit has located the format string in the stack, it must construct a new format string with the %n specifiers at the right position for the supplied addresses to be used when writing. This can be accomplished by using format specifiers such as %x to eat as many words of the stack as are required. This exploit does this automatically based on the results of the brute_force() function.

image

The num variable in the code listed originates from the brute force location of the format string. Now that the exploit has an address to write to, we must construct an address at the target location.

The return address must be overwritten using the successive writes we discussed earlier. In order to construct a 4-byte address, the four writes must occur at different offsets from the start of the word. The addresses must also be placed in the format string:

image

The next step is to write the correct value at each of the offsets. The value we are writing is the location of shellcode that we have placed in the stack. The address for this example proof of concept is 0xbffff99d.

To construct this value, we must write the following low-order bytes to each address in our format string:

image

This can be accomplished by using the padded format specifiers we discussed earlier to write the desired low-order bytes.

For example, writing %125x might cause the value 0x0000019d to be written to TARGET. That’s perfect for our situation because 9d will be the value of the byte we want to write. By using padded format specifiers and successive writes, we can construct the address we want at the target location:

image

It should be noted that the padding value used is highly dependent on the total number of characters being output in the formatted string. It is possible to determine how many characters to pad automatically if the formatted string is output.

Once the function return address is overwritten, vfprintf() will return normally and the shellcode will be executed once print_error() returns. Figure 9.2 demonstrates successful exploitation of this vulnerability.

image

Figure 9.2 Exploitation of the rwhoisd Format String Vulnerability to Penetrate a Host

The exploit code follows:

[Exploit source listing: images not reproduced]

Summary

Format string vulnerabilities are one of the newest additions to the typical hacker’s bag of tricks.

Techniques hackers are using to exploit bugs in software have become significantly more sophisticated in the past couple of years. One of the reasons for this is that there are simply more hackers, more eyes poring over and scrutinizing source code. It’s much easier to obtain information about how vulnerabilities and weaknesses can be exploited and how systems function.

In general, hackers have woken up to the different consequences that programmatic flaws can have. Printf functions, and bugs due to misuse of them, have been around for years, but until recently no one had conceived that they could be exploited to force execution of shellcode. In addition to format string bugs, new techniques have emerged, such as overwriting malloc structures, relying on free() to overwrite pointers, and exploiting signed integer index errors.

Hackers are more aware of what to look for, and how subtle bugs in software can be exploited. Hackers are now peering into every program, observing behavior in response to every possible kind of input. It is now more important than ever for programmers to be conscious that many kinds of bugs thought to be harmless can have disastrous consequences if left unfixed. System administrators and users should be aware that exploitable bugs never considered critical may lie latent in software they use.

Solutions Fast Track

Understanding Format String Vulnerabilities

Format string vulnerabilities are due to programmers allowing externally supplied data to be used as, or as part of, the format string argument of a printf() function.

Format string vulnerabilities can allow for an attacker to read and write to memory.

Format string vulnerabilities can lead to the execution of arbitrary code through overwriting of return addresses, GOT entries, function pointers, and so on.

Examining a Vulnerable Program

Vulnerable programs typically have printf() calls with variables passed as the format string argument.

Wrappers for printf() functions often lead to programmers forgetting that a function accepts format strings and variable arguments.

Misuse of the syslog() function is responsible for a large number of format string vulnerabilities, many of them high-profile.

Testing with a Random Format String

Programs can be tested for format string vulnerabilities by observing behavior when format specifiers are supplied in various input.

Supplying %s, %x, %p, and other format specifiers can be used to determine a format string vulnerability if data from memory is output in place of them. You can’t always tell immediately that there is a format string vulnerability if the results are not being output.

Observing a process crash due to %n or %s format specifiers supplied as input indicates that there is a format string vulnerability.

Writing a Format String Exploit

Format string exploits can be written that read memory or write specific values to memory. Format string vulnerabilities are not necessarily platform dependent. It is possible to exploit programs such as Screen without relying on architecture and OS-dependent shellcode.

In format string vulnerabilities where the formatted string is output to the attacker, memory can be read to aid in exploitation. Exploits can reconstruct the process stack and automatically determine where to place %n specifiers.

Format string vulnerabilities can use successive writes to overwrite targets in memory with arbitrary values. This technique can be used to write a custom value to almost any location in memory.

On platforms where unaligned writes are not permitted (such as RISC), the %hn format specifier can be used to write short values on 2-byte boundaries.

Frequently Asked Questions

The following Frequently Asked Questions, answered by the authors of this book, are designed to both measure your understanding of the concepts presented in this chapter and to assist you with real-life implementation of these concepts. To have your questions about this chapter answered by the author, browse to www.syngress.com/solutions and click on the “Ask the Author” form.

Q: Can nonexecutable stack configurations or stack protection schemes such as StackGuard protect against format string exploits?

A: Unfortunately, no. Format string vulnerabilities allow for an attacker to write to almost any location in memory. StackGuard protects the integrity of stack frames, while nonexecutable stack configurations do not allow instructions in the stack to be executed. Format string vulnerabilities allow for both of these protections to be evaded. Hackers can replace values used to reference instructions other than function return addresses to avoid StackGuard, and can place shellcode in areas such as the heap. Although protections such as nonexecutable stack configurations and StackGuard may stop some publicly available exploits, determined and skilled hackers can usually get around them.

Q: Are format string vulnerabilities UNIX specific?

A: No. Format string vulnerabilities are common in UNIX systems because of the more frequent use of the printf functions. Misuse of the syslog interface also contributes to many of the UNIX specific format string vulnerabilities. The exploitability of these bugs (involving writing to memory) depends on whether the C library implementation of printf supports %n. If it does, any program linked to it with a format string bug can theoretically be exploited to execute arbitrary code.

Q: How can I find format string vulnerabilities?

A: Many format string vulnerabilities can easily be picked out in source code. In addition, they can often be detected automatically by examining the arguments passed to printf() functions. Any printf() family call that has only a single argument is an obvious candidate, if the data being passed is externally supplied.

Q: How can I eliminate or minimize the risk of unknown format string vulnerabilities in programs on my system?

A: A good start is having a sane security policy. Rely on the least-privilege model: ensure that only the most necessary utilities are installed setuid and that they can be run only by members of a trusted group. Disable or block access to all services that are not completely necessary.

Q: What are some signs that someone may be trying to exploit a format string vulnerability?

A: This question is relevant because many format string vulnerabilities are due to bad use of the syslog() function. When a format string vulnerability due to syslog() is exploited, the formatted string is output to the log stream. An administrator monitoring the syslog logs can identify format string exploitation attempts by the presence of strange looking syslog messages. Some other more general signs are if daemons disappear or crash regularly due to access violations.

Q: Where can I learn more about finding and exploiting format string vulnerabilities?

A: There are a number of excellent papers on the subject. Tim Newsham authored a whitepaper published by Guardent, which can be found at www.securityfocus.com/archive/1/81565. Papers written by TESO (www.team-teso.net/articles/formatstring) and HERT (www.hert.org/papers/format.html) are also recommended.
