In this chapter, I focus on application-level vulnerabilities and mitigation strategies. The effectiveness of firewalls and network segmentation mechanisms is severely impacted if vulnerabilities exist within accessible network services. In recent years, major security flaws in Unix and Windows systems have been exposed, resulting in large numbers of Internet-based hosts being compromised by hackers and worms alike.
Hacking is the art of manipulating a process in such a way that it performs an action that is useful to you.
A simple example can be found in a search engine; the program takes a query, cross-references it with a database, and provides a list of results. Processing occurs on the web server itself, and by understanding the way search engines are developed and their pitfalls (such as accepting both the query string and database filename values), a hacker can attempt to manipulate the search engine to process and return sensitive files.
Many years ago, the main U.S. Pentagon, Air Force, and Navy web servers (http://www.defenselink.mil, http://www.af.mil, and http://www.navy.mil) were vulnerable to this very type of search engine attack. They used a common search engine called multigate, which accepted two abusable arguments: SurfQueryString and f. The Unix password file could be accessed by issuing a crafted URL, as shown in Figure 14-1.
High-profile military web sites are properly protected at the network level by firewalls and other security appliances. However, by the very nature of the massive amount of information stored, a search engine was implemented, which in turn introduced vulnerabilities at the application level.
Nowadays, most vulnerabilities are more complex than simple logic flaws. Stack, heap, and static overflows, along with format string bugs, allow remote attackers to manipulate nested functions and often execute arbitrary code on accessible hosts.
In a nutshell, software is vulnerable due to complexity and inevitable human error. Many vendors (e.g., Microsoft, Sun, Oracle, and others) who developed and built their software in the 1990s didn’t write code that was secure from heap overflows or format string bugs because these issues were not widely known at the time.
Software vendors are now in a situation where, even though it would be the just thing to do, it is simply too expensive to secure their operating systems and server software packages from memory manipulation attacks. Code review and full black-box testing of complex operating system and server software would take years to undertake and would severely impact future development and marketing plans, along with revenue.
To develop adequately secure programs, the interaction of a program with the environment in which it runs should be controlled at all levels: no data passed to the program should be trusted or assumed to be correct. Input validation is the practice, within application development, of ensuring that data passed to a function is properly sanitized before it is stored in memory. Proper validation of all external data passed to key network services would go a long way toward improving the security and resilience of IP networks and computer systems.
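As a minimal sketch of this principle, the following C function checks a user-supplied filename before it would be handed to a file-opening routine, which is the kind of check the search engine example lacked. The function name is_safe_filename, the MAX_NAME_LEN limit, and the policy itself (a small allowlist of characters, no leading dot, no path separators) are illustrative assumptions, not the validation logic of any particular product.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical limit chosen for illustration only. */
#define MAX_NAME_LEN 64

/*
 * Returns 1 if name looks like a plain filename, 0 otherwise.
 * Allowlist approach: accept only letters, digits, '.', '-', and '_',
 * and reject anything that could walk out of the intended directory.
 */
int is_safe_filename(const char *name)
{
    size_t len, i;

    if (name == NULL)
        return 0;

    len = strlen(name);
    if (len == 0 || len > MAX_NAME_LEN)
        return 0;

    /* A leading dot covers ".", "..", and hidden files. */
    if (name[0] == '.')
        return 0;

    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char)name[i];

        /* Rejects '/', '\\', spaces, and control characters. */
        if (!isalnum(c) && c != '.' && c != '-' && c != '_')
            return 0;
    }

    /* Reject ".." anywhere in the name as an extra precaution. */
    if (strstr(name, "..") != NULL)
        return 0;

    return 1;
}
```

Under these assumptions, a name such as results.html is accepted, while /etc/passwd and ../../etc/passwd are rejected before any file is opened.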
In this section, I concentrate on Internet-based network service vulnerabilities, particularly how software running at both the kernel and system daemon levels processes data. These vulnerabilities can be categorized into two high-level groups: memory manipulation weaknesses and simple logic flaws.
This section details memory manipulation attacks to help you understand the classification of bugs and the respective approaches you can take to mitigate risks. It also identifies simple logic flaws (also discussed in Chapter 7), which are a much simpler threat to deal with.
Memory manipulation attacks involve sending malformed data to the target network service in such a way that the logical program flow is affected (the idea is to execute arbitrary code on the host, although crashes sometimes occur, resulting in denial of service).
Here are the three high-level categories of remotely exploitable memory manipulation attacks:
Classic buffer overflows (stack, heap, and static overflows)
Integer overflows (technically an overflow delivery mechanism)
Format string bugs
I discuss these three attack groups and describe individual attacks within each group (such as stack saved instruction and frame pointer overwrites). There are a small number of exotic bug types (e.g., index array manipulation and static overflows) that unfortunately lie outside the scope of this book, but which are covered in niche application security publications and online presentations.
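To illustrate why an integer overflow is described above as an overflow delivery mechanism: if a size calculation wraps around, the program allocates a buffer that is smaller than intended, and the copy that follows becomes a classic buffer overflow. The sketch below checks the multiplication before allocating. The helper name alloc_array is hypothetical and is not taken from any library.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: allocate room for count elements of elem_size bytes.
 * Returns NULL if count * elem_size would wrap around SIZE_MAX, instead of
 * silently allocating a buffer that is too small.
 */
void *alloc_array(size_t count, size_t elem_size)
{
    if (elem_size != 0 && count > SIZE_MAX / elem_size)
        return NULL;    /* the product would overflow size_t */

    return malloc(count * elem_size);
}
```

The division-based test avoids performing the multiplication that might overflow; the caller still has to check for NULL and free the result.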
By understanding how exploits work, you can effectively implement changes to your critical systems to protect against future vulnerabilities. To appreciate these low-level issues, you must first have an understanding of runtime memory organization and logical program flow.
Memory manipulation attacks involve overwriting values within memory (such as instruction pointers) to change the logical program flow and execute arbitrary code. Figure 14-2 shows memory layout when a program is run, along with descriptions of the four key areas: text, data and BSS, the stack, and the heap.
This segment contains all the compiled executable code for the program. Write permission to this segment is disabled for two reasons:
Code doesn’t contain any sort of variables, so the code has no practical reason to write over itself.
Read-only code segments can be shared between different copies of the program executing simultaneously.
In the older days of computing, code would often modify itself to increase runtime speed. Today’s modern processors are optimized for read-only code, so any modification to code only slows the processor. You can safely assume that if a program attempts to modify its own code, the attempt was unintentional.
The data and Block Started by Symbol (BSS) segments contain all the global variables for the program. These memory segments have read and write access enabled, and, in Intel architectures, data in these segments can be executed.
The stack is a region of memory used to dynamically store and manipulate most program function variables. These local variables have known sizes (such as a password buffer with a size of 128 characters), so the space is assigned and the data is manipulated in a relatively simple way. By default in most environments, data and variables on the stack can be read from, written to, and executed.
When a program enters a function, space on the stack is provided for variables and data; i.e., a stack frame is created. Each function’s stack frame contains the following:
The function’s arguments
Stack variables (the saved instruction and frame pointers)
Space for manipulation of local variables
As the size of the stack is adjusted to create this space, the processor stack pointer is updated to point to the new end of the stack (on Intel IA32 the stack grows toward lower memory addresses, so the pointer value decreases as the stack grows). The frame pointer points at the start of the current function stack frame. Two saved pointers are placed in the current stack frame: the saved instruction pointer and the saved frame pointer.
The saved instruction pointer is read by the processor as part of the function epilogue (when the function has exited and the space on the stack is freed up), and points the processor to the next instruction to be executed in the calling function.
The saved frame pointer is also processed as part of the function epilogue; it defines the beginning of the parent function’s stack frame, so that logical program flow can continue cleanly.
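The two saved pointers can be observed from C with compiler extensions. This is only a sketch: __builtin_frame_address, __builtin_return_address, and the noinline attribute are GCC and Clang extensions rather than standard C, and the struct and function names (frame_info, inspect_frame) are hypothetical. The values returned differ between runs and platforms.

```c
#include <stdint.h>

/* Hypothetical container for the two values of interest. */
struct frame_info {
    uintptr_t frame_address;    /* frame address of the current function */
    uintptr_t return_address;   /* saved instruction pointer for this call */
};

/*
 * noinline keeps this function in its own stack frame, so the values
 * describe inspect_frame() itself rather than its caller.
 */
__attribute__((noinline))
struct frame_info inspect_frame(void)
{
    struct frame_info info;

    /* Level 0 means "the function currently executing". */
    info.frame_address  = (uintptr_t)__builtin_frame_address(0);
    info.return_address = (uintptr_t)__builtin_return_address(0);

    return info;
}
```

The return_address field holds the address the processor resumes at when inspect_frame( ) returns, which is the value a stack smash aims to overwrite.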
The heap is a very dynamic area of memory and is often the largest segment of memory assigned by a program. Programs use the heap to store data that must exist after a function returns (and its variables are wiped from the stack). The data and BSS segments could be used to store the information, but this isn’t efficient, nor is it the purpose of those segments.
The allocator and deallocator algorithms manage data on the heap. In C, these functions are called malloc( ) and free( ). When data is to be placed in the heap, malloc( ) is called to allocate a chunk of memory, and when the chunk is to be unlinked, free( ) releases the data.
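A short sketch of this pattern follows, assuming a hypothetical helper named copy_to_heap. It places a copy of a string on the heap so that the data still exists after the function returns, and leaves it to the caller to release the chunk with free( ).

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: returns a heap-allocated copy of text, or NULL on
 * failure. The copy outlives this function; the caller must free() it.
 */
char *copy_to_heap(const char *text)
{
    size_t len;
    char *copy;

    if (text == NULL)
        return NULL;

    len = strlen(text) + 1;     /* +1 for the terminating NUL */
    copy = malloc(len);         /* allocate a chunk on the heap */
    if (copy == NULL)
        return NULL;

    memcpy(copy, text, len);    /* exactly len bytes, never more */
    return copy;
}
```

A caller would use it as char *p = copy_to_heap("session data"); and later call free(p); once the data is no longer needed.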
Various operating systems manage heap memory in different ways, using different algorithms. Table 14-1 shows the heap implementations in use across a number of popular operating systems.
Algorithm | Operating system(s) |
GNU libc (Doug Lea) | Linux |
AT&T System V | Solaris, IRIX |
BSD (Poul-Henning Kamp) | BSDI, FreeBSD, OpenBSD |
BSD (Chris Kingsley) | 4.4BSD, Ultrix, some AIX |
Yorktown | AIX |
RtlHeap | Windows |
Most software uses standard operating system heap-management algorithms, although enterprise server packages, such as Oracle, use their own proprietary algorithms to provide better database performance.
Memory contains the following: compiled machine code for the executable program (in the text segment), global variables (in the data and BSS segments), local variables and pointers (in the stack segment), and other data (in the heap segment).
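The mapping from C objects to these segments can be sketched as follows. The names (segment_sample, sample_segments, and the two globals) are hypothetical, and the addresses recorded depend entirely on the platform, compiler, and any address space randomization in use, so no particular values should be expected.

```c
#include <stdint.h>
#include <stdlib.h>

int initialized_global = 42;    /* data segment: has an explicit initial value */
int uninitialized_global;       /* BSS segment: zero-filled at program start */

/* Hypothetical record of one address from each writable area. */
struct segment_sample {
    uintptr_t data_addr;
    uintptr_t bss_addr;
    uintptr_t stack_addr;
    uintptr_t heap_addr;
};

/* Returns 0 on success, -1 if out is NULL or the heap allocation fails. */
int sample_segments(struct segment_sample *out)
{
    int local = 0;                          /* lives in this stack frame */
    int *dynamic = malloc(sizeof *dynamic); /* lives on the heap */

    if (out == NULL || dynamic == NULL) {
        free(dynamic);                      /* free(NULL) is a no-op */
        return -1;
    }

    out->data_addr  = (uintptr_t)&initialized_global;
    out->bss_addr   = (uintptr_t)&uninitialized_global;
    out->stack_addr = (uintptr_t)&local;
    out->heap_addr  = (uintptr_t)dynamic;

    free(dynamic);      /* only the address is recorded, not the memory */
    return 0;
}
```

Printing the four fields on a given system typically shows the data and BSS addresses close together, with the stack and heap addresses in clearly separate ranges.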
The processor reads and interprets values in memory by using registers. A register is an internal processor value that increments and jumps to point to memory addresses used during program execution. Register names are different under various processor architectures. Throughout this chapter I use the Intel IA32 processor architecture and register names (eip, ebp, and esp in particular). Figure 14-3 shows a high-level representation of a program executing in memory, including these processor registers and the various memory segments.
The three important registers from a security perspective are eip (the instruction pointer), ebp (the stack frame pointer), and esp (the stack pointer). The stack pointer should always point to the last address on the stack as it grows and shrinks in size, and the stack frame pointer defines the start of the current function’s stack frame. The instruction pointer is an important register that points to compiled executable code (usually in the text segment) for execution by the processor.
In Figure 14-3, the executable program code is processed from the text segment, and local variables and temporary data stored by the function exist on the stack. The heap is used for more long-term storage of data because when a function has run, its local variables are no longer referenced. Next, I’ll discuss how you can influence logical program flow by corrupting memory in these segments.
By providing malformed user input that isn’t correctly checked, you can often overwrite data outside the assigned buffer in which the data is supposed to exist. You typically do this by providing too much data to a process, which overwrites important values in memory and causes a program crash.
Depending on exactly which area of memory (stack, heap, or static segments) your input ends up in and overflows out of, you can use numerous techniques to influence the logical program flow, and often run arbitrary code.
What follows are details of the three classic classes of buffer overflows, along with details of individual overflow types. Some classes of vulnerability are easier to exploit remotely than others, which limits the options an attacker has in some cases.
Since 1988, stack overflows have led to the most serious compromises of security. Nowadays, many operating systems (including Microsoft Windows 2003 Server, OpenBSD, and various Linux distributions) have implemented nonexecutable stack protection mechanisms, and so the effectiveness of traditional stack overflow techniques is lessened.
By overflowing data on the stack, you can perform two different attacks to influence the logical program flow and execute arbitrary code:
A stack smash, overwriting the saved instruction pointer
A stack off-by-one, overwriting the saved frame pointer
These two techniques can change logical program flow, depending on the program at hand. If the program doesn’t check the length of the data provided, and simply places it into a fixed-size buffer, you can perform a stack smash. A stack off-by-one bug occurs when a programmer makes a small calculation mistake relating to the lengths of strings within a program.
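The kind of small calculation mistake behind an off-by-one bug can be sketched in C. The function below is a hypothetical, correctly bounded copy routine (copy_name and NAME_SIZE are assumed names); the comment describes how a one-character change to the loop bound would write a single byte past the end of the buffer.

```c
#include <stddef.h>

#define NAME_SIZE 8     /* hypothetical buffer size for illustration */

/*
 * Copies src into dst (which holds dst_size bytes) and always terminates it.
 * Returns the number of characters copied, not counting the NUL.
 *
 * The off-by-one mistake would be to write the loop bound as
 * "i < dst_size": the loop could then fill all dst_size bytes, and the
 * terminating NUL would be written to dst[dst_size], one byte beyond the
 * end of the buffer.
 */
size_t copy_name(char *dst, size_t dst_size, const char *src)
{
    size_t i;

    if (dst == NULL || src == NULL || dst_size == 0)
        return 0;

    /* Stop at dst_size - 1 so one byte is always left for the terminator. */
    for (i = 0; i < dst_size - 1 && src[i] != '\0'; i++)
        dst[i] = src[i];

    dst[i] = '\0';
    return i;
}
```

With an 8-byte buffer, at most seven characters are copied and the eighth byte holds the terminator, so overlong input is truncated rather than written past the buffer.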
As stated earlier, the stack is a region of memory used for temporary storage. In C, function arguments and local variables are stored on the stack. Figure 14-4 shows the layout of the stack when a function within a program is entered.
The function allocates space at the bottom of the stack frame for local variables. Above this area in memory are the stack frame variables (the saved instruction and frame pointers), which are necessary to direct the processor to the address of the instructions to execute after this function returns.
Example 14-1 shows a simple C program that takes a user-supplied argument from the command line and prints it out.
#include <stdio.h>
#include <string.h>

int main(int argc, char *argv[])
{
    char smallbuf[32];

    /* Deliberately unchecked copy: the length of argv[1] is never tested */
    strcpy(smallbuf, argv[1]);
    printf("%s\n", smallbuf);

    return 0;
}

This main( ) function allocates a 32-byte buffer (smallbuf) to store user input from the command-line argument (argv[1]).
Here is a brief example of the program being compiled and run:
$ cc -o printme printme.c
$ ./printme test
test
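For contrast with Example 14-1, here is a sketch of how the copy step could be bounded. The helper name store_argument and the SMALLBUF_SIZE constant are hypothetical; the sketch relies on the standard snprintf( ) behavior of never writing more than the given size and returning the length the full string would have needed.

```c
#include <stddef.h>
#include <stdio.h>

#define SMALLBUF_SIZE 32    /* same size as smallbuf in Example 14-1 */

/*
 * Hypothetical helper: copies arg into buf without writing past buf_size
 * bytes. Returns 0 if arg fit completely, -1 if it was missing or too long
 * (in the too-long case buf holds a truncated, terminated copy).
 */
int store_argument(char *buf, size_t buf_size, const char *arg)
{
    int needed;

    if (buf == NULL || buf_size == 0 || arg == NULL)
        return -1;

    /* snprintf writes at most buf_size bytes, including the NUL. */
    needed = snprintf(buf, buf_size, "%s", arg);
    if (needed < 0 || (size_t)needed >= buf_size)
        return -1;      /* output error, or arg did not fit */

    return 0;
}
```

A main( ) function using this helper would also need to confirm that argc is at least 2 before reading argv[1], a check that Example 14-1 omits.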
Figure 14-5 shows what the main( ) function stack frame looks like when the strcpy( ) function has copied the user-supplied argument into the buffer smallbuf.
The test string is placed into smallbuf, along with a