Stack Frames

Because IDA Pro is such a low-level analysis tool, many of its features and displays expect the user to be somewhat familiar with the low-level details of compiled languages, many of which center on the specifics of generating machine language and managing the memory used by a high-level program. Therefore, from time to time this book covers some of the theory of compiled programs in order to make sense of the related IDA displays.

One such low-level concept is that of the stack frame. Stack frames are blocks of memory allocated within a program’s runtime stack and dedicated to a specific invocation of a function. Programmers typically group executable statements into units called functions (also called procedures, subroutines, or methods). In some cases this may be a requirement of the language being used. In most cases it is considered good programming practice to build programs from such functional units.

When a function is not executing, it typically requires little to no memory. When a function is called, however, it may require memory for several reasons. First, the caller of a function may wish to pass information into the function in the form of parameters (arguments), and these parameters need to be stored somewhere the function can find them. Second, the function may need temporary storage space while performing its task. This temporary space is often allocated by a programmer through the declaration of local variables, which can be used within the function but cannot be accessed once the function has completed.

Compilers utilize stack frames (also called activation records) to make the allocation and deallocation of function parameters and local variables transparent to the programmer. A compiler inserts code to place a function's parameters into the stack frame prior to transferring control to the function itself; once the function has been entered, compiler-inserted code allocates enough memory to hold the function's local variables. As a consequence of the way stack frames are constructed, the address to which the function should return is also stored within the new stack frame. A pleasant result of the use of stack frames is that recursion becomes possible, as each recursive call to a function is given its own stack frame, neatly segregating each call from its predecessor. The following steps detail the operations that take place when a function is called:

  1. The caller places any parameters required by the function being called into locations as dictated by the calling convention (see Calling Conventions below) employed by the called function. This operation may result in a change to the program stack pointer if parameters are placed on the runtime stack.

  2. The caller transfers control to the function being called. This is usually performed with an instruction such as the x86 CALL or the MIPS JAL. A return address is typically saved onto the program stack or in a CPU register.

  3. If necessary, the called function takes steps to configure a frame pointer[42] and saves any register values that the caller expects to remain unchanged.

  4. The called function allocates space for any local variables that it may require. This is often done by adjusting the program stack pointer to reserve space on the runtime stack.

  5. The called function performs its operations, potentially generating a result. In the course of performing its operations, the called function may access the parameters passed to it by the calling function. If the function returns a result, the result is often placed into a specific register or registers that the caller can examine once the function returns.

  6. Once the function has completed its operations, any stack space reserved for local variables is released. This is often done by reversing the actions performed in step 4.

  7. Any registers whose values were saved (in step 3) on behalf of the caller are restored to their original values. This includes the restoration of the caller’s frame pointer register.

  8. The called function returns control to the caller. Typical instructions for this include the x86 RET and the MIPS JR instructions. Depending on the calling convention in use, this operation may also serve to clear one or more parameters from the program stack.

  9. Once the caller regains control, it may need to remove parameters from the program stack. In such cases a stack adjustment may be required to restore the program stack pointer to the value that it held prior to step 1.

Steps 3 and 4 are so commonly performed upon entry to a function that together they are called the function’s prologue. Similarly, steps 6 through 8 are so frequently performed at the end of a function that together they make up the function’s epilogue. With the exception of step 5, which represents the body of the function, all of these operations constitute the overhead associated with calling a function.
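To make the nine steps concrete, here is a minimal C sketch that models a 32-bit stack as an array of words and walks through a cdecl call to a two-argument function that uses an EBP-based frame. The Machine type, the helper names, and the fake return address are all hypothetical; each array element stands for one 4-byte stack slot, so index ebp + 2 plays the role of [ebp+8].

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: each element is one 4-byte stack slot, and the
   stack grows toward lower indices, as the x86 stack grows toward
   lower addresses. */
#define STACK_WORDS 64

typedef struct {
    uint32_t mem[STACK_WORDS];
    int esp;   /* index of the current top-of-stack slot */
    int ebp;   /* index used as the frame pointer */
} Machine;

static void push(Machine *m, uint32_t v) {
    assert(m->esp > 0);            /* guard against stack overflow */
    m->mem[--m->esp] = v;
}

static uint32_t pop(Machine *m) {
    assert(m->esp < STACK_WORDS);  /* guard against stack underflow */
    return m->mem[m->esp++];
}

/* Models a cdecl call to add(a, b), which uses one local variable.
   Stores the result in *eax; returns 1 if ESP ends where it started. */
static int simulate_cdecl_call(Machine *m, uint32_t a, uint32_t b, uint32_t *eax) {
    const int esp_before = m->esp;

    push(m, b);                    /* 1: caller pushes args right to left */
    push(m, a);
    push(m, 0xDEADBEEFu);          /* 2: CALL pushes a (fake) return address */
    push(m, (uint32_t)m->ebp);     /* 3: prologue saves the caller's EBP... */
    m->ebp = m->esp;               /*    ...and sets up the new frame */
    m->esp -= 1;                   /* 4: reserve one local slot */

    /* 5: body; [ebp+8] is a, [ebp+12] is b, [ebp-4] is the local */
    m->mem[m->ebp - 1] = m->mem[m->ebp + 2] + m->mem[m->ebp + 3];
    *eax = m->mem[m->ebp - 1];     /*    result returned in "EAX" */

    m->esp = m->ebp;               /* 6: release the local */
    m->ebp = (int)pop(m);          /* 7: restore the caller's EBP */
    (void)pop(m);                  /* 8: RET pops the return address */
    m->esp += 2;                   /* 9: caller removes its two arguments */

    return m->esp == esp_before;
}
```

In this model, the lines marked 3 and 4 form the prologue and the lines marked 6 through 8 form the epilogue; everything except step 5 is call overhead.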

Calling Conventions

With a basic understanding of what stack frames are, we can take a closer look at exactly how they are structured. The examples that follow reference the x86 architecture and the behavior associated with common x86 compilers such as Microsoft Visual C/C++ or GNU’s gcc/g++. One of the most important steps in the creation of a stack frame involves the placement of function parameters onto the stack by the calling function. The calling function must store parameters exactly as the function being called expects to find them; otherwise, serious problems can arise. Functions advertise the manner in which they expect to receive their arguments by selecting and adhering to a specific calling convention.

A calling convention dictates exactly where a caller should place any parameters that a function requires. Calling conventions may require parameters to be placed in specific registers, on the program stack, or in both registers and on the stack. When parameters are passed on the program stack, it is equally important to determine who is responsible for removing them from the stack once the called function has completed. Some calling conventions dictate that the caller is responsible for removing parameters that it placed on the stack, while other calling conventions dictate that the called function will take care of removing the parameters from the stack. Adherence to publicized calling conventions is essential in maintaining the integrity of the program stack pointer.

The C Calling Convention

The default calling convention used by most C compilers for the x86 architecture is called the C calling convention. The _cdecl modifier may be used by C/C++ programs to force compilers to utilize the C calling convention when the default calling convention may have been overridden. We will refer to this calling convention as the cdecl calling convention from here on. The cdecl calling convention specifies that the caller place parameters to a function on the stack in right-to-left order and that the caller (as opposed to the callee) remove the parameters from the stack after the called function completes.

One result of placing parameters on the stack in right-to-left order is that the leftmost (first) parameter of the function will always be on the top of the stack when the function is called. This makes the first parameter easy to find regardless of the number of parameters the function expects, and it makes the cdecl calling convention ideally suited for use with functions that can take a variable number of arguments (such as printf).

Requiring the calling function to remove parameters from the stack means that you will often see instructions that make an adjustment to the program stack pointer immediately following the return from a called function. In the case of functions that can accept a variable number of arguments, the caller is ideally suited to make this adjustment, as the caller knows exactly how many arguments it has chosen to pass to the function and can easily make the correct adjustment, whereas the called function never knows ahead of time how many parameters it may receive and would have a difficult time making the necessary stack adjustment.
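C variable-argument functions typically rely on exactly this property: a fixed leftmost parameter tells the callee how many further arguments to read. The sketch below uses the standard <stdarg.h> macros; sum_ints is a hypothetical example. Note that va_arg hides where the arguments actually live (on the stack under 32-bit cdecl, partly in registers on other ABIs), and that the callee never removes anything from the stack.

```c
#include <assert.h>
#include <stdarg.h>

/* The leftmost argument says how many ints follow. Under cdecl it sits
   at a fixed position relative to the return address no matter how
   many extra arguments the caller pushed, and the caller (which knows
   the count) removes them all after the call. */
static int sum_ints(int count, ...) {
    va_list ap;
    int total = 0;

    va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);
    va_end(ap);

    return total;
}
```

For example, sum_ints(3, 1, 2, 3) reads exactly three more arguments because the first argument says so.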

In the following examples we consider calls to a function having the following prototype:

void demo_cdecl(int w, int x, int y, int z);

By default, this function will use the cdecl calling convention, expecting the four parameters to be pushed in right-to-left order and requiring the caller to clean the parameters off the stack. A compiler might generate code for a call to this function as follows:

; demo_cdecl(1, 2, 3, 4);   //programmer calls demo_cdecl
   push   4           ; push parameter z
   push   3           ; push parameter y
   push   2           ; push parameter x
   push   1           ; push parameter w
   call   demo_cdecl  ; call the function
   add    esp, 16     ; adjust esp to its former value

The four push operations, beginning with push 4, result in a net change to the program stack pointer (ESP) of 16 bytes (4 * sizeof(int) on a 32-bit architecture), which is undone by the add esp, 16 instruction following the return from demo_cdecl. If demo_cdecl is called 50 times, each call will be followed by a similar adjustment. The following example also adheres to the cdecl calling convention while eliminating the need for the caller to explicitly clean parameters off the stack following each call to demo_cdecl.

; demo_cdecl(1, 2, 3, 4);   //programmer calls demo_cdecl
   mov    dword [esp+12], 4   ; move parameter z to fourth position on stack
   mov    dword [esp+8], 3    ; move parameter y to third position on stack
   mov    dword [esp+4], 2    ; move parameter x to second position on stack
   mov    dword [esp], 1      ; move parameter w to top of stack
   call   demo_cdecl          ; call the function

In this example, the compiler has preallocated storage space for the parameters to demo_cdecl at the top of the stack during the function prologue. When the parameters for demo_cdecl are placed on the stack, there is no change to the program stack pointer, which eliminates the need to adjust the stack pointer when the call to demo_cdecl completes. The GNU compilers (gcc and g++) utilize this technique to place function parameters onto the stack.

Note that either method results in the stack pointer pointing to the leftmost argument when the function is called.
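The equivalence of the two methods can be checked with a small model. In this sketch (the Stack type and both helpers are hypothetical), each array element stands for one 4-byte slot, and the mov-based version starts with ESP already lowered by four slots, as if the prologue had preallocated the space.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define WORDS 16

/* Hypothetical word-indexed stack: index 0 is the lowest address. */
typedef struct { uint32_t mem[WORDS]; int esp; } Stack;

/* Method 1: push z, y, x, w (right to left); ESP moves by 4 slots. */
static void place_with_push(Stack *s) {
    const uint32_t args[4] = {1, 2, 3, 4};   /* w, x, y, z */
    for (int i = 3; i >= 0; i--)
        s->mem[--s->esp] = args[i];
}

/* Method 2: space was reserved in the prologue, so the arguments are
   moved into place and ESP never changes. */
static void place_with_mov(Stack *s) {
    s->mem[s->esp + 3] = 4;   /* mov [esp+12], 4 */
    s->mem[s->esp + 2] = 3;   /* mov [esp+8], 3  */
    s->mem[s->esp + 1] = 2;   /* mov [esp+4], 2  */
    s->mem[s->esp + 0] = 1;   /* mov [esp], 1    */
}
```

After either helper runs, ESP indexes the slot holding w, and the four slots from ESP upward hold identical values.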

The Standard Calling Convention

Standard in this case is a bit of a misnomer as it is a name that Microsoft created for its own calling convention marked by the use of the _stdcall modifier in a function declaration, as shown here:

void _stdcall demo_stdcall(int w, int x, int y);

In order to avoid any confusion surrounding the word standard, we will refer to this calling convention as the stdcall calling convention for the remainder of the book.

As with the cdecl calling convention, stdcall requires that function parameters be placed on the program stack in right-to-left order. The difference when using stdcall is that the called function is responsible for clearing the function parameters from the stack when the function has finished. In order for a function to do this, the function must know exactly how many parameters are on the stack. This is possible only for functions that accept a fixed number of parameters. As a result, variable argument functions such as printf cannot make use of the stdcall calling convention. The demo_stdcall function, for example, expects three integer parameters, occupying a total of 12 bytes on the stack (3 * sizeof(int) on a 32-bit architecture). An x86 compiler can use a special form of the RET instruction to simultaneously pop the return address from the top of the stack and add 12 to the stack pointer to clear the function parameters. In the case of demo_stdcall, we might see the following instruction used to return to the caller:

ret 12     ; return and clear 12 bytes from the stack

The primary advantage to the use of stdcall is the elimination of code to clean parameters off the stack following every function call, which results in slightly smaller, slightly faster programs. By convention Microsoft utilizes the stdcall convention for all fixed-argument functions exported from shared library (DLL) files. This is an important point to remember if you are attempting to generate function prototypes or binary-compatible replacements for any shared library components.
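The stack bookkeeping that separates cdecl from stdcall reduces to simple arithmetic on ESP. The hypothetical helper below tracks the net change in ESP, in bytes, across one call with 4-byte stack arguments on 32-bit x86; a result of zero means the stack is balanced. It also shows why caller and callee must agree on a convention: if both clean up, or neither does, ESP drifts.

```c
#include <assert.h>
#include <stdint.h>

/* Net change in ESP (bytes) across one call with nargs 4-byte stack
   arguments. callee_pops models stdcall's RET n; caller_pops models
   cdecl's ADD ESP, n after the call returns. */
static int32_t esp_delta(int nargs, int callee_pops, int caller_pops) {
    const int32_t arg_bytes = 4 * nargs;
    int32_t delta = 0;

    delta -= arg_bytes;                    /* caller pushes arguments    */
    delta -= 4;                            /* CALL pushes return address */
    delta += 4;                            /* RET pops return address    */
    if (callee_pops) delta += arg_bytes;   /* stdcall: RET arg_bytes     */
    if (caller_pops) delta += arg_bytes;   /* cdecl: ADD ESP, arg_bytes  */
    return delta;
}
```

For demo_stdcall, esp_delta(3, 1, 0) is 0, matching ret 12; esp_delta(3, 1, 1) is 12, the kind of imbalance a mismatched prototype produces.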

The fastcall Convention for x86

A variation on the stdcall convention, the fastcall calling convention passes up to two parameters in CPU registers rather than on the program stack. The Microsoft Visual C/C++ and GNU gcc/g++ (version 3.4 and later) compilers recognize the fastcall modifier in function declarations. When fastcall is specified, the first two parameters passed to a function will be placed in the ECX and EDX registers, respectively. Any remaining parameters are placed on the stack in right-to-left order similar to stdcall. Also similar to stdcall, fastcall functions are responsible for removing parameters from the stack when they return to their caller. The following declaration demonstrates the use of the fastcall modifier.

void _fastcall demo_fastcall(int w, int x, int y, int z);

A compiler might generate the following code in order to call demo_fastcall:

; demo_fastcall(1, 2, 3, 4);   //programmer calls demo_fastcall
   push   4              ; push parameter z (second position on stack)
   push   3              ; push parameter y (top position on stack)
   mov    edx, 2         ; move parameter x to edx
   mov    ecx, 1         ; move parameter w to ecx
   call   demo_fastcall  ; call the function

Note that no stack adjustment is required upon return from the call to demo_fastcall, as demo_fastcall is responsible for clearing parameters y and z from the stack as it returns to the caller. It is important to understand that because two arguments are passed in registers, the called function needs to clear only 8 bytes from the stack even though there are four arguments to the function.
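The number of bytes a fastcall function removes on return can be computed directly. The helper below is a hypothetical sketch that assumes every argument is a 4-byte int; arguments of other sizes (64-bit values, for instance) follow additional rules that are not modeled here.

```c
#include <assert.h>

/* Bytes a fastcall callee removes with RET n when all arguments are
   4-byte ints: the first two travel in ECX and EDX, and any others
   are pushed right to left and cleaned up by the callee. */
static unsigned fastcall_ret_bytes(unsigned nargs) {
    unsigned in_registers = nargs < 2 ? nargs : 2;
    return 4u * (nargs - in_registers);
}
```

For demo_fastcall, fastcall_ret_bytes(4) yields 8, so the function would be expected to end with ret 8.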

C++ Calling Conventions

Nonstatic member functions in C++ classes differ from standard functions in that they must make available the this pointer, which points to the object used to invoke the function. The address of the object used to invoke the function must be supplied by the caller and is therefore provided as a parameter when calling nonstatic member functions. The C++ language standard does not specify how this should be passed to nonstatic member functions, so it should come as no surprise that different compilers use different techniques when passing this.

Microsoft Visual C++ offers the thiscall calling convention, which passes this in the ECX register and requires the nonstatic member function to clean parameters off the stack as in stdcall. The GNU g++ compiler treats this as the implied first parameter to any nonstatic member function and behaves in all other respects as if the cdecl convention is being used. Thus, for g++-compiled code, this is placed on top of the stack prior to calling the nonstatic member function, and the caller is responsible for removing parameters (there will always be at least one) from the stack once the function returns. Additional features of compiled C++ are discussed in Chapter 8.
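At the source level, a nonstatic member function behaves like an ordinary function with one hidden extra parameter. The C sketch below makes that parameter explicit as self; Counter and Counter_add are hypothetical. Compiled as C, self is passed like any other first argument, which mirrors the g++ approach; the sketch says nothing about MSVC's thiscall, which would instead place this in ECX.

```c
#include <assert.h>

/* Hypothetical class with a single data member. */
typedef struct Counter { int value; } Counter;

/* Source-level model of a member function Counter::add(int n):
   `self` plays the role of the implicit `this` pointer. */
static int Counter_add(Counter *self, int n) {
    self->value += n;
    return self->value;
}
```

A call such as Counter_add(&c, 4) corresponds to c.add(4), with &c as the hidden first argument.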

Other Calling Conventions

Complete coverage of every existing calling convention would require a book in its own right. Calling conventions are often language-, compiler-, and CPU-specific, and some research on your part may be required as you encounter code generated by less-common compilers. A few situations deserve special mention, however: optimized code, custom assembly language code, and system calls.

When functions are exported for use by other programmers (such as library functions), it is important that they adhere to well-known calling conventions so that programmers can easily interface to those functions. On the other hand, if a function is intended for internal program use only, then the calling convention used by that function need be known only within that function’s program. In such cases, optimizing compilers may choose to use alternate calling conventions in order to generate faster code. Instances in which this may occur include the use of the /GL option with Microsoft Visual C++ and the use of the regparm keyword with GNU gcc/g++.

When programmers go to the trouble of using assembly language, they gain complete control over how parameters will be passed to any functions that they happen to create. Unless they wish to make their functions available to other programmers, assembly language programmers are free to pass parameters in any way they see fit. As a result, you may need to take extra care when analyzing custom assembly code. Custom assembly code is often encountered in obfuscation routines and shellcode.

A system call is a special type of function call used to request an operating system service. System calls usually effect a state transition from user mode to kernel mode in order for the operating system kernel to service the user’s request. The manner in which system calls are initiated varies across operating systems and CPUs. For example, Linux x86 system calls may be initiated using the int 0x80 instruction or the sysenter instruction, while other x86 operating systems may use only the sysenter instruction or alternate interrupt numbers. On many x86 systems (Linux being an exception) parameters for system calls are placed on the runtime stack, and a system call number is placed in the EAX register immediately prior to initiating the system call. Linux system calls accept their parameters in specific registers and occasionally in memory when there are more parameters than available registers.

Local Variable Layout

Unlike the calling conventions that dictate the manner in which parameters are passed into a function, there are no conventions that mandate the layout of a function’s local variables. When compiling a function, one task a compiler is faced with is to compute the amount of space required by a function’s local variables. Another task is to determine whether those variables can be allocated in CPU registers or whether they must be allocated on the program stack. The exact manner in which these allocations are made is irrelevant to both the caller of a function and to any functions that may, in turn, be called. Most notably, it is typically impossible to determine a function’s local variable layout based on examination of the function’s source code.

Stack Frame Examples

Consider the following function compiled on a 32-bit x86-based computer:

void bar(int j, int k);   // a function to call
void demo_stackframe(int a, int b, int c) {
   int x;
   char buffer[64];
   int y;
   int z;
   // body of function not terribly relevant other than the following call
   bar(z, y);
}

We compute the minimum amount of stack space required for local variables as 76 bytes (three 4-byte integers and a 64-byte buffer). This function could use either stdcall or cdecl, and the stack frame will look the same. Figure 6-3 shows one possible implementation of a stack frame for an invocation of demo_stackframe, assuming that no frame pointer register is used (thus the stack pointer, ESP, serves as the frame pointer). This frame would be set up on entry to demo_stackframe with the one-line prologue:

sub   esp, 76     ; allocate sufficient space for all local variables

The Offset column indicates the base+displacement address required to reference any of the local variables or parameters in the stack frame.


Figure 6-3. An ESP-based stack frame
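The offsets in such a frame can be reproduced by hand. The sketch below assumes one layout consistent with the code discussed in this section (z at [esp], y at [esp+4], then buffer and x, with the return address at [esp+76]); the ordering is an assumption, since no convention mandates local variable layout. Slot, esp_frame, and esp_offset are hypothetical names.

```c
#include <assert.h>
#include <string.h>

enum {
    INT_SIZE    = 4,
    BUFFER_SIZE = 64,
    LOCALS_SIZE = 3 * INT_SIZE + BUFFER_SIZE   /* 76, as in sub esp, 76 */
};

typedef struct { const char *name; int offset; } Slot;

/* Offsets from ESP immediately after the prologue (assumed layout). */
static const Slot esp_frame[] = {
    { "z",              0 },
    { "y",              INT_SIZE },                    /* 4  */
    { "buffer",         2 * INT_SIZE },                /* 8  */
    { "x",              2 * INT_SIZE + BUFFER_SIZE },  /* 72 */
    { "return address", LOCALS_SIZE },                 /* 76 */
    { "a",              LOCALS_SIZE + 4 },             /* 80 */
    { "b",              LOCALS_SIZE + 8 },             /* 84 */
    { "c",              LOCALS_SIZE + 12 },            /* 88 */
};

/* Returns the ESP-relative offset of a named slot, or -1 if unknown. */
static int esp_offset(const char *name) {
    for (size_t i = 0; i < sizeof esp_frame / sizeof esp_frame[0]; i++)
        if (strcmp(esp_frame[i].name, name) == 0)
            return esp_frame[i].offset;
    return -1;
}
```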

Generating functions that utilize the stack pointer to compute all variable references requires a little more effort on the part of the compiler, as the stack pointer changes frequently and the compiler must make sure that proper offsets are used at all times when referencing any variables within the stack frame. Consider the call made to bar in function demo_stackframe, the code for which is shown here:

   push   dword [esp+4]     ; push y
   push   dword [esp+4]     ; push z
   call   bar
   add    esp, 8            ; cdecl requires caller to clear parameters

The first push correctly pushes local variable y per the offset in Figure 6-3. At first glance it might appear that the second push incorrectly references local variable y a second time. However, because we are dealing with an ESP-based frame and the first push modifies ESP, all of the offsets in Figure 6-3 must be temporarily adjusted each time ESP changes. Following the first push, the new offset for local variable z becomes [esp+4], as correctly referenced in the second push. When examining functions that reference stack frame variables using the stack pointer, you must be careful to note any changes to the stack pointer and adjust all future variable offsets accordingly. One advantage of using the stack pointer to reference all stack frame variables is that all other registers remain available for other purposes.
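The adjustment rule is mechanical: each 4-byte push lowers ESP by 4, so an offset taken from Figure 6-3 grows by 4 for every push that has not yet been undone. The helper below is a hypothetical sketch of that bookkeeping.

```c
#include <assert.h>

/* ESP-relative offset of a frame slot after `live_pushes` 4-byte
   pushes (not yet popped or cleaned up) since the prologue. */
static int adjusted_esp_offset(int offset_after_prologue, int live_pushes) {
    return offset_after_prologue + 4 * live_pushes;
}
```

With y at offset 4 and z at offset 0, adjusted_esp_offset(4, 0) and adjusted_esp_offset(0, 1) both give 4, which is why both pushes use [esp+4].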

Once demo_stackframe has completed, it needs to return to the caller. Ultimately a ret instruction will be used to pop the desired return address off the top of the stack into the instruction pointer register (EIP in this case). Before the return address can be popped, the local variables need to be removed from the top of the stack so that the stack pointer correctly points to the saved return address when the ret instruction is executed. For this particular function the resulting epilogue becomes

add     esp, 76     ; adjust esp to point to the saved return address
ret                 ; return to the caller

At the expense of dedicating a register for use as a frame pointer and some code to configure the frame pointer on entry to the function, the job of computing local variable offsets can be made easier. In x86 programs, the EBP (extended base pointer) register is typically dedicated for use as a stack frame pointer. By default, most compilers generate code to use a frame pointer, though options typically exist for specifying that the stack pointer should be used instead. GNU gcc/g++, for example, offers the -fomit-frame-pointer compiler option, which generates functions that do not rely on a fixed-frame pointer register.

In order to see what the stack frame for demo_stackframe will look like using a dedicated frame pointer, we need to consider this new prologue code:

   push    ebp        ; save the caller's ebp value
   mov     ebp, esp   ; make ebp point to the saved register value
   sub     esp, 76    ; allocate space for local variables

The first instruction, push ebp, saves the value of EBP currently being used by the caller. Functions that adhere to the System V Application Binary Interface for Intel 32-bit Processors[43] are allowed to modify the EAX, ECX, and EDX registers but are required to preserve the caller's values for all other registers. Therefore, if we wish to use EBP as a frame pointer, we must save the current value of EBP before we change it, and we must restore the value of EBP before we return to the caller. If any other registers need to be saved on behalf of the caller (ESI or EDI, for example), compilers may choose to save them at the same time EBP is saved, or they may defer saving them until local variables have been allocated. Thus, there is no standard location within a stack frame for the storage of saved registers.

Once EBP has been saved, it can be changed to point to the current stack location. This is accomplished by the mov ebp, esp instruction, which copies the current value of the stack pointer into EBP. Finally, as in the non-EBP-based stack frame, space for local variables is allocated by sub esp, 76. The resulting stack frame layout is shown in Figure 6-4.


Figure 6-4. An EBP-based stack frame
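Converting between ESP-based and EBP-based offsets is a fixed shift: after push ebp, mov ebp, esp, and sub esp, 76, ESP equals EBP - 76. The sketch below assumes a local layout consistent with the [ebp-76] (z) and [ebp-72] (y) references used in this section, computes EBP-relative offsets from post-prologue ESP offsets, and totals the frame size. All names are hypothetical.

```c
#include <assert.h>

enum {
    LOCALS_SIZE = 76,   /* sub esp, 76                 */
    SAVED_EBP   = 4,    /* push ebp                    */
    RETURN_ADDR = 4,    /* pushed by CALL              */
    ARGS_SIZE   = 12,   /* a, b, c: three 4-byte ints  */
    FRAME_SIZE  = LOCALS_SIZE + SAVED_EBP + RETURN_ADDR + ARGS_SIZE
};

/* EBP-relative offset of a local, given its offset from ESP right
   after the prologue (at that point ESP == EBP - LOCALS_SIZE). */
static int local_ebp_offset(int esp_offset) {
    return esp_offset - LOCALS_SIZE;
}

/* EBP-relative offset of argument `index` (0 for a). The saved EBP is
   at [ebp] and the return address at [ebp+4], so arguments start at
   [ebp+8]. */
static int arg_ebp_offset(int index) {
    return SAVED_EBP + RETURN_ADDR + 4 * index;
}
```

Under these assumptions z maps to -76, y to -72, a to +8, and the whole frame occupies 96 bytes.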

With a dedicated frame pointer, all variable offsets are computed relative to the frame pointer register. It is most often (though not necessarily) the case that positive offsets are used to access function parameters, while negative offsets are required to access local variables. With a dedicated frame pointer in use, the stack pointer may be freely changed without affecting the offset to any variables within the frame. The call to function bar can now be implemented as follows:

   push   dword [ebp-72]       ; push y
   push   dword [ebp-76]       ; push z
   call   bar
   add    esp, 8               ; cdecl requires caller to clear parameters

The fact that the stack pointer has changed following the push of y has no effect on the access to local variable z in the succeeding push.

Finally, the use of a frame pointer necessitates a slightly different epilogue once the function completes, as the caller’s frame pointer must be restored prior to returning. Local variables must be cleared from the stack before the old value of the frame pointer can be retrieved, but this is made easy by the fact that the current frame pointer points to the old frame pointer. In x86 programs utilizing EBP as a frame pointer, the following code represents a typical epilogue:

mov    esp, ebp      ; clears local variables by resetting esp
pop    ebp           ; restore the caller's value of ebp
ret                  ; pop return address to return to the caller

This operation is so common that the x86 architecture offers the leave instruction as an abbreviated means of accomplishing the same task.

leave                ; copies ebp to esp AND then pops into ebp
ret                  ; pop return address to return to the caller

While the names of registers and instructions used will certainly differ for other processor architectures, the basic process of building stack frames will remain the same. Regardless of the architecture, you will want to familiarize yourself with typical prologue and epilogue sequences so that you can quickly move on to analyzing more interesting code within functions.

IDA Stack Views

Stack frames are clearly a runtime concept; a stack frame can't exist without a stack and without a running program. While this is true, it doesn't mean that you should ignore the concept of a stack frame when you are performing static analysis with tools such as IDA. All of the code required to set up stack frames for each function is present within a binary. Through careful analysis of this code, we can gain a detailed understanding of the structure of any function's stack frame even when the function is not running. In fact, some of IDA's most sophisticated analysis is performed specifically to determine the layout of stack frames for every function that IDA disassembles. During initial analysis, IDA goes to great lengths to monitor the behavior of the stack pointer over the course of a function by making note of every push or pop operation along with any arithmetic operations that may change the stack pointer, such as adding or subtracting constant values. The first goal of this analysis is to determine the exact size of the local variable area allocated to a function's stack frame. Additional goals include determining whether a dedicated frame pointer is in use in a given function (by recognizing a push ebp/mov ebp, esp sequence, for example) and recognizing all memory references to variables within a function's stack frame. For example, if IDA noted the following instruction in the body of demo_stackframe

mov    eax, [ebp+8]

it would understand that the first argument to the function (a in this case) is being loaded into the EAX register (refer to Figure 6-4). Through careful analysis of the stack frame structure, IDA can distinguish between memory references that access function arguments (those that lie below the saved return address) and references that access local variables (those that lie above the saved return address). IDA takes the additional step of determining which memory locations within a stack frame are directly referenced. For example, while the stack frame in Figure 6-4 is 96 bytes in size, there are only seven variables that we are likely to see referenced (four locals and three parameters).

Understanding the behavior of a function often comes down to understanding the types of data that the function manipulates. When reading a disassembly listing, one of the first opportunities that you will have to understand the data a function manipulates is to view the breakdown of the function’s stack frame. IDA offers two views into any function’s stack frame: a summary view and a detail view. In order to understand these two views, we will refer to the following version of demo_stackframe, which we have compiled using gcc.

void demo_stackframe(int a, int b, int c) {
   int x = c;
   char buffer[64];
   int y = b;
   int z = 10;
   buffer[0] = 'A';
   bar(z, y);
}

In this example, local variables x and y are initialized from parameters c and b, respectively. Local variable z is initialized with the constant value 10, and the first character in the 64-byte local array, named buffer, is initialized to the letter 'A'. The corresponding IDA disassembly of this function appears here.

    .text:00401090 ; ========= S U B R O U T I N E ===========================
    .text:00401090
    .text:00401090 ; Attributes: bp-based frame
    .text:00401090
    .text:00401090 demo_stackframe proc near      ; CODE XREF: sub_4010C1+41↓p
    .text:00401090
    .text:00401090 var_60          = dword ptr -60h
    .text:00401090 var_5C          = dword ptr -5Ch
    .text:00401090 var_58          = byte ptr -58h
    .text:00401090 var_C           = dword ptr -0Ch
    .text:00401090 arg_4           = dword ptr  0Ch
    .text:00401090 arg_8           = dword ptr  10h
    .text:00401090
    .text:00401090                 push    ebp
    .text:00401091                 mov     ebp, esp
    .text:00401093                 sub     esp, 78h
    .text:00401096                 mov     eax, [ebp+arg_8]
    .text:00401099                 mov     [ebp+var_C], eax
    .text:0040109C                 mov     eax, [ebp+arg_4]
    .text:0040109F                 mov     [ebp+var_5C], eax
    .text:004010A2                 mov     [ebp+var_60], 0Ah
    .text:004010A9                 mov     [ebp+var_58], 41h
    .text:004010AD                 mov     eax, [ebp+var_5C]
    .text:004010B0                 mov     [esp+4], eax
    .text:004010B4                 mov     eax, [ebp+var_60]
    .text:004010B7                 mov     [esp], eax
    .text:004010BA                 call    bar
    .text:004010BF                 leave
    .text:004010C0                 retn
    .text:004010C0 demo_stackframe endp

There are many points to cover in this listing as we begin to acquaint ourselves with IDA's disassembly notation. We begin by noting the "Attributes: bp-based frame" comment, which indicates that IDA believes this function uses the EBP register as a frame pointer based on analysis of the function prologue. From the sub esp, 78h instruction at 00401093, we learn that gcc has allocated 120 bytes (78h equates to 120) of local variable space in the stack frame. This includes 8 bytes for passing the two parameters to bar (stored at 004010B0 and 004010B7), but it is still far greater than the 76 bytes we had estimated previously and demonstrates that compilers occasionally pad the local variable space with extra bytes in order to ensure a particular alignment within the stack frame. Beginning with var_60, IDA provides a summary stack view that lists every variable that is directly referenced within the stack frame, along with the variable's size and offset distance from the frame pointer.

IDA assigns names to variables based on their location relative to the saved return address. Local variables lie above the saved return address, while function parameters lie below the saved return address. Local variable names are derived using the var_ prefix joined with a hexadecimal suffix that indicates the distance, in bytes, that the variable lies above the saved frame pointer. Local variable var_C, in this case, is a 4-byte (dword) variable that lies 12 bytes above the saved frame pointer ([ebp-0Ch]). Function parameter names are generated using the arg_ prefix combined with a hexadecimal suffix that represents the relative distance from the topmost parameter. Thus the topmost 4-byte parameter would be named arg_0, while successive parameters would be named arg_4, arg_8, arg_C, and so on. In this particular example, arg_0 is not listed because the function makes no use of parameter a: IDA fails to locate any memory reference to [ebp+8] (the location of the first parameter), so it does not include arg_0 in the summary stack view. A quick scan of the summary stack view reveals that there are many stack locations that IDA has failed to name because no direct references to those locations exist in the program code.

Note

The only stack variables that IDA will automatically generate names for are those that are directly referenced within a function.

An important difference between IDA’s disassembly listing and the stack frame analysis that we performed earlier is the fact that nowhere in the disassembly listing do we see memory references similar to [ebp-12]. Instead, IDA has replaced all constant offsets with symbolic names corresponding to the symbols in the stack view and their relative offsets from the stack frame pointer. This is in keeping with IDA’s goal of generating a higher-level disassembly; it is simply easier to deal with symbolic names than with numeric constants. In fact, as we will see later, IDA allows us to change the name of any stack variable to whatever we wish, making the names that much easier for us to remember. The summary stack view serves as a map from IDA-generated names to their corresponding stack frame offsets. For example, wherever the memory reference [ebp+arg_8] appears in the disassembly, [ebp+10h] (that is, [ebp+16]) could be used instead. If you prefer numeric offsets, IDA will happily show them to you. Right-clicking arg_8 at yields the context-sensitive menu shown in Figure 6-5, which contains several options to change the display format.

Figure 6-5. Selecting an alternate display format
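
Going the other direction is just as mechanical. The sketch below, again a hypothetical helper under the same 32-bit layout assumptions (arg_0 at [ebp+8]), recovers the numeric displacement that a default name stands for, mirroring the conversion from [ebp+arg_8] to [ebp+10h] described above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: maps an IDA-style default stack name back to its
   EBP-relative displacement, assuming arg_0 sits at [ebp+8] as in a
   32-bit frame that saves only EBP. Returns 0 if the name does not
   follow the var_XX / arg_XX pattern. */
int ida_name_to_ebp_offset(const char *name) {
    char *end;
    long n;

    assert(name != NULL);
    if (strncmp(name, "var_", 4) == 0) {
        n = strtol(name + 4, &end, 16);   /* hex suffix = distance below EBP */
        return (end != name + 4 && *end == '\0') ? (int)-n : 0;
    }
    if (strncmp(name, "arg_", 4) == 0) {
        n = strtol(name + 4, &end, 16);   /* hex suffix = distance from arg_0 */
        return (end != name + 4 && *end == '\0') ? (int)(n + 8) : 0;
    }
    return 0;
}
```

For instance, ida_name_to_ebp_offset("arg_8") returns 16 (10h), so [ebp+arg_8] and [ebp+10h] name the same location, while "var_C" maps back to -12.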

In this example, since we have source code available for comparison, we can map the IDA-generated variable names back to the names used in the original source using a variety of clues available in the disassembly.

  1. First, demo_stackframe takes three parameters: a, b, and c. These correspond to variables arg_0, arg_4, and arg_8 respectively (though arg_0 is missing in the disassembly because it is never referenced).

  2. Local variable x is initialized from parameter c. Thus var_C corresponds to x since it is initialized from arg_8 at .

  3. Similarly, local variable y is initialized from parameter b. Thus, var_5C corresponds to y since it is initialized from arg_4 at .

  4. Local variable z corresponds to var_60 since it is initialized with the value 10 at .

  5. The 64-byte character array buffer begins at var_58 since buffer[0] is initialized with A (ASCII 0x41) at .

  6. The two arguments for the call to bar are moved into the stack at rather than being pushed onto the stack. This is typical of current versions of gcc (versions 3.4 and later). IDA recognizes this convention and elects not to create local variable references for the two items at the top of the stack frame.

In addition to the summary stack view, IDA offers a detailed stack frame view in which every byte allocated to a stack frame is accounted for. The detailed view is opened by double-clicking any variable name associated with a given stack frame. Double-clicking var_C in the previous listing would bring up the stack frame view shown in Figure 6-6 (pressing esc closes the window).

Figure 6-6. IDA stack frame view

Because the detailed view accounts for every byte in the stack frame, it occupies significantly more space than the summary view, which lists only referenced variables. The portion of the stack frame shown in Figure 6-6 spans a total of 32 bytes, which represents only a small portion of the entire stack frame. Note that no names are assigned to bytes that are not referenced directly within the function. For example, parameter a, corresponding to arg_0, was never referenced within demo_stackframe. With no memory reference to analyze, IDA opts to do nothing with the corresponding bytes in the stack frame, which occupy offsets +00000008 through +0000000B. On the other hand, arg_4 was directly referenced at in the disassembly listing, where its contents were loaded into the 32-bit EAX register. Because 32 bits of data were moved, IDA is able to infer that arg_4 is a 4-byte quantity and labels it as such (db defines 1 byte of storage; dw defines 2 bytes of storage, also called a word; and dd defines 4 bytes of storage, also called a double word).
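
The size inference described here can be sketched as two small lookups: operand width from the register that receives the data, and data directive from that width. Both functions are hypothetical simplifications of the reasoning, not IDA’s actual analysis code, and they cover only the general-purpose A, B, C, and D registers.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical: width in bytes implied by a general-purpose register name
   (al/ah -> 1, ax -> 2, eax -> 4). Returns 0 for anything else. */
int register_width(const char *reg) {
    assert(reg != NULL);
    size_t len = strlen(reg);
    if (len == 3 && reg[0] == 'e' && strchr("abcd", reg[1]) && reg[2] == 'x')
        return 4;                                     /* eax, ebx, ecx, edx */
    if (len == 2 && strchr("abcd", reg[0])) {
        if (reg[1] == 'x') return 2;                  /* ax, bx, cx, dx     */
        if (reg[1] == 'l' || reg[1] == 'h') return 1; /* al, ah, bl, ...    */
    }
    return 0;
}

/* Data directive for a given width: db = byte, dw = word, dd = double word. */
const char *directive_for_width(int width) {
    switch (width) {
    case 1:  return "db";
    case 2:  return "dw";
    case 4:  return "dd";
    default: return NULL;
    }
}
```

Loading arg_4 into EAX gives register_width("eax") == 4, and directive_for_width(4) then yields dd, which is how arg_4 is labeled in the stack frame view.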

Two special values shown in Figure 6-6 are “ s” and “ r” (each starts with a leading space). These pseudo variables are IDA’s special representation of the saved return address (“ r”) and the saved register value(s) (“ s” representing only EBP in this example). These values are included in the stack frame view for completeness, as every byte in the stack frame is accounted for.

The stack frame view offers a detailed look at the inner workings of compilers. In Figure 6-6 it is clear that the compiler has inserted 8 extra bytes between the saved frame pointer “ s” and the local variable x (var_C). These bytes occupy offsets −00000001 through −00000008 in the stack frame. Further, a little math performed on the offset associated with each variable listed in the summary view reveals that the compiler has allocated 76 bytes (58h − 0Ch = 4Ch, or 76) to the character buffer at var_58, rather than the 64 bytes requested in the source code. Unless you happen to be a compiler writer yourself or are willing to dig deep into the source code for gcc, all you can do is speculate about why these extra bytes are allocated in this manner. In most cases we can chalk up the extra bytes to padding for alignment, and usually the presence of these extra bytes has no impact on a program’s behavior. After all, if a programmer asks for 64 bytes and is given 76, the program should behave no differently, especially since the programmer shouldn’t be using more than the 64 bytes requested. On the other hand, if you happen to be an exploit developer and learn that it is possible to overflow this particular buffer, then you might be very interested in the fact that nothing interesting can even begin to happen until you have supplied at least 76 bytes, which is the effective size of the buffer as far as the compiler is concerned. In Chapter 8 we will return to the stack frame view and its uses in dealing with more complex datatypes such as arrays and structures.
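
One way to double-check this arithmetic is to model the frame as a C structure whose members sit at the same distances from one another as the variables in the summary view, and then let offsetof do the subtraction. This is purely an illustrative model built from the offsets discussed above (struct demo_frame, EBP_OFF, and effective_buffer_size are our own names). It is not how gcc lays out or accesses the real frame, and it starts at var_60, omitting the lower bytes of the 120-byte allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative model of the demo_stackframe frame, from var_60 up to arg_8.
   Every member is a 32-bit value or a char array whose size is a multiple
   of 4, so the host compiler inserts no padding of its own. */
struct demo_frame {
    uint32_t z;          /* var_60                                   */
    uint32_t y;          /* var_5C                                   */
    char     buffer[76]; /* var_58: 64 bytes requested, 76 allocated */
    uint32_t x;          /* var_C                                    */
    char     pad[8];     /* unnamed bytes -00000008 .. -00000001     */
    uint32_t saved_ebp;  /* " s": the location EBP points to         */
    uint32_t ret;        /* " r": saved return address               */
    uint32_t a;          /* arg_0 (never referenced)                 */
    uint32_t b;          /* arg_4                                    */
    uint32_t c;          /* arg_8                                    */
};

/* Distance of a member from the saved frame pointer, i.e. its EBP offset. */
#define EBP_OFF(member) \
    ((long)offsetof(struct demo_frame, member) - \
     (long)offsetof(struct demo_frame, saved_ebp))

static_assert(EBP_OFF(x) == -0x0C, "x is var_C");
static_assert(EBP_OFF(buffer) == -0x58, "buffer is var_58");
static_assert(EBP_OFF(c) == 0x10, "c is arg_8");

/* Bytes between the start of buffer and x: the buffer's effective size. */
long effective_buffer_size(void) { return EBP_OFF(x) - EBP_OFF(buffer); }
```

Computed this way, effective_buffer_size() comes out to 76, and the pad member lands at offsets −8 through −1, the same figures read from the stack frame view.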



[42] A frame pointer is a register that points to a location inside a stack frame. Variables within the stack frame are typically referenced by their relative distance from the location to which the frame pointer points.
