The low-hanging fruit in understanding the behavior of binary programs lies in cataloging the library functions that the program calls. A C program that calls the connect function is creating a network connection. A Windows program that calls RegOpenKey is accessing the Windows registry. Additional analysis is required, however, to gain an understanding of how and why these functions are called.
Discovering how a function is called requires learning what parameters are passed to the function. In the case of a connect call, beyond the simple fact that the function is being called, it is important to know exactly what network address the program is connecting to. Understanding the data that is being passed into functions is the key to reverse engineering a function’s signature (the number, type, and sequence of parameters required by the function) and, as such, points out the importance of understanding how datatypes and data structures are manipulated at the assembly language level.
In this chapter we will examine how IDA conveys datatype information to the user, how data structures are stored in memory, and how data within those data structures is accessed. The simplest method for associating a specific datatype with a variable is to observe the use of the variable as a parameter to a function that we know something about. During its analysis phase, IDA makes every effort to annotate datatypes when they can be deduced based on a variable’s use with a function for which IDA possesses a prototype. When possible, IDA will go as far as using a formal parameter name lifted from a function prototype rather than generating a default dummy name for the variable. This can be seen in the following disassembly of a call to connect:
.text:004010F3                 push    10h             ; namelen
.text:004010F5                 lea     ecx, [ebp+name]
.text:004010F8                 push    ecx             ; name
.text:004010F9                 mov     edx, [ebp+s]
.text:004010FF                 push    edx             ; s
.text:00401100                 call    connect
In this listing we can see that each push has been commented with the name of the parameter that is being pushed (taken from IDA’s knowledge of the function prototype). In addition, two local stack variables have been named for the parameters that they correspond to. In most cases, these names will be far more informative than the dummy names that IDA would otherwise generate.
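For orientation, the three pushes correspond, right to left, to the three arguments of a source-level connect call. The following is a minimal sketch of one plausible counterpart, assuming POSIX sockets and an IPv4 destination. The helper names make_ipv4_name and connect_ipv4, and the choice of struct sockaddr_in, are assumptions made for illustration and are not taken from the disassembled program. The sketch suggests why a namelen of 10h (16) is a natural value: it is the size of an IPv4 socket address structure.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: fill in an IPv4 socket address and return the
   length that would be passed as connect's namelen argument.
   Returns 0 if the address string cannot be parsed. */
socklen_t make_ipv4_name(struct sockaddr_in *name, const char *ip,
                         unsigned short port)
{
    memset(name, 0, sizeof *name);
    name->sin_family = AF_INET;
    name->sin_port = htons(port);            /* network byte order */
    if (inet_pton(AF_INET, ip, &name->sin_addr) != 1)
        return 0;
    return (socklen_t)sizeof *name;
}

/* Hypothetical wrapper: the arguments appear in prototype order
   (s, name, namelen); a cdecl caller pushes them right to left,
   so namelen is pushed first and s last. */
int connect_ipv4(int s, const char *ip, unsigned short port)
{
    struct sockaddr_in name;
    socklen_t namelen = make_ipv4_name(&name, ip, port);
    if (namelen == 0)
        return -1;
    return connect(s, (const struct sockaddr *)&name, namelen);
}
```

Recovering the actual destination address from a binary still requires tracing how the memory at name is filled in before the call.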
IDA’s ability to propagate type information from function prototypes is not limited to library functions contained in IDA’s type libraries. IDA can propagate formal parameter names and data types from any function in your database as long as you have explicitly set the function’s type information. Upon initial analysis, IDA assigns dummy names and the generic type int to all function arguments, unless through type propagation it has reason to do otherwise. In any case, you must set a function’s type in one of three ways: by using the Edit ▸ Functions ▸ Set Function Type command, by right-clicking a function name and choosing Set Function Type on the context menu, or by using the Y hotkey. For the function shown below, this results in the dialog shown in Figure 8-1, in which you may enter the function’s correct prototype.
.text:00401050 ; ======== S U B R O U T I N E =========================
.text:00401050
.text:00401050 ; Attributes: bp-based frame
.text:00401050
.text:00401050 foo             proc near       ; CODE XREF: demo_stackframe+2A↓p
.text:00401050
.text:00401050 arg_0           = dword ptr  8
.text:00401050 arg_4           = dword ptr  0Ch
.text:00401050
.text:00401050                 push    ebp
.text:00401051                 mov     ebp, esp
As the dialog shows, IDA assumes an int return type, correctly deduces that the cdecl calling convention is used based on the type of ret instruction used, incorporates the name of the function as we have modified it, and assumes all parameters are of type int. Because we have not yet modified the argument names, IDA displays only their types.
If we modify the prototype to read int __cdecl foo(float f, char *ptr), IDA will automatically insert a prototype comment for the function and change the argument names in the disassembly as shown below.
.text:00401050 ; ======== S U B R O U T I N E =========================
.text:00401050
.text:00401050 ; Attributes: bp-based frame
.text:00401050
.text:00401050 ; int __cdecl foo(float f, char *ptr)
.text:00401050 foo             proc near       ; CODE XREF: demo_stackframe+2A↓p
.text:00401050
.text:00401050 f               = dword ptr  8
.text:00401050 ptr             = dword ptr  0Ch
.text:00401050
.text:00401050                 push    ebp
.text:00401051                 mov     ebp, esp
Finally, IDA propagates this information to all callers of the newly modified function, resulting in improved annotation of all related function calls as shown here. Note that the argument names f and ptr have been propagated out as comments in the calling function and used to rename variables that formerly used dummy names.
.text:004010AD                 mov     eax, [ebp+ptr]
.text:004010B0                 mov     [esp+4], eax    ; ptr
.text:004010B4                 mov     eax, [ebp+f]
.text:004010B7                 mov     [esp], eax      ; f
.text:004010BA                 call    foo
Returning to imported library functions, it is often the case that IDA will already know the prototype of the function. In such cases, you can easily view the prototype by holding the mouse over the function name.[44] When IDA has no knowledge of a function’s parameter sequence, it should, at a minimum, know the name of the library from which the function was imported (see the Imports window). When this happens, your best resources for learning the behavior of the function are any associated man pages or other available API documentation (such as MSDN online[45]). When all else fails, remember the adage: Google is your friend.
For the remainder of this chapter, we will be discussing how to recognize when data structures are being used in a program, how to decipher the organizational layout of such structures, and how to use IDA to improve the readability of a disassembly when such structures are in use. Since C++ classes are a complex extension of C structures, the chapter concludes with a discussion of reverse engineering compiled C++ programs.
While primitive datatypes are often a natural fit with the size of a CPU’s registers or instruction operands, composite datatypes such as arrays and structures typically require more complex instruction sequences in order to access the individual data items that they contain. Before we can discuss IDA’s feature for improving the readability of code that utilizes complex datatypes, we need to review what that code looks like.
Arrays are the simplest composite data structure in terms of memory layout. Traditionally, arrays are contiguous blocks of memory that contain consecutive elements of the same datatype. The size of an array is easy to compute, as it is the product of the number of elements in the array and the size of each element. Using C notation, the minimum number of bytes consumed by the following array
int array_demo[100];
is computed as
int bytes = 100 * sizeof(int);
Individual array elements are accessed by supplying an index value, which may be a variable or a constant, as shown in these array references:
array_demo[20] = 15;              //fixed index into the array
for (int i = 0; i < 100; i++) {
    array_demo[i] = i;            //varying index into the array
}
Assuming, for the sake of example, that sizeof(int) is 4 bytes, then the first array access (array_demo[20]) accesses the integer value that lies 80 bytes into the array, while the second array access (array_demo[i]) accesses successive integers at offsets 0, 4, 8, ... 396 bytes into the array. The offset for the first array access can be computed at compile time as 20 * 4. In most cases, the offset for the second array access must be computed at runtime because the value of the loop counter, i, is not fixed at compile time. Thus for each pass through the loop, the product i * 4 must be computed to determine the exact offset into the array. Ultimately, the manner in which an array element is accessed depends not only on the type of index used but also on where the array happens to be allocated within the program’s memory space.
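The offset arithmetic described above can be sketched directly in C. This is only an illustration: the helper names element_offset and store_int are hypothetical, and the code simply spells out the multiplication that a compiler either folds into a constant (fixed index) or emits as a scaled-index computation (variable index).

```c
#include <stddef.h>

/* Hypothetical helper: byte offset of element `index` in an array whose
   elements are `elem_size` bytes each. */
size_t element_offset(size_t index, size_t elem_size)
{
    return index * elem_size;
}

/* Hypothetical helper: store through an explicitly computed address.
   For an int array this has the same effect as array[index] = value. */
void store_int(int *array, size_t index, int value)
{
    unsigned char *base = (unsigned char *)array;
    *(int *)(base + element_offset(index, sizeof(int))) = value;
}
```

With 4-byte elements, element_offset(20, 4) yields the 80-byte offset mentioned above, and calling store_int(array_demo, 20, 15) is equivalent to the fixed-index assignment.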
When an array is allocated within the global data area of a program (within the .data or .bss section, for example), the base address of the array is known to the compiler at compile time. The fixed base address makes it possible for the compiler to compute fixed addresses for any array element that is accessed using a fixed index. Consider the following trivial program that accesses a global array using both fixed and variable offsets:
int global_array[3];

int main() {
    int idx = 2;
    global_array[0] = 10;
    global_array[1] = 20;
    global_array[2] = 30;
    global_array[idx] = 40;
}
This program disassembles to the following:
.text:00401000 _main           proc near
.text:00401000
.text:00401000 idx             = dword ptr -4
.text:00401000
.text:00401000                 push    ebp
.text:00401001                 mov     ebp, esp
.text:00401003                 push    ecx
.text:00401004                 mov     [ebp+idx], 2
.text:0040100B                 mov     dword_40B720, 10
.text:00401015                 mov     dword_40B724, 20
.text:0040101F                 mov     dword_40B728, 30
.text:00401029                 mov     eax, [ebp+idx]
.text:0040102C                 mov     dword_40B720[eax*4], 40
.text:00401037                 xor     eax, eax
.text:00401039                 mov     esp, ebp
.text:0040103B                 pop     ebp
.text:0040103C                 retn
.text:0040103C _main           endp
While this program has only one global variable, the disassembly lines at 0040100B, 00401015, and 0040101F seem to indicate that there are three global variables. The computation of an offset (eax * 4) at 0040102C is the only thing that seems to hint at the presence of a global array named dword_40B720, yet this is the same name as the global variable found at 0040100B.
Based on the dummy names assigned by IDA, we know that the global array is made up of the 12 bytes beginning at address 0040B720. During the compilation process, the compiler has used the fixed indexes (0, 1, 2) to compute the actual addresses of the corresponding elements in the array (0040B720, 0040B724, and 0040B728), which are referenced using the global variables at 0040100B, 00401015, and 0040101F. Using IDA’s array-formatting operations discussed in the last chapter (Edit ▸ Array), dword_40B720 can be formatted as a three-element array yielding the alternate disassembly lines shown in the following listing. Note that this particular formatting highlights the use of offsets into the array:
.text:0040100B                 mov     dword_40B720, 10
.text:00401015                 mov     dword_40B720+4, 20
.text:0040101F                 mov     dword_40B720+8, 30
There are two points to note in this example. First, when constant indexes are used to access global arrays, the corresponding array elements will appear as global variables in the corresponding disassembly. In other words, the disassembly will offer essentially no evidence that an array exists. The second point is that the use of variable index values leads us to the start of the array because the base address will be revealed (as in the instruction at 0040102C) when the computed offset is added to it to compute the actual array location to be accessed. The computation at 0040102C offers one additional piece of significant information about the array. By observing the amount by which the array index is multiplied (4 in this case), we learn the size (though not the type) of an individual element in the array.
How does array access differ if the array is allocated as a stack variable instead? Instinctively, we might think that it must be different since the compiler can’t know an absolute address at compile time, so surely even accesses that use constant indexes must require some computation at runtime. In practice, however, compilers treat stack-allocated arrays almost identically to globally allocated arrays.
Consider the following program that makes use of a small stack-allocated array:
int main() {
    int stack_array[3];
    int idx = 2;
    stack_array[0] = 10;
    stack_array[1] = 20;
    stack_array[2] = 30;
    stack_array[idx] = 40;
}
The address at which stack_array will be allocated is unknown at compile time, so it is not possible for the compiler to precompute the address of stack_array[1] at compile time as it did in the global array example. By examining the disassembly listing for this function, we gain insight into how stack-allocated arrays are accessed:
.text:00401000 _main           proc near
.text:00401000
.text:00401000 var_10          = dword ptr -10h
.text:00401000 var_C           = dword ptr -0Ch
.text:00401000 var_8           = dword ptr -8
.text:00401000 idx             = dword ptr -4
.text:00401000
.text:00401000                 push    ebp
.text:00401001                 mov     ebp, esp
.text:00401003                 sub     esp, 10h
.text:00401006                 mov     [ebp+idx], 2
.text:0040100D                 mov     [ebp+var_10], 10
.text:00401014                 mov     [ebp+var_C], 20
.text:0040101B                 mov     [ebp+var_8], 30
.text:00401022                 mov     eax, [ebp+idx]
.text:00401025                 mov     [ebp+eax*4+var_10], 40
.text:0040102D                 xor     eax, eax
.text:0040102F                 mov     esp, ebp
.text:00401031                 pop     ebp
.text:00401032                 retn
.text:00401032 _main           endp
As with the global array example, this function appears to have three variables (var_10, var_C, and var_8) rather than an array of three integers. Based on the constant operands used at 0040100D, 00401014, and 0040101B, we know that what appear to be local variable references are actually references to the three elements of stack_array, whose first element must reside at var_10, the local variable with the lowest memory address.
To understand how the compiler resolved the references to the other elements of the array, consider what the compiler goes through when dealing with the reference to stack_array[1], which lies 4 bytes into the array, or 4 bytes beyond the location of var_10. Within the stack frame, the compiler has elected to allocate stack_array at ebp - 0x10. The compiler understands that stack_array[1] lies at ebp - 0x10 + 4, which simplifies to ebp - 0x0C. The result is that IDA displays this as a local variable reference. The net effect is that, similar to globally allocated arrays, the use of constant index values tends to hide the presence of a stack-allocated array. Only the array access at 00401025 hints at the fact that var_10 is the first element in the array rather than a simple integer variable. In addition, the disassembly line at 00401025 also helps us conclude that the size of individual elements in the array is 4 bytes.
Stack-allocated arrays and globally allocated arrays are thus treated very similarly by compilers. However, there is an extra piece of information that we can attempt to extract from the disassembly of the stack example. Based on the location of idx within the stack, it is possible to conclude that the array that begins with var_10 contains no more than three elements (otherwise, it would overwrite idx). If you are an exploit developer, this can be very useful in determining exactly how much data you can fit into an array before you overflow it and begin to corrupt the data that follows.
Heap-allocated arrays are allocated using a dynamic memory allocation function such as malloc (C) or new (C++). From the compiler’s perspective, the primary difference in dealing with a heap-allocated array is that the compiler must generate all references into the array based on the address value returned from the memory allocation function. For the sake of comparison, we now take a look at the following function, which allocates a small array in the program heap:
#include <stdlib.h>    //declares malloc

int main() {
    int *heap_array = (int*)malloc(3 * sizeof(int));
    int idx = 2;
    heap_array[0] = 10;
    heap_array[1] = 20;
    heap_array[2] = 30;
    heap_array[idx] = 40;
}
In studying the corresponding disassembly that follows, you should notice a few similarities and differences with the two previous disassemblies:
.text:00401000 _main           proc near
.text:00401000
.text:00401000 heap_array      = dword ptr -8
.text:00401000 idx             = dword ptr -4
.text:00401000
.text:00401000                 push    ebp
.text:00401001                 mov     ebp, esp
.text:00401003                 sub     esp, 8
.text:00401006                 push    0Ch             ; size_t
.text:00401008                 call    _malloc
.text:0040100D                 add     esp, 4
.text:00401010                 mov     [ebp+heap_array], eax
.text:00401013                 mov     [ebp+idx], 2
.text:0040101A                 mov     eax, [ebp+heap_array]
.text:0040101D                 mov     dword ptr [eax], 10
.text:00401023                 mov     ecx, [ebp+heap_array]
.text:00401026                 mov     dword ptr [ecx+4], 20
.text:0040102D                 mov     edx, [ebp+heap_array]
.text:00401030                 mov     dword ptr [edx+8], 30
.text:00401037                 mov     eax, [ebp+idx]
.text:0040103A                 mov     ecx, [ebp+heap_array]
.text:0040103D                 mov     dword ptr [ecx+eax*4], 40
.text:00401044                 xor     eax, eax
.text:00401046                 mov     esp, ebp
.text:00401048                 pop     ebp
.text:00401049                 retn
.text:00401049 _main           endp
The starting address of the array (returned from malloc in the EAX register) is stored in the local variable heap_array. In this example, unlike the previous examples, every access to the array begins with reading the contents of heap_array to obtain the array’s base address before an offset value can be added to compute the address of the correct element within the array. The references to heap_array[0], heap_array[1], and heap_array[2] require offsets of 0, 4, and 8 bytes, respectively, as seen at 0040101D, 00401026, and 00401030. The operation that most closely resembles the previous examples is the reference to heap_array[idx] at 0040103D, in which the offset into the array continues to be computed by multiplying the array index by the size of an array element.
Heap-allocated arrays have one particularly nice feature. When both the total size of the array and the size of each element can be determined, it is easy to compute the number of elements allocated to the array. For heap-allocated arrays, the parameter passed to the memory allocation function (0x0C passed to malloc at 00401006) represents the total number of bytes allocated to the array. Dividing this by the size of an element (4 bytes in this example, as observed from the offsets at 0040101D, 00401026, and 00401030) tells us the number of elements in the array. In the previous example, a three-element array was allocated.
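The division described above is simple enough to state as code. The following is a small sketch only; element_count is a hypothetical helper name, and the zero return for inconsistent inputs is a design choice made here to flag the case where the observed sizes do not fit an array interpretation.

```c
#include <stddef.h>

/* Hypothetical helper: given the byte count passed to the allocator and
   the element size inferred from scaled or constant offsets, return the
   number of elements. Returns 0 when the sizes are inconsistent, that
   is, when elem_size is 0 or does not evenly divide alloc_bytes. */
size_t element_count(size_t alloc_bytes, size_t elem_size)
{
    if (elem_size == 0 || alloc_bytes % elem_size != 0)
        return 0;
    return alloc_bytes / elem_size;
}
```

For the listing above, element_count(0x0C, 4) gives 3, matching the three-element array in the source.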
The only firm conclusion we can draw regarding the use of arrays is that they are easiest to recognize when a variable is used as an index into the array. The array-access operation requires the index to be scaled by the size of an array element before adding the resulting offset to the base address of the array. Unfortunately, as we will show in the next section, when constant index values are used to access array elements, they do little to suggest the presence of an array and look remarkably similar to code used to access structure members.
C-style structs, referred to here generically as structures, are heterogeneous collections of data that allow grouping of items of dissimilar datatypes into a single composite datatype. A major distinguishing feature of structures is that the data fields within a structure are accessed by name rather than by index, as is done with arrays. Unfortunately, field names are converted to numeric offsets by the compiler, so by the time you are looking at a disassembly, structure field access looks remarkably similar to accessing array elements using constant indexes.
When a compiler encounters a structure definition, the compiler maintains a running total of the number of bytes consumed by the fields of the structure in order to determine the offset at which each field resides within the structure. The following structure definition will be used with the upcoming examples:
struct ch8_struct {    //Size   Minimum offset   Default offset
    int field1;        //  4           0                0
    short field2;      //  2           4                4
    char field3;       //  1           6                6
    int field4;        //  4           7                8
    double field5;     //  8          11               16
};                     //Minimum total size: 19   Default size: 24
The minimum required space to allocate a structure is determined by the sum of the space required to allocate each field within the structure. However, you should never assume that a compiler utilizes the minimum required space to allocate a structure. By default, compilers seek to align structure fields to memory addresses that allow for the most efficient reading and writing of those fields. For example, 4-byte integer fields will be aligned to offsets that are divisible by 4, while 8-byte doubles will be aligned to offsets that are divisible by 8. Depending on the composition of the structure, meeting alignment requirements may require the insertion of padding bytes, causing the actual size of a structure to be larger than the sum of its component fields. The default offsets and resulting structure size for the example structure shown previously can be seen in the Default offset column.
Structures can be packed into the minimum required space by using compiler options to request specific member alignments. Microsoft Visual C/C++ and GNU gcc/g++ both recognize the pack pragma as a means of controlling structure field alignment. The GNU compilers additionally recognize the packed attribute as a means of controlling structure alignment on a per-structure basis. Requesting 1-byte alignment for structure fields causes compilers to squeeze the structure into the minimum required space. For our example structure, this yields the offsets and structure size found in the Minimum offset column. Note that some CPUs perform better when data is aligned according to its type, while other CPUs may generate exceptions if data is not aligned on specific boundaries.
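The two layouts can be checked with offsetof and sizeof. The sketch below assumes a compiler that accepts the pack pragma with push and pop and a platform where int, short, and double are 4, 2, and 8 bytes; the structure names ch8_struct_default and ch8_struct_packed and the function print_layouts are hypothetical. The packed figures follow directly from the field sizes, while the default figures depend on the compiler and target ABI and may not match the Default offset column everywhere.

```c
#include <stddef.h>
#include <stdio.h>

/* Default alignment: the compiler may insert padding between fields. */
struct ch8_struct_default {
    int field1;
    short field2;
    char field3;
    int field4;
    double field5;
};

/* 1-byte alignment requested with the pack pragma: no padding. */
#pragma pack(push, 1)
struct ch8_struct_packed {
    int field1;
    short field2;
    char field3;
    int field4;
    double field5;
};
#pragma pack(pop)

/* Print one row per field: name, default offset, packed offset. */
#define SHOW(field)                                         \
    printf("%-7s %7zu %7zu\n", #field,                      \
           offsetof(struct ch8_struct_default, field),      \
           offsetof(struct ch8_struct_packed, field))

void print_layouts(void)
{
    printf("%-7s %7s %7s\n", "field", "default", "packed");
    SHOW(field1);
    SHOW(field2);
    SHOW(field3);
    SHOW(field4);
    SHOW(field5);
    printf("%-7s %7zu %7zu\n", "size",
           sizeof(struct ch8_struct_default),
           sizeof(struct ch8_struct_packed));
}
```

Calling print_layouts() from a test program prints both columns side by side, which makes any padding inserted by the compiler in use immediately visible.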
With these facts in mind, we can begin our look at how structures are treated in compiled code. For the sake of comparison, it is worth observing that, as with arrays, access to structure members is performed by adding the base address of the structure to the offset of the desired member. However, while array offsets can be computed at runtime from a provided index value (because each item in an array has the same size), structure offsets must be precomputed and will turn up in compiled code as fixed offsets into the structure, looking nearly identical to array references that make use of constant indexes.
As with globally allocated arrays, the address of globally allocated structures is known at compile time. This allows the compiler to compute the address of each member of the structure at compile time and eliminates the need to do any math at runtime. Consider the following program that accesses a globally allocated structure:
struct ch8_struct global_struct;

int main() {
    global_struct.field1 = 10;
    global_struct.field2 = 20;
    global_struct.field3 = 30;
    global_struct.field4 = 40;
    global_struct.field5 = 50.0;
}
If this program is compiled with default structure alignment options, we can expect to see something like the following when we disassemble it:
.text:00401000 _main           proc near
.text:00401000                 push    ebp
.text:00401001                 mov     ebp, esp
.text:00401003                 mov     dword_40EA60, 10
.text:0040100D                 mov     word_40EA64, 20
.text:00401016                 mov     byte_40EA66, 30
.text:0040101D                 mov     dword_40EA68, 40
.text:00401027                 fld     ds:dbl_40B128
.text:0040102D                 fstp    dbl_40EA70
.text:00401033                 xor     eax, eax
.text:00401035                 pop     ebp
.text:00401036                 retn
.text:00401036 _main           endp
This disassembly contains no math whatsoever to access the members of the structure, and, in the absence of source code, it would not be possible to state with any certainty that a structure is being used at all. Because the compiler has performed all of the offset computations at compile time, this program appears to reference five global variables rather than five fields within a single structure. You should be able to note the similarities with the previous example regarding globally allocated arrays using constant index values.
Like stack-allocated arrays (see Stack-Allocated Arrays), stack-allocated structures are equally difficult to recognize based on stack layout alone. Modifying the preceding program to use a stack-allocated structure, declared in main, yields the following disassembly:
.text:00401000 _main           proc near
.text:00401000
.text:00401000 var_18          = dword ptr -18h
.text:00401000 var_14          = word ptr -14h
.text:00401000 var_12          = byte ptr -12h
.text:00401000 var_10          = dword ptr -10h
.text:00401000 var_8           = qword ptr -8
.text:00401000
.text:00401000                 push    ebp
.text:00401001                 mov     ebp, esp
.text:00401003                 sub     esp, 18h
.text:00401006                 mov     [ebp+var_18], 10
.text:0040100D                 mov     [ebp+var_14], 20
.text:00401013                 mov     [ebp+var_12], 30
.text:00401017                 mov     [ebp+var_10], 40
.text:0040101E                 fld     ds:dbl_40B128
.text:00401024                 fstp    [ebp+var_8]
.text:00401027                 xor     eax, eax
.text:00401029                 mov     esp, ebp
.text:0040102B                 pop     ebp
.text:0040102C                 retn
.text:0040102C _main           endp
Again, no math is performed to access the structure’s fields since the compiler can determine the relative offsets for each field within the stack frame at compile time. In this case, we are left with the same, potentially misleading picture that five individual variables are being used rather than a single variable that happens to contain five distinct fields. In reality, var_18 should be the start of a 24-byte structure, and each of the other variables should somehow be formatted to reflect the fact that they are fields within the structure.
Heap-allocated structures turn out to be much more revealing regarding the size of the structure and the layout of its fields. When a structure is allocated in the program heap, the compiler has no choice but to generate code to compute the proper offset into the structure whenever a field is accessed. This is a result of the structure’s address being unknown at compile time. For globally allocated structures, the compiler is able to compute a fixed starting address. For stack-allocated structures, the compiler can compute a fixed relationship between the start of the structure and the frame pointer for the enclosing stack frame. When a structure has been allocated in the heap, the only reference to the structure available to the compiler is the pointer to the structure’s starting address.
Modifying our structure example once again to make use of a heap-allocated structure results in the following disassembly. Similar to the earlier heap-allocated array example, we declare a pointer within main and assign it the address of a block of memory large enough to hold our structure:
.text:00401000 _main           proc near
.text:00401000
.text:00401000 heap_struct     = dword ptr -4
.text:00401000
.text:00401000                 push    ebp
.text:00401001                 mov     ebp, esp
.text:00401003                 push    ecx
.text:00401004                 push    24              ; size_t
.text:00401006                 call    _malloc
.text:0040100B                 add     esp, 4
.text:0040100E                 mov     [ebp+heap_struct], eax
.text:00401011                 mov     eax, [ebp+heap_struct]
.text:00401014                 mov     dword ptr [eax], 10
.text:0040101A                 mov     ecx, [ebp+heap_struct]
.text:0040101D                 mov     word ptr [ecx+4], 20
.text:00401023                 mov     edx, [ebp+heap_struct]
.text:00401026                 mov     byte ptr [edx+6], 30
.text:0040102A                 mov     eax, [ebp+heap_struct]
.text:0040102D                 mov     dword ptr [eax+8], 40
.text:00401034                 mov     ecx, [ebp+heap_struct]
.text:00401037                 fld     ds:dbl_40B128
.text:0040103D                 fstp    qword ptr [ecx+10h]
.text:00401040                 xor     eax, eax
.text:00401042                 mov     esp, ebp
.text:00401044                 pop     ebp
.text:00401045                 retn
.text:00401045 _main           endp
In this example, unlike the global and stack-allocated structure examples, we are able to discern the exact size and layout of the structure. The structure size can be inferred to be 24 bytes based on the amount of memory requested from malloc. The structure contains the following fields at the indicated offsets:
A 4-byte (dword) field at offset 0
A 2-byte (word) field at offset 4
A 1-byte (byte) field at offset 6
A 4-byte (dword) field at offset 8
An 8-byte (qword) field at offset 16 (10h)
Based on the use of floating point instructions, we can further deduce that the qword field is actually a double. The same program compiled to pack structures with a 1-byte alignment yields the following disassembly:
.text:00401000 _main           proc near
.text:00401000
.text:00401000 heap_struct     = dword ptr -4
.text:00401000
.text:00401000                 push    ebp
.text:00401001                 mov     ebp, esp
.text:00401003                 push    ecx
.text:00401004                 push    19              ; size_t
.text:00401006                 call    _malloc
.text:0040100B                 add     esp, 4
.text:0040100E                 mov     [ebp+heap_struct], eax
.text:00401011                 mov     eax, [ebp+heap_struct]
.text:00401014                 mov     dword ptr [eax], 10
.text:0040101A                 mov     ecx, [ebp+heap_struct]
.text:0040101D                 mov     word ptr [ecx+4], 20
.text:00401023                 mov     edx, [ebp+heap_struct]
.text:00401026                 mov     byte ptr [edx+6], 30
.text:0040102A                 mov     eax, [ebp+heap_struct]
.text:0040102D                 mov     dword ptr [eax+7], 40
.text:00401034                 mov     ecx, [ebp+heap_struct]
.text:00401037                 fld     ds:dbl_40B128
.text:0040103D                 fstp    qword ptr [ecx+0Bh]
.text:00401040                 xor     eax, eax
.text:00401042                 mov     esp, ebp
.text:00401044                 pop     ebp
.text:00401045                 retn
.text:00401045 _main           endp
The only changes to the program are the smaller size of the structure (now 19 bytes) and the adjusted offsets to account for the realignment of each structure field.
Regardless of the alignment used when compiling a program, finding structures allocated and manipulated in the program heap is the fastest way to determine the size and layout of a given data structure. However, keep in mind that many functions will not do you the favor of immediately accessing every member of a structure to help you understand the structure’s layout. Instead, you may need to follow the use of the pointer to the structure and make note of the offsets used whenever that pointer is dereferenced. In this manner, you will eventually be able to piece together the complete layout of the structure.
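To connect the offsets seen in a disassembly with source-level field names, it can help to perform the same base-plus-offset access by hand. The following is a minimal sketch, assuming a 4-byte int; store_dword_at and load_dword_at are hypothetical helpers that mimic instructions such as mov dword ptr [eax+8], 40, and memcpy is used so that the access is valid even at unaligned (packed) offsets.

```c
#include <stddef.h>
#include <string.h>

struct ch8_struct {
    int field1;
    short field2;
    char field3;
    int field4;
    double field5;
};

/* Hypothetical helper: store an int-sized value at base + offset, the
   source-level analogue of "mov dword ptr [reg+offset], value". */
void store_dword_at(void *base, size_t offset, int value)
{
    memcpy((unsigned char *)base + offset, &value, sizeof value);
}

/* Hypothetical helper: read an int-sized value back from base + offset. */
int load_dword_at(const void *base, size_t offset)
{
    int value;
    memcpy(&value, (const unsigned char *)base + offset, sizeof value);
    return value;
}
```

Writing through store_dword_at with offsetof(struct ch8_struct, field4) changes field4, which is the relationship an analyst reconstructs in reverse: each distinct offset observed through the structure pointer corresponds to one field.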
Some programmers would say that the beauty of composite data structures is that they allow you to build arbitrarily complex structures by nesting smaller structures within larger structures. Among other possibilities, this capability allows for arrays of structures, structures within structures, and structures that contain arrays as members. The preceding discussions regarding arrays and structures apply just as well when dealing with nested types such as these. As an example, consider an array of structures like the following simple program in which heap_struct points to an array of five ch8_struct items:
#include <stdlib.h>    //declares malloc

int main() {
    int idx = 1;
    struct ch8_struct *heap_struct;
    heap_struct = (struct ch8_struct*)malloc(sizeof(struct ch8_struct) * 5);
    heap_struct[idx].field1 = 10;
}
The operations required to access field1 in the assignment heap_struct[idx].field1 = 10 include multiplying the index value by the size of an array element, in this case the size of the structure, and then adding the offset to the desired field. The corresponding disassembly is shown here:
.text:00401000 _main           proc near
.text:00401000
.text:00401000 idx             = dword ptr -8
.text:00401000 heap_struct     = dword ptr -4
.text:00401000
.text:00401000                 push    ebp
.text:00401001                 mov     ebp, esp
.text:00401003                 sub     esp, 8
.text:00401006                 mov     [ebp+idx], 1
.text:0040100D                 push    120             ; size_t
.text:0040100F                 call    _malloc
.text:00401014                 add     esp, 4
.text:00401017                 mov     [ebp+heap_struct], eax
.text:0040101A                 mov     eax, [ebp+idx]
.text:0040101D                 imul    eax, 24
.text:00401020                 mov     ecx, [ebp+heap_struct]
.text:00401023                 mov     dword ptr [ecx+eax], 10
.text:0040102A                 xor     eax, eax
.text:0040102C                 mov     esp, ebp
.text:0040102E                 pop     ebp
.text:0040102F                 retn
.text:0040102F _main           endp
The disassembly reveals 120 bytes (at 0040100D) being requested from the heap. The array index is multiplied by 24 at 0040101D before being added to the start address for the array, which is loaded at 00401020. No additional offset is required in order to generate the final address for the reference at 00401023. From these facts we can deduce the size of an array item (24), the number of items in the array (120 / 24 = 5), and the fact that there is a 4-byte (dword) field at offset 0 within each array element. This short listing does not offer enough information to draw any conclusions about how the remaining 20 bytes within each structure are allocated to additional fields.
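The address computation for a field inside an array of structures combines both rules from this chapter: scale the index by the element size, then add the field offset. The sketch below illustrates this; array_field_offset and field4_of are hypothetical helper names, and field4 is chosen only to show a nonzero field offset, which the listing above does not exercise.

```c
#include <stddef.h>

struct ch8_struct {
    int field1;
    short field2;
    char field3;
    int field4;
    double field5;
};

/* Hypothetical helper: byte offset, from the start of an array, of a
   field inside element idx: idx * elem_size + field_offset. */
size_t array_field_offset(size_t idx, size_t elem_size, size_t field_offset)
{
    return idx * elem_size + field_offset;
}

/* Hypothetical helper: address of field4 in element idx, computed by
   hand rather than with base[idx].field4. */
int *field4_of(struct ch8_struct *base, size_t idx)
{
    size_t off = array_field_offset(idx, sizeof(struct ch8_struct),
                                    offsetof(struct ch8_struct, field4));
    return (int *)((unsigned char *)base + off);
}
```

For the listing above, array_field_offset(1, 24, 0) gives 24, the value left in EAX by the imul before it is added to the array’s base address.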
[44] Holding the mouse over any name in the IDA display causes a tool tip–style pop-up window to be displayed that shows up to 10 lines of disassembly at the target location. In the case of library function names, this often includes the prototype for calling the library function.
[45] Please see http://msdn.microsoft.com/library/.