9.2 Disassembly Solution

In the preceding code, the first two instructions (push ebp and mov ebp, esp) form the function prologue. Similarly, the two instructions before the final ret (mov esp, ebp and pop ebp) form the function epilogue. The prologue and epilogue are not part of the program's logic; they only set up and tear down the environment for the function, so they can be removed to simplify the code. The third instruction, sub esp, 14h, allocates 20 (14h) bytes for local variables; it is also not part of the program's logic and can be ignored. After removing these instructions, we are left with the following:

1. mov dword ptr [ebp-14h], 1
2. mov dword ptr [ebp-10h], 2 ➐
3. mov dword ptr [ebp-0Ch], 3 ➑
4. mov dword ptr [ebp-4], 0 ➍

loc_401022: ➋
5. cmp dword ptr [ebp-4], 3 ➌
6. jge loc_40103D ➌
7. mov eax, [ebp-4]
8. mov ecx, [ebp+eax*4-14h] ➏
9. mov [ebp-8], ecx
10. mov edx, [ebp-4] ➎
11. add edx, 1 ➎
12. mov [ebp-4], edx ➎
13. jmp loc_401022 ➊

loc_40103D:
14. xor eax, eax
15. ret

The backward jump at ➊ to loc_401022 (➋) indicates a loop, and the code between ➋ and ➊ is the loop body. Let's identify the loop variable, the loop initialization, the condition check, and the update statement. The two instructions at ➌ form the condition check: they test whether the value at [ebp-4] is greater than or equal to 3, and when it is, the jump exits the loop. The same variable, [ebp-4], is initialized to 0 at ➍ before the condition check at ➌, and it is incremented by the instructions at ➎. All of these details suggest that ebp-4 is the loop variable, so we can rename ebp-4 as i (ebp-4=i).
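The loop shape just identified can be sketched in C. This is only an illustration: the function names loop_as_goto and loop_as_while and the limit parameter are hypothetical and do not come from the listing. The first function mirrors the compare-and-jump layout of the disassembly, and the second is the structured form it collapses to. Both return the final value of i so that they can be compared.

```c
#include <assert.h>

/* Mirrors the listing: check at the top, backward jump at the bottom. */
int loop_as_goto(int limit)
{
    int i;
    assert(limit >= 0);
    i = 0;                     /* mov dword ptr [ebp-4], 0 */
check:
    if (i >= limit)            /* cmp dword ptr [ebp-4], 3 / jge loc_40103D */
        goto done;
    /* the loop body would go here */
    i = i + 1;                 /* mov edx, [ebp-4] / add edx, 1 / mov [ebp-4], edx */
    goto check;                /* jmp loc_401022 */
done:
    return i;
}

/* The same control flow written as a structured loop. */
int loop_as_while(int limit)
{
    int i;
    assert(limit >= 0);
    i = 0;
    while (i < limit) {
        /* the loop body would go here */
        i = i + 1;
    }
    return i;
}
```

Calling either function with a limit of 3 runs the body three times and leaves i equal to 3, which is the value that makes the jge at ➌ exit the loop.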

At ➏, the operand [ebp+eax*4-14h] represents an array access. Let's try to identify the components of the array (the base address, the index, and the size of each element). We know that local variables (including elements of an array) are accessed as ebp-<somevalue> (in other words, at a negative offset from ebp), so we can rewrite [ebp+eax*4-14h] as [ebp-14h+eax*4]. Here, ebp-14h represents the base address of the array on the stack, eax represents the index, and 4 is the size of each element of the array. Because ebp-14h is the base address, it is also the address of the first element of the array; if we name the array val, then ebp-14h = val[0].

Now that we have determined the first element of the array, let's try to find the other elements. From the array notation, [ebp-14h+eax*4] in this case, we know that the size of each element is 4 bytes. So, if val[0] = ebp-14h, then val[1] should be at the next highest address, which is ebp-10h, and val[2] should be at ebp-0Ch, and so on. Notice that ebp-10h and ebp-0Ch are referenced at ➐ and ➑. Let's rename ebp-10h as val[1] and ebp-0Ch as val[2]. We still haven't figured out how many elements this array contains. First, let's replace all of the determined values and write the preceding code in a high-level language equivalent. The last two instructions, xor eax, eax and ret, can be written as return 0, so the pseudocode now looks as follows:
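The element addresses follow from simple arithmetic on the operand at ➏. As a minimal sketch (the function name element_displacement is hypothetical; the 4-byte element size and the -14h base are taken from the listing), the ebp-relative displacement of each element can be computed like this:

```c
#include <assert.h>

/* Displacement from ebp of the operand [ebp + index*4 - 14h]. */
int element_displacement(int index)
{
    assert(index >= 0);
    return index * 4 - 0x14;
}
```

For index 1 the result is -0x10, and for index 2 it is -0x0C, which matches the addresses written at ➐ and ➑.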

val[0] = 1
val[1] = 2
val[2] = 3
i = 0
while (i<3)
{
eax = i
ecx = [val+eax*4] ➒
[ebp-8] = ecx ➒
edx = i
edx = edx + 1 ➒
i = edx ➒
}
return 0

Replacing all of the register names on the right-hand side of the = operator at ➒ with their corresponding values, we will get the following code:

val[0] = 1
val[1] = 2
val[2] = 3
i = 0
while (i<3)
{
eax = i ➓
ecx = [val+i*4] ➓
[ebp-8] = [val+i*4]
edx = i ➓
edx = i + 1 ➓
i = i + 1
}
return 0

Removing all of the entries containing register names on the left-hand side of the = operator at ➓, we get the following code:

val[0] = 1
val[1] = 2
val[2] = 3
i = 0
while (i<3)
{
[ebp-8] = [val+i*4]
i = i + 1
}
return 0

From what we learned previously, accessing an element of an integer array as nums[0] is the same as [nums+0*4], and nums[1] is the same as [nums+1*4]; in general, nums[i] = [nums+i*4]. Going by that logic, we can replace [val+i*4] with val[i] in the preceding code.
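The equivalence between nums[i] and [nums+i*4] can be checked with a short C sketch. It assumes 4-byte integers (int32_t), and the function name load_by_byte_offset is hypothetical. The function computes the byte address the way the disassembly does and then reads four bytes from it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Loads the element at byte address nums + i*4, as the operand [nums+i*4] does. */
int32_t load_by_byte_offset(const int32_t *nums, int i)
{
    const unsigned char *base = (const unsigned char *)nums;
    int32_t value;
    assert(i >= 0);
    /* memcpy keeps the read well defined regardless of alignment */
    memcpy(&value, base + (size_t)i * sizeof(int32_t), sizeof value);
    return value;
}
```

For any valid index, the value returned is the same one that the C expression nums[i] yields, which is why [val+i*4] can be rewritten as val[i].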

Now we are left with the address ebp-8 in the preceding code; this could be a local variable, or it could be the fourth element of the array, val[3] (the disassembly alone does not say which). If we assume it is a local variable and rename ebp-8 as x (ebp-8=x), the resulting code looks as shown below. The code iterates through each element of the array (using the index variable i) and assigns its value to the variable x. We can also gather one extra piece of information: because the index i is used to iterate through the array, the array probably has three elements (i reaches a maximum value of 2 inside the loop before the loop exits):

val[0] = 1
val[1] = 2
val[2] = 3
i = 0
while (i<3)
{
x = val[i]
i = i + 1
}
return 0

Instead of treating ebp-8 as the local variable x, if you treat ebp-8 as the array's fourth element (ebp-8 = val[3]), the code translates to the following. The code can now be interpreted differently: the array has four elements, and the code iterates through the first three, assigning each value in turn to the fourth element:

val[0] = 1
val[1] = 2
val[2] = 3
i = 0
while (i<3)
{
val[3] = val[i]
i = i + 1
}
return 0
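The two readings cannot be told apart from the disassembly because they have the same effect on memory. The following sketch illustrates this under stated assumptions: the Frame type and the function names are hypothetical, and the 20-byte stack frame is modeled as five 4-byte slots ordered from ebp-14h up to ebp-4.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Five 4-byte slots, lowest address first: ebp-14h, ebp-10h, ebp-0Ch, ebp-8, ebp-4. */
typedef struct { int32_t slot[5]; } Frame;

/* Reading 1: a three-element array plus a separate local x at ebp-8. */
void run_with_local(Frame *f)
{
    int32_t *val = &f->slot[0];
    int32_t *x = &f->slot[3];
    int32_t *i = &f->slot[4];
    assert(f != NULL);
    val[0] = 1; val[1] = 2; val[2] = 3;
    for (*i = 0; *i < 3; *i = *i + 1)
        *x = val[*i];
}

/* Reading 2: a four-element array whose last element sits at ebp-8. */
void run_with_fourth_element(Frame *f)
{
    int32_t *val = &f->slot[0];
    int32_t *i = &f->slot[4];
    assert(f != NULL);
    val[0] = 1; val[1] = 2; val[2] = 3;
    for (*i = 0; *i < 3; *i = *i + 1)
        val[3] = val[*i];
}

/* Returns 1 when both frames hold identical bytes. */
int frames_equal(const Frame *a, const Frame *b)
{
    return memcmp(a, b, sizeof *a) == 0;
}
```

Running both functions on zeroed frames leaves the two frames byte-for-byte identical, so only extra context (such as how the data is used elsewhere) can settle which reading is right.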

As you might have guessed from the preceding example, it is not always possible to decompile assembly code accurately to its original form, because of the way the compiler generates code (and because the compiled code might not retain all of the required information). However, this technique should help you determine the program's functionality. The original C program for the disassembled output is shown next; notice the similarities between what we determined previously and the original code:

int main()
{
    int a[3] = { 1, 2, 3 };
    int b, i;
    i = 0;
    while (i < 3)
    {
        b = a[i];
        i++;
    }
    return 0;
}
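The original source also accounts for the 20 (14h) bytes reserved for local variables: three array elements plus b and i, at 4 bytes each. The sketch below checks that arithmetic. It is only a model, since a compiler is free to lay out locals differently; the MainLocals type and the displacement_from_ebp function are hypothetical, and 4-byte integers are assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of main's locals, lowest address (ebp-14h) first. */
typedef struct {
    int32_t a[3];   /* ebp-14h, ebp-10h, ebp-0Ch */
    int32_t b;      /* ebp-8 */
    int32_t i;      /* ebp-4 */
} MainLocals;

/* Displacement from ebp of a field located field_offset bytes into the frame. */
int displacement_from_ebp(size_t field_offset)
{
    assert(field_offset < sizeof(MainLocals));
    return (int)field_offset - (int)sizeof(MainLocals);
}
```

Under this model, sizeof(MainLocals) is 14h, b lands at ebp-8, and i lands at ebp-4, which agrees with the addresses used in the disassembly.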