Enhancing Disassembly

One of IDA Pro’s best features is that it allows you to modify its disassembly to suit your goals. The changes that you make can greatly increase the speed with which you can analyze a binary.

Warning

IDA Pro has no undo feature, so be careful when you make changes.

Renaming Locations

IDA Pro does a good job of automatically naming virtual addresses and stack variables, but you can also modify these names to make them more meaningful. Auto-generated names (also known as dummy names) such as sub_401000 don’t tell you much; a function named ReverseBackdoorThread would be a lot more useful. Renaming dummy names as you work out what they refer to also helps ensure that you reverse-engineer each function only once. You need to rename an item in only one place: IDA Pro propagates the new name wherever that item is referenced.

After you’ve renamed a dummy name to something more meaningful, cross-references will become much easier to parse. For example, if a function sub_401200 is called many times throughout a program and you rename it to DNSrequest, it will be renamed DNSrequest throughout the program. Imagine how much time this will save you during analysis, when you can read the meaningful name instead of needing to reverse the function again or to remember what sub_401200 does.
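The reason a single rename reaches every call site is that references point at an address, and the name is looked up from that address whenever the listing is drawn. The sketch below models that idea outside IDA. The addresses, the `names` table, and the `rename` helper are hypothetical and illustrate only the concept. (Inside IDA, IDAPython exposes renaming through `idc.set_name`; the sketch does not use IDA at all.)

```python
# Hypothetical model: each name is stored once, keyed by address.
names: dict[int, str] = {0x401200: "sub_401200"}

# (call-site address, target address) pairs; the values are made up.
call_sites: list[tuple[int, int]] = [
    (0x401050, 0x401200),
    (0x4010A3, 0x401200),
    (0x401377, 0x401200),
]


def render_calls() -> list[str]:
    """Render each call by looking up the target's *current* name."""
    return [f"{site:08X}  call  {names[target]}" for site, target in call_sites]


def rename(address: int, new_name: str) -> None:
    """Change the name in one place; every rendered reference picks it up."""
    names[address] = new_name


print("\n".join(render_calls()))   # three calls to sub_401200
rename(0x401200, "DNSrequest")
print("\n".join(render_calls()))   # the same three calls now show DNSrequest
```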

Table 5-2 shows an example of how we might rename local variables and arguments. The first listing shows the code with nothing renamed, and the second shows the same code after renaming. Here, we have renamed arg_4 to port_str and var_598 to port. The renamed listing already tells us something that the dummy names hide: the code converts a port string to an integer with atoi, prints an error if the result is zero, and otherwise passes the port to htons.

Comments

IDA Pro lets you embed comments throughout your disassembly and adds many comments automatically.

To add your own comments, place the cursor on a line of disassembly and press the colon (:) key to bring up a comment window. To insert a repeatable comment instead, press the semicolon (;) key. A repeatable comment is echoed across the disassembly window wherever there is a cross-reference to the address at which you added it.
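The practical difference between the two comment types shows up at cross-references. The following sketch models it under the assumption that a repeatable comment is simply a flag on the comment stored at an address. The `Comment` class, the addresses, and the comment text are all hypothetical, and this is not how IDA stores comments internally.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Comment:
    text: str
    repeatable: bool


# Comments keyed by the address where they were added (made-up values).
comments: dict[int, Comment] = {
    0x401200: Comment("builds and sends a DNS query", repeatable=True),  # added with ;
    0x401300: Comment("looks unused", repeatable=False),                 # added with :
}


def render_call(site: int, target: int) -> str:
    """Render a call; echo the target's comment only if it is repeatable."""
    line = f"{site:08X}  call  sub_{target:X}"
    comment: Optional[Comment] = comments.get(target)
    if comment is not None and comment.repeatable:
        line += f"  ; {comment.text}"
    return line


print(render_call(0x401050, 0x401200))  # 00401050  call  sub_401200  ; builds and sends a DNS query
print(render_call(0x401060, 0x401300))  # 00401060  call  sub_401300
```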

Formatting Operands

When disassembling, IDA Pro decides how to format the operands of each instruction. Without context, it typically displays numeric operands as hex values. IDA Pro lets you change this formatting when another representation would be easier to understand.

Table 5-2. Function Operand Manipulation

Without renamed arguments:

004013C8  mov   eax, [ebp+arg_4]
004013CB  push  eax
004013CC  call  _atoi
004013D1  add   esp, 4
004013D4  mov   [ebp+var_598], ax
004013DB  movzx ecx, [ebp+var_598]
004013E2  test  ecx, ecx
004013E4  jnz   short loc_4013F8
004013E6  push  offset aError
004013EB  call  printf
004013F0  add   esp, 4
004013F3  jmp   loc_4016FB
004013F8 ; ----------------------
004013F8
004013F8 loc_4013F8:
004013F8  movzx edx, [ebp+var_598]
004013FF  push  edx
00401400  call  ds:htons

With renamed arguments:

004013C8  mov   eax, [ebp+port_str]
004013CB  push  eax
004013CC  call  _atoi
004013D1  add   esp, 4
004013D4  mov   [ebp+port], ax
004013DB  movzx ecx, [ebp+port]
004013E2  test  ecx, ecx
004013E4  jnz   short loc_4013F8
004013E6  push  offset aError
004013EB  call  printf
004013F0  add   esp, 4
004013F3  jmp   loc_4016FB
004013F8 ; ----------------------
004013F8
004013F8 loc_4013F8:
004013F8  movzx edx, [ebp+port]
004013FF  push  edx
00401400  call  ds:htons
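Once the arguments in Table 5-2 have readable names, the logic can be restated in a higher-level language. The following is a rough Python reconstruction of what the renamed listing appears to do. It is an approximation, not the malware's source. `c_atoi` and `parse_port` are hypothetical names, and returning `None` stands in for the `printf(aError)` and jump to loc_4016FB.

```python
import re
import socket
from typing import Optional


def c_atoi(s: str) -> int:
    """Rough stand-in for C atoi: optional whitespace and sign, then digits; 0 if none."""
    m = re.match(r"\s*([+-]?\d+)", s)
    return int(m.group(1)) if m else 0


def parse_port(port_str: str) -> Optional[int]:
    # mov [ebp+port], ax keeps only the low 16 bits of atoi's result.
    port = c_atoi(port_str) & 0xFFFF
    # test ecx, ecx / jnz: a zero port takes the error branch.
    if port == 0:
        return None
    # push edx / call ds:htons: convert the port to network byte order.
    return socket.htons(port)


print(parse_port("80"))    # htons(80); 20480 on a little-endian host
print(parse_port("abc"))   # None, because atoi returns 0
```

Note the 16-bit truncation: the store to `ax` means that a string such as "65616" ends up as port 80.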

Figure 5-10 shows an example of modifying operands in an instruction, where 62h is compared to the local variable var_4. If you were to right-click 62h, you would be presented with options to change the 62h into 98 in decimal, 142o in octal, 1100010b in binary, or the character b in ASCII—whatever suits your needs and your situation.

Figure 5-10. Function operand manipulation
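The alternatives in that menu are all views of the same integer. The following sketch reproduces them for 62h. The `operand_views` helper is hypothetical. It uses the MASM-style convention of a leading 0 on hex values that start with a letter (for example, 0FFh), which matches how IDA writes such values, but the exact display rules inside IDA may differ.

```python
def operand_views(value: int) -> dict[str, str]:
    """Show one immediate in the alternative formats discussed in the text."""
    hex_text = f"{value:X}"
    if hex_text[0] in "ABCDEF":
        hex_text = "0" + hex_text  # MASM style: a hex literal must start with a digit
    return {
        "hex": hex_text + "h",
        "decimal": str(value),
        "octal": f"{value:o}o",
        "binary": f"{value:b}b",
        "char": repr(chr(value)) if 0x20 <= value < 0x7F else "(not printable)",
    }


for kind, text in operand_views(0x62).items():
    print(f"{kind:8} {text}")
# hex      62h
# decimal  98
# octal    142o
# binary   1100010b
# char     'b'
```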

To change whether an operand references memory or stays as data, press the O key on your keyboard. For example, suppose when you’re analyzing disassembly with a link to loc_410000, you trace the link back and see the following instructions:

mov eax, loc_410000
add ebx, eax
mul ebx

At the assembly level, everything is a number. In this snippet the value is added to EBX and then used in a multiplication, but it is never dereferenced, so it is being used as a number. Even so, IDA Pro has mislabeled the number 4259840 (0x410000 in hex) as a reference to the address 410000. To correct this mistake, press the O key to change this operand back to the number 410000h and remove the offending cross-reference from the disassembly window.
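One simplified way to see why such a mistake happens: a constant that falls inside a mapped section looks like an address, even when the code only does arithmetic with it. The sketch below checks the arithmetic and applies that naive "in range means pointer" test. The section layout is hypothetical, and IDA's real heuristics are more involved than this.

```python
from typing import Optional

# Hypothetical section ranges (start inclusive, end exclusive); a real layout
# would come from the PE header of the sample being analyzed.
SECTIONS = {
    ".text": (0x401000, 0x40F000),
    ".data": (0x410000, 0x414000),
}


def section_containing(value: int) -> Optional[str]:
    """Return the section whose range contains value, if any."""
    for name, (start, end) in SECTIONS.items():
        if start <= value < end:
            return name
    return None


value = 0x410000
print(f"{value:#x} = {value}")                        # 0x410000 = 4259840
print("naively inside:", section_containing(value))   # naively inside: .data
```

Being in range is only circumstantial. How the instructions use the value (here `add` and `mul`, not a memory access) is what tells you it is a plain number.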

Using Named Constants

Malware authors (and programmers in general) often use named constants such as GENERIC_READ in their source code. Named constants provide an easily remembered name for the programmer, but they are implemented as an integer in the binary. Unfortunately, once the compiler is done with the source code, it is no longer possible to determine whether the source used a symbolic constant or a literal.

Fortunately, IDA Pro provides a large catalog of named constants for the Windows API and the C standard library, and you can use the Use Standard Symbolic Constant option (shown in Figure 5-10) on an operand in your disassembly. Figure 5-11 shows the window that appears when you select Use Standard Symbolic Constant on the value 0x80000000.

Figure 5-11. Standard symbolic constant window

The code snippets in Table 5-3 show the effect of applying standard symbolic constants to the arguments of a Windows API call to CreateFileA. Note how much more meaningful the code is after the constants are applied.

Note

To determine which value to choose from the often extensive list in the standard symbolic constant window, consult the MSDN page for the Windows API call, which lists the symbolic constants associated with each parameter. We will discuss this further in Chapter 7, which covers Windows concepts.

Sometimes a particular standard symbolic constant that you want will not appear, and you will need to load the relevant type library manually. To do so, select View ▶ Open Subviews ▶ Type Libraries to view the currently loaded libraries. Normally, mssdk and vc6win will automatically be loaded, but if not, you can load them manually (as is often necessary with malware that uses the Native API, the Windows NT family API). To get the symbolic constants for the Native API, load ntapi (the Microsoft Windows NT 4.0 Native API). In the same vein, when analyzing a Linux binary, you may need to manually load the gnuunx (GNU C++ UNIX) libraries.

Table 5-3. Code Before and After Standard Symbolic Constants

Before symbolic constants:

mov     esi, [esp+1Ch+argv]
mov     edx, [esi+4]
mov     edi, ds:CreateFileA
push    0                     ; hTemplateFile
push    80h                   ; dwFlagsAndAttributes
push    3                     ; dwCreationDisposition
push    0                     ; lpSecurityAttributes
push    1                     ; dwShareMode
push    80000000h             ; dwDesiredAccess
push    edx                   ; lpFileName
call    edi                   ; CreateFileA

After symbolic constants:

mov     esi, [esp+1Ch+argv]
mov     edx, [esi+4]
mov     edi, ds:CreateFileA
push    NULL                  ; hTemplateFile
push    FILE_ATTRIBUTE_NORMAL ; dwFlagsAndAttributes
push    OPEN_EXISTING         ; dwCreationDisposition
push    NULL                  ; lpSecurityAttributes
push    FILE_SHARE_READ       ; dwShareMode
push    GENERIC_READ          ; dwDesiredAccess
push    edx                   ; lpFileName
call    edi                   ; CreateFileA
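The translation in Table 5-3 depends on which parameter each value is passed to. For example, 1 means FILE_SHARE_READ as dwShareMode but CREATE_NEW as dwCreationDisposition. This is why the note above says to check the constants for each parameter. The following sketch decodes the "before" values with one table per parameter. The tables list only a subset of the constants in the Windows SDK headers, and the helper names are hypothetical.

```python
# A subset of CreateFileA-related constants, with their Windows SDK values.
ACCESS_FLAGS = {0x80000000: "GENERIC_READ", 0x40000000: "GENERIC_WRITE",
                0x20000000: "GENERIC_EXECUTE", 0x10000000: "GENERIC_ALL"}
SHARE_FLAGS = {0x1: "FILE_SHARE_READ", 0x2: "FILE_SHARE_WRITE", 0x4: "FILE_SHARE_DELETE"}
DISPOSITIONS = {1: "CREATE_NEW", 2: "CREATE_ALWAYS", 3: "OPEN_EXISTING",
                4: "OPEN_ALWAYS", 5: "TRUNCATE_EXISTING"}  # an enumeration, not bit flags
ATTRIBUTE_FLAGS = {0x1: "FILE_ATTRIBUTE_READONLY", 0x2: "FILE_ATTRIBUTE_HIDDEN",
                   0x4: "FILE_ATTRIBUTE_SYSTEM", 0x20: "FILE_ATTRIBUTE_ARCHIVE",
                   0x80: "FILE_ATTRIBUTE_NORMAL"}


def decode_flags(value: int, table: dict[int, str]) -> str:
    """Name each set bit found in table; show any unknown leftover bits in hex."""
    if value == 0:
        return "0"
    names = [name for bit, name in table.items() if value & bit]
    known = 0
    for bit in table:
        known |= bit
    if value & ~known:
        names.append(f"{value & ~known:#x}")
    return " | ".join(names)


def decode_createfile(access: int, share: int, disposition: int, attributes: int) -> dict[str, str]:
    """Decode CreateFileA's integer arguments, each with its own parameter's table."""
    return {
        "dwDesiredAccess": decode_flags(access, ACCESS_FLAGS),
        "dwShareMode": decode_flags(share, SHARE_FLAGS),
        "dwCreationDisposition": DISPOSITIONS.get(disposition, hex(disposition)),
        "dwFlagsAndAttributes": decode_flags(attributes, ATTRIBUTE_FLAGS),
    }


# The integer arguments from the "before" listing in Table 5-3.
for param, name in decode_createfile(0x80000000, 1, 3, 0x80).items():
    print(f"{param:22} {name}")
```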

Redefining Code and Data

When IDA Pro performs its initial disassembly of a program, bytes are occasionally categorized incorrectly; code may be defined as data, data defined as code, and so on. The most common way to redefine code in the disassembly window is to press the U key to undefine functions, code, or data. When you undefine code, the underlying bytes will be reformatted as a list of raw bytes.
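After an undefine, each byte is shown as a separate `db` item. The sketch below approximates that rendering for three bytes. The bytes 55 8B EC encode the common function prologue `push ebp` / `mov ebp, esp`, which is the kind of pattern that suggests pressing C at that spot. The address is made up, and the helper only imitates the general look of the listing, not IDA's exact output.

```python
def as_raw_bytes(start: int, data: bytes) -> list[str]:
    """Render bytes as an undefined region: one 'db' item per byte."""
    lines = []
    for offset, b in enumerate(data):
        hex_text = f"{b:X}h"
        if hex_text[0] in "ABCDEF":
            hex_text = "0" + hex_text  # MASM style: hex literals start with a digit
        lines.append(f"{start + offset:08X}  db {hex_text}")
    return lines


# 55 8B EC = push ebp / mov ebp, esp (a typical x86 function prologue).
for line in as_raw_bytes(0x401000, bytes([0x55, 0x8B, 0xEC])):
    print(line)
# 00401000  db 55h
# 00401001  db 8Bh
# 00401002  db 0ECh
```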

To define the raw bytes as code, press C. For example, Table 5-4 shows an example from a malicious PDF document named paycuts.pdf. At offset 0x8387 into the file, we discover shellcode (defined as raw bytes), so we press C at that location. This disassembles the shellcode and reveals that it contains an XOR decoding loop that uses the value 0x97.
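Once a single-byte XOR loop like this is identified, you can reproduce its effect outside the disassembler to recover the decoded bytes. This excerpt does not show the loop's bounds or direction, so the following is only a generic single-byte XOR sketch with the key 0x97. The sample input is hypothetical.

```python
KEY = 0x97  # the XOR value identified in the decoding loop


def xor_decode(data: bytes, key: int = KEY) -> bytes:
    """XOR every byte with the same key; applying it twice restores the input."""
    return bytes(b ^ key for b in data)


# Hypothetical example: encode b"cmd" first, then decode it back.
encoded = xor_decode(b"cmd")
print(encoded.hex())          # f4faf3
print(xor_decode(encoded))    # b'cmd'
```

Because XOR with the same key is its own inverse, the same function serves as both encoder and decoder.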

Depending on your goals, you can similarly define raw bytes as data or ASCII strings by pressing D or A, respectively.
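Before pressing A, it helps to confirm that a region really looks like text. The following sketch uses a simple heuristic similar in spirit to the `strings` tool: it finds runs of printable ASCII of at least a minimum length. The sample bytes and the `printable_runs` helper are hypothetical, and IDA's own string detection is not implemented this way.

```python
def printable_runs(data: bytes, min_len: int = 4) -> list[tuple[int, str]]:
    """Return (offset, text) for runs of printable ASCII at least min_len long."""
    runs: list[tuple[int, str]] = []
    start = None
    for i, b in enumerate(data + b"\x00"):  # sentinel byte closes a trailing run
        if 0x20 <= b < 0x7F:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                runs.append((start, data[start:i].decode("ascii")))
            start = None
    return runs


sample = b"\x55\x8b\xecError\x00\x01http://x\x00"
print(printable_runs(sample))  # [(3, 'Error'), (10, 'http://x')]
```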
