8
STATIC ANALYSIS OF A BOOTKIT USING IDA PRO


This chapter introduces the basic concepts of bootkit static analysis with IDA Pro. There are several ways to approach reversing bootkits, and covering all the existing approaches would require a book of its own. We focus on the IDA Pro disassembler, because it provides unique features that enable the static analysis of bootkits.

Statically analyzing bootkits is radically different from reverse engineering in most conventional application environments, because crucial parts of a bootkit execute in a preboot environment. For example, a typical Windows application relies on standard Windows libraries and is expected to call standard library functions known to reverse-engineering tools like Hex-Rays IDA Pro. We can deduce a lot about an application from the functions it calls; the same is true of Linux applications and the POSIX system calls they make. But the preboot environment lacks these hints, so the tools for preboot analysis need additional features to compensate for this missing information. Fortunately, these features are available in IDA Pro, and this chapter explains how to use them.

As discussed in Chapter 7, a bootkit consists of several closely connected modules: the Master Boot Record (MBR) or Volume Boot Record (VBR) infector, a malicious boot loader, and kernel-mode drivers, among others. We’ll restrict the discussion in this chapter to the analysis of a bootkit MBR and a legitimate operating system VBR, which you can use as a model for reversing any code that executes in the preboot environment. You can download the MBR and VBR you’ll use here from the book’s downloadable resources. At the end of the chapter, we discuss how to deal with other bootkit components, such as the malicious boot loader and kernel-mode drivers. If you haven’t already worked through Chapter 7, you should do so now.

First, we’ll show you how to get started with bootkit analysis; you’ll learn which options to use in IDA Pro in order to load the code into the disassembler, the API used in the preboot environment, how control is transferred between different modules, and which IDA features may simplify their reversal. Then you’ll learn how to develop a custom loader for IDA Pro in order to automate your reversing tasks. Finally, we provide a set of exercises designed to help you further explore bootkit static analysis. You can download the materials for this chapter from https://nostarch.com/rootkits/.

Analyzing the Bootkit MBR

First, we’ll analyze a bootkit MBR in the IDA Pro disassembler. The MBR we use in this chapter is similar to the one the TDL4 bootkit creates (see Chapter 7). The TDL4 MBR is a good example because it implements traditional bootkit functionality, but its code is easy to disassemble and understand. We based the VBR example in this chapter on legitimate code from an actual Microsoft Windows volume.

Loading and Decrypting the MBR

In the following sections, you’ll load the MBR into IDA Pro and analyze the MBR code at its entry point. Then, you’ll decrypt the code and examine how the MBR manages memory.

Loading the MBR into IDA Pro

The first step in the static analysis of the bootkit MBR is to load the MBR code into IDA. Because the MBR isn’t a conventional executable and has no dedicated loader, you need to load it as a binary module. IDA Pro will simply load the MBR into its memory as a single contiguous segment just as the BIOS does, without performing any extra processing. You only need to provide the starting memory address for this segment.

Load the binary file by opening it via IDA Pro. When IDA Pro first loads the MBR, it displays a message offering various options, as shown in Figure 8-1.


Figure 8-1: The IDA Pro dialog displayed when loading the MBR

You can accept the defaults for most of the parameters, but you need to enter a value into the Loading offset field, which specifies where in memory to load the module. This value should always be 0x7C00, the fixed address where the BIOS boot code loads the MBR. Once you’ve entered this offset, click OK. IDA Pro loads the module, then gives you the option to disassemble the module in either 16-bit or 32-bit mode, as shown in Figure 8-2.


Figure 8-2: IDA Pro dialog asking you which disassembly mode to choose

For this example, choose No. This directs IDA to disassemble the MBR as 16-bit real-mode code, which is the way the actual CPU decodes it at the very beginning of the boot process.

Because IDA Pro stores the results of disassembly in a database file with the extension .idb, we’ll refer to the results of its disassembly as a database from now on. IDA uses this database to collect all of the code annotations you provide through your GUI actions and IDA scripts. You can think of the database as the implicit argument to all IDA script functions: it represents the current state of your hard-won reverse-engineering knowledge about the binary, and it is what IDA acts on.

If you don’t have any experience with databases, don’t worry: IDA’s interfaces are designed so that you don’t need to know the database internals. Understanding how IDA represents what it learns about code, however, does help a lot.

Analyzing the MBR’s Entry Point

When loaded by the BIOS at boot, the MBR—now modified by the infecting bootkit—is executed from its first byte. We specified its loading address to IDA’s disassembler as 0:7C00h, which is where the BIOS loads it. Listing 8-1 shows the first few bytes of the loaded MBR image.

seg000:7C00 ; Segment type: Pure code
seg000:7C00 seg000          segment byte public 'CODE' use16
seg000:7C00                 assume cs:seg000
seg000:7C00                 ;org 7C00h
seg000:7C00                 assume es:nothing, ss:nothing, ds:nothing, fs:nothing, gs:nothing
seg000:7C00                 xor     ax, ax
seg000:7C02                 mov     ss, ax
seg000:7C04                 mov     sp, 7C00h
seg000:7C07                 mov     es, ax
seg000:7C09                 mov     ds, ax
seg000:7C0B                 sti
seg000:7C0C                 pusha
seg000:7C0D                 mov     cx, 0CFh
seg000:7C10                 mov     bp, 7C19h
seg000:7C13
seg000:7C13 loc_7C13:                               ; CODE XREF: seg000:7C17
seg000:7C13                 ror     byte ptr [bp+0], cl
seg000:7C16                 inc     bp
seg000:7C17                 loop    loc_7C13
seg000:7C17 ; ---------------------------------------------------------------------------
seg000:7C19 encrypted_code  db 44h, 85h, 1Dh, 0C7h, 1Ch, 0B8h, 26h, 4, 8, 68h, 62h
seg000:7C19                 db 40h, 0Eh, 83h, 0Ch, 0A3h, 0B1h, 1Fh, 96h, 84h, 0F5h

Listing 8-1: Entry point of the MBR

Early on we see the initialization stub that sets up the stack segment selector ss, stack pointer sp, and segment selector registers es and ds in order to access memory and execute subroutines. Following the initialization stub is a decryption routine, starting at 0x7C0D, which deciphers the rest of the MBR by rotating the bits of each byte with an ror instruction and then passes control to the decrypted code. The size of the encrypted blob is given in the cx register, whose low byte (cl) also serves as the rotation count, and the bp register points to the blob. This ad hoc encryption is intended to hamper static analysis and avoid detection by security software. It also presents us with our first obstacle, because we now need to extract the actual code to proceed with the analysis.

Decrypting the MBR Code

To continue our analysis of an encrypted MBR, we need to decrypt the code. Thanks to the IDA scripting engine, you can easily accomplish this task with the Python script in Listing 8-2.

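import idaapi

# beginning of the encrypted code and its size in memory
start_ea = 0x7C19
encr_size = 0xCF

for ix in range(encr_size):
    byte_to_decr = idaapi.get_byte(start_ea + ix)
    # the loop counter (cl) starts at 0xCF and drops by 1 per byte;
    # for an 8-bit rotation only the count modulo 8 matters
    to_rotate = (0xCF - ix) % 8
    # rotate right and keep only the low 8 bits of the result
    byte_decr = ((byte_to_decr >> to_rotate) | (byte_to_decr << (8 - to_rotate))) & 0xFF
    idaapi.patch_byte(start_ea + ix, byte_decr)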

Listing 8-2: Python script to decrypt the MBR code

First, we import the idaapi package, which contains the IDA API library. Then we loop through and decrypt the encrypted bytes. To fetch a byte from the disassembly segment, we use the get_byte API, which takes the address of the byte to read as its only parameter. Once it’s decrypted, we write the byte back to the disassembly region using the patch_byte API, which takes the address of the byte to modify and the value to write there. You can execute the script by choosing File▸Script from the IDA menu or by pressing ALT-F7.

NOTE

This script doesn’t modify the actual image of the MBR but rather its representation in IDA; that is, IDA’s idea of what the loaded code will look like when it’s ready to run. Before making any modifications to the disassembled code, you should create a backup of the current version of the IDA database. That way, if the script modifying the MBR code contains bugs and distorts the code, you’ll be able to easily recover its most recent version.
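The same rotation logic can also be tried outside IDA on a copy of the raw bytes. The following is a minimal sketch rather than part of the original script: the helper names ror8 and decrypt_blob are hypothetical, and the sample input is the first nine bytes of the encrypted_code blob from Listing 8-1.

```python
def ror8(value, count):
    """Rotate an 8-bit value right by count bits (count is taken modulo 8)."""
    count %= 8
    return ((value >> count) | (value << (8 - count))) & 0xFF


def decrypt_blob(blob, initial_cx=0xCF):
    """Undo the MBR loop: byte ix is rotated right by cl, the low byte of (cx - ix)."""
    out = bytearray(blob)
    for ix in range(len(out)):
        cl = (initial_cx - ix) & 0xFF
        out[ix] = ror8(out[ix], cl)
    return bytes(out)


# First nine bytes of encrypted_code, copied from Listing 8-1
sample = bytes([0x44, 0x85, 0x1D, 0xC7, 0x1C, 0xB8, 0x26, 0x04, 0x08])
decrypted = decrypt_blob(sample)
print(" ".join("%02X" % b for b in decrypted))
```

The decrypted bytes begin with 88 16 (a mov of dl to a memory operand) and end with 83 2E 13 04 10 (sub word ptr ds:413h, 10h), which is a quick way to check that the rotation direction and count are right before patching the database.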

Analyzing Memory Management in Real Mode

Having decrypted the code, let’s proceed with analyzing it. If you look through the decrypted code, you’ll find the instructions shown in Listing 8-3. These instructions initialize the malicious code by storing the MBR input parameters and allocating memory.

seg000:7C19                 mov     ds:drive_no, dl
seg000:7C1D                 sub     word ptr ds:413h, 10h
seg000:7C22                 mov     ax, ds:413h
seg000:7C25                 shl     ax, 6
seg000:7C28                 mov     ds:buffer_segm, ax

Listing 8-3: Memory allocation in the preboot environment

The first instruction, at 0x7C19, stores the contents of the dl register into memory at an offset from the ds segment. From our experience analyzing this kind of code, we can guess that the dl register contains the number of the hard drive from which the MBR is being executed; annotate this offset as a variable called drive_no. IDA Pro records this annotation in the database and shows it in the listing. When performing I/O operations, you can use this integer index to distinguish between different disks available to the system. You’ll use this variable in the BIOS disk service in the next section.

Similarly, Listing 8-3 shows the annotation buffer_segm for the offset where the code allocates a buffer. IDA Pro helpfully propagates these annotations to other code that uses the same variables.

At 0x7C1D, we see a memory allocation. In the preboot environment, there is no memory manager in the sense of modern operating systems, such as the OS logic backing malloc() calls. Instead, the BIOS maintains the number of kilobytes of available memory in a word (a 16-bit value in x86 architecture) located at the address 0:413h. In order to allocate X KB of memory, we subtract X from the total size of available memory, a value stored in the word at 0:413h, as shown in Figure 8-3.


Figure 8-3: Memory management in a preboot environment

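In Listing 8-3, the code allocates a 16KB buffer by subtracting 10h (16 in decimal) from the total amount available. Because the BIOS counts this memory in kilobytes, the code then shifts the new total left by 6 bits, multiplying it by 64, to convert it into a real-mode segment value, which it stores in the variable buffer_segm. The MBR then uses the allocated buffer to store data read from the hard drive.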
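The arithmetic behind this allocation can be checked with a few lines of Python. This is only a sketch: allocate_kb is a hypothetical helper, and the starting value of 640 KB is an assumed example, not a value taken from the sample.

```python
def allocate_kb(free_kb, kb_to_take):
    """Return (new free-memory count in KB, segment of the carved-out block)."""
    new_free_kb = free_kb - kb_to_take
    # 1 KB = 1024 bytes = 64 paragraphs of 16 bytes, hence the shift by 6
    segment = (new_free_kb << 6) & 0xFFFF
    return new_free_kb, segment


# Assumed starting value: 640 KB reported in the word at 0:413h
new_free_kb, buffer_segm = allocate_kb(640, 0x10)
linear_address = buffer_segm << 4  # real-mode segment -> linear address
print(new_free_kb, hex(buffer_segm), hex(linear_address))
```

With these assumed inputs, the block carved out of the top of conventional memory starts at segment 0x9C00, that is, at linear address 0x9C000.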

Analyzing the BIOS Disk Service

Another unique aspect of the preboot environment is the BIOS disk service, an API used to communicate with a hard drive. This API is particularly interesting in the context of bootkit analysis for two reasons. First, bootkits use it to read data from the hard drive, so it’s important to be familiar with the API’s most frequently used commands in order to understand bootkit code. Also, this API is itself a frequent target of bootkits. In the most common scenario, a bootkit hooks the API to patch legitimate modules that are read from the hard drive by other code during the boot process.

The BIOS disk service is accessible via an INT 13h instruction. In order to perform I/O operations, software passes I/O parameters through the processor registers and executes the INT 13h instruction, which transfers control to the appropriate handler. The I/O operation code, or identifier, is passed in the ah register—the higher-order part of the ax register. The register dl is used to pass the index of the disk in question. The processor’s carry flag (CF) is used to indicate whether an error has occurred during execution of the service: if CF is set to 1, an error has occurred and the detailed error code is returned in the ah register. This BIOS convention for passing arguments to a function predates the modern OS system call conventions; if it seems convoluted to you, remember that this is where the idea of uniform system call interfaces originated.

This INT 13h interrupt is an entry point to the BIOS disk service, and it allows software in the preboot environment to perform basic I/O operations on disk devices, like hard drives, floppy drives, and CD-ROMs, as shown in Table 8-1.

Table 8-1: The INT 13h Commands

Operation code   Operation description
2h               Read sectors into memory
3h               Write disk sectors
8h               Get drive parameters
41h              Extensions installation check
42h              Extended read
43h              Extended write
48h              Extended get drive parameters

The operations in Table 8-1 are split into two groups: the first group (with codes 41h, 42h, 43h, and 48h) comprises the extended operations, and the second group (with codes 2h, 3h, and 8h) consists of the legacy operations.

The only difference between the groups is that the extended operations can use an addressing scheme based on logical block addressing (LBA), whereas the legacy operations rely solely on a legacy Cylinder Head Sector (CHS)–based addressing scheme. In the case of the LBA-based scheme, sectors are enumerated linearly on the disk, beginning with index 0, whereas in the CHS-based scheme, each sector is addressed using the tuple (c,h,s), where c is the cylinder number, h is the head number, and s is the number of the sector. Although bootkits may use either group, almost all modern hardware supports the LBA-based addressing scheme.
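The relationship between the two addressing schemes can be sketched in Python. The function names are hypothetical, and the geometry used in the example (255 heads, 63 sectors per track) is an assumption chosen for illustration; note that CHS sector numbers start at 1, whereas LBA indexes start at 0.

```python
def chs_to_lba(c, h, s, heads_per_cylinder, sectors_per_track):
    """Convert a (cylinder, head, sector) tuple to a linear block address."""
    return (c * heads_per_cylinder + h) * sectors_per_track + (s - 1)


def lba_to_chs(lba, heads_per_cylinder, sectors_per_track):
    """Convert a linear block address back to a (cylinder, head, sector) tuple."""
    c, remainder = divmod(lba, heads_per_cylinder * sectors_per_track)
    h, sector_index = divmod(remainder, sectors_per_track)
    return c, h, sector_index + 1


# Assumed geometry: 255 heads per cylinder, 63 sectors per track
HEADS, SPT = 255, 63
chs = lba_to_chs(0x800, HEADS, SPT)
print(chs, hex(chs_to_lba(*chs, HEADS, SPT)))
```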

Obtaining Drive Parameters to Locate Hidden Storage

As you continue looking at the MBR code that follows the 16KB memory allocation, you should see the execution of an INT 13h instruction, as shown in Listing 8-4.

seg000:7C2B                 mov     ah, 48h
seg000:7C2D                 mov     si, 7CF9h
seg000:7C30                 mov     ds:drive_param.bResultSize, 1Eh
seg000:7C36                 int     13h             ; DISK - IBM/MS Extension
                                                    ; GET DRIVE PARAMETERS
                                                    ; (DL - drive, DS:SI - buffer)

Listing 8-4: Obtaining drive parameters via the BIOS disk service

The small size of the MBR (512 bytes) restricts the functionality of the code that can be implemented within it. For this reason, the bootkit loads additional code to execute, called a malicious boot loader, which is placed in hidden storage at the end of the hard drive. To obtain the coordinates of the hidden storage on the disk, the MBR code uses the extended “get drive parameters” operation (operation code 48h in Table 8-1), which returns information about the hard drive’s size and geometry. This information allows the bootkit to compute the offset at which the additional code is located on the hard drive.

In Listing 8-4, you can see an automatically generated comment from IDA Pro for the INT 13h instruction. During code analysis, IDA Pro identifies parameters passed to the BIOS disk service handler call and generates a comment with the name of the requested disk I/O operation and the register names used to pass parameters to the BIOS handler. This MBR code executes INT 13h with the operation code 48h in the ah register. Upon execution, this routine fills a special structure called EXTENDED_GET_PARAMS that provides the drive parameters. The address of this structure is stored in the si register.

Examining EXTENDED_GET_PARAMS

The layout of the EXTENDED_GET_PARAMS structure is provided in Listing 8-5.

typedef struct _EXTENDED_GET_PARAMS {
   WORD bResultSize;             // Size of the result
   WORD InfoFlags;               // Information flags
   DWORD CylNumber;              // Number of physical cylinders on drive
   DWORD HeadNumber;             // Number of physical heads on drive
   DWORD SectorsPerTrack;        // Number of sectors per track
   QWORD TotalSectors;           // Total number of sectors on drive
   WORD BytesPerSector;          // Bytes per sector
} EXTENDED_GET_PARAMS, *PEXTENDED_GET_PARAMS;

Listing 8-5: The EXTENDED_GET_PARAMS structure layout

The only fields the bootkit actually looks at in the returned structure are the number of sectors on the hard drive (TotalSectors) and the size of the disk sector in bytes (BytesPerSector). The bootkit computes the total size of the hard drive in bytes by multiplying these two values, then uses the result to locate the hidden storage at the end of the drive.
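To make the layout concrete, here is a minimal Python sketch that unpacks a buffer laid out like Listing 8-5 and computes the disk size. The names parse_drive_params and disk_size_bytes are hypothetical, and the sample values are invented for illustration; any bytes beyond the fields shown in Listing 8-5 are simply ignored.

```python
import struct

# Little-endian, unpadded: WORD, WORD, DWORD, DWORD, DWORD, QWORD, WORD
EXTENDED_GET_PARAMS = struct.Struct("<HHIIIQH")


def parse_drive_params(buf):
    """Unpack the fields of Listing 8-5 from the start of a result buffer."""
    names = ("bResultSize", "InfoFlags", "CylNumber", "HeadNumber",
             "SectorsPerTrack", "TotalSectors", "BytesPerSector")
    return dict(zip(names, EXTENDED_GET_PARAMS.unpack_from(buf)))


def disk_size_bytes(params):
    """Total drive size: number of sectors times bytes per sector."""
    return params["TotalSectors"] * params["BytesPerSector"]


# Invented sample: a drive with 0x5000000 sectors of 512 bytes each,
# padded with zeros up to the 1Eh bytes requested in Listing 8-4
sample = EXTENDED_GET_PARAMS.pack(0x1E, 0, 0, 0, 0, 0x5000000, 512).ljust(0x1E, b"\x00")
params = parse_drive_params(sample)
print(hex(disk_size_bytes(params)))
```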

Reading Malicious Boot Loader Sectors

Once the bootkit has obtained the hard drive parameters and calculated the offset of the hidden storage, the bootkit MBR code reads this hidden data from the disk with the extended read operation of the BIOS disk service. This data is the next-stage malicious boot loader intended to bypass OS security checks and load a malicious kernel-mode driver. Listing 8-6 shows the code that reads it into RAM.

seg000:7C4C read_loop:                              ; CODE XREF: seg000:7C5Dj
seg000:7C4C                 call    read_sector
seg000:7C4F                 mov     si, 7D1Dh
seg000:7C52                 mov     cx, ds:word_7D1B
seg000:7C56                 rep movsb
seg000:7C58                 mov     ax, ds:word_7D19
seg000:7C5B                 test    ax, ax
seg000:7C5D                 jnz     short read_loop
seg000:7C5F                 popa
seg000:7C60                 jmp     far boot_loader

Listing 8-6: Code for loading an additional malicious boot loader from the disk

In the read_loop, this code repeatedly reads sectors from the hard drive using the routine read_sector and stores them in the previously allocated memory buffer. Then the code transfers control to this malicious boot loader by executing the jmp far instruction at 0x7C60.

Looking at the code of the read_sector routine, in Listing 8-7 you can see the usage of INT 13h with the parameter 42h, which corresponds to the extended read operation.

seg000:7C65 read_sector     proc near
seg000:7C65                 pusha
seg000:7C66                 mov     ds:disk_address_packet.PacketSize, 10h
seg000:7C6B                 mov     byte ptr ds:disk_address_packet.SectorsToTransfer, 1
seg000:7C70                 push    cs
seg000:7C71                 pop     word ptr ds:disk_address_packet.TargetBuffer+2
seg000:7C75                 mov     word ptr ds:disk_address_packet.TargetBuffer, 7D17h
seg000:7C7B                 push    large [dword ptr ds:drive_param.TotalSectors_l]
seg000:7C80                 pop     large [ds:disk_address_packet.StartLBA_l]
seg000:7C85                 push    large [dword ptr ds:drive_param.TotalSectors_h]
seg000:7C8A                 pop     large [ds:disk_address_packet.StartLBA_h]
seg000:7C8F                 inc     eax
seg000:7C91                 sub     ds:disk_address_packet.StartLBA_l, eax
seg000:7C96                 sbb     ds:disk_address_packet.StartLBA_h, 0
seg000:7C9C                 mov     ah, 42h
seg000:7C9E                 mov     si, 7CE9h
seg000:7CA1                 mov     dl, ds:drive_no
seg000:7CA5                 int     13h             ; DISK - IBM/MS Extension
                                                    ; EXTENDED READ
                                                    ; (DL - drive, DS:SI - disk address packet)
seg000:7CA7                 popa
seg000:7CA8                 retn
seg000:7CA8 read_sector     endp

Listing 8-7: Reading sectors from the disk

Before executing INT 13h, the bootkit code initializes the DISK_ADDRESS_PACKET structure with the proper parameters, including the size of the structure (PacketSize), the number of sectors to transfer (SectorsToTransfer), the address of the buffer to store the result (TargetBuffer), and the address of the first sector to read (StartLBA). This structure’s address is provided to the INT 13h handler via the ds and si registers. Note the manual annotation of the structure’s offsets; IDA picks them up and propagates them. The BIOS disk service uses DISK_ADDRESS_PACKET to uniquely identify which sectors to read from the hard drive. The complete layout of the DISK_ADDRESS_PACKET structure, with comments, is provided in Listing 8-8.

typedef struct _DISK_ADDRESS_PACKET {
   BYTE PacketSize;                 // Size of the structure
   BYTE Reserved;
   WORD SectorsToTransfer;          // Number of sectors to read/write
   DWORD TargetBuffer;              // segment:offset of the data buffer
   QWORD StartLBA;                  // LBA address of the starting sector
} DISK_ADDRESS_PACKET, *PDISK_ADDRESS_PACKET;

Listing 8-8: The DISK_ADDRESS_PACKET structure layout
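The 16-byte layout in Listing 8-8 can be reproduced with Python’s struct module, which is handy when you want to recognize a packet in a memory dump. This is a sketch under stated assumptions: build_dap is a hypothetical helper, the TargetBuffer far pointer is split into its offset (low word) and segment (high word), and the starting LBA below is an invented value.

```python
import struct

# BYTE size, BYTE reserved, WORD count, WORD buffer offset, WORD buffer segment, QWORD LBA
DISK_ADDRESS_PACKET = struct.Struct("<BBHHHQ")


def build_dap(sector_count, buf_segment, buf_offset, start_lba):
    """Pack a disk address packet for the INT 13h extended read (AH=42h)."""
    return DISK_ADDRESS_PACKET.pack(
        DISK_ADDRESS_PACKET.size,  # PacketSize, 10h
        0,                         # Reserved
        sector_count,              # SectorsToTransfer
        buf_offset,                # TargetBuffer, low word (offset)
        buf_segment,               # TargetBuffer, high word (segment)
        start_lba,                 # StartLBA
    )


# One sector into 0000:7D17, as in Listing 8-7; the LBA is an invented example
packet = build_dap(1, 0x0000, 0x7D17, 0x4FFFFF0)
print(packet.hex())
```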

Once the boot loader is read into the memory buffer, the bootkit executes it.

At this point, we have finished our analysis of the MBR code, and we’ll proceed to dissecting another essential part of the MBR: the partition table. You can download the complete version of the disassembled and commented malicious MBR at https://nostarch.com/rootkits/.

Analyzing the Infected MBR’s Partition Table

The MBR partition table is a common target of bootkits because the data it contains—although limited—plays a crucial part in the boot process’s logic. Introduced in Chapter 5, the partition table is located at the offset 0x1BE in the MBR and consists of four entries, each 0x10 bytes in size. It lists the partitions available on the hard drive, describes their type and location, and specifies where the MBR code should transfer control when it’s done. Usually, the sole purpose of legitimate MBR code is to scan this table for the active partition—that is, the partition marked with the appropriate bit flag and containing the VBR—and load it. You might be able to intercept this execution flow at the very early boot stage by simply manipulating the information contained in the table, without modifying the MBR code itself; the Olmasco bootkit, which we’ll discuss in Chapter 10, implements this method.

This illustrates an important principle of bootkit and rootkit design: if you can manipulate some data surreptitiously enough to bend the control flow, then that approach is preferred to patching the code. This saves the malware programmer the effort of testing new, altered code—a good example of code reuse promoting reliability!

Complex data structures like an MBR or VBR notoriously afford attackers many opportunities to treat them as a kind of bytecode and to treat the native code that consumes the data as a virtual machine programmed through the input data. The language-theoretic security (LangSec, http://langsec.org/) approach explains why this is the case.

Being able to read and understand the MBR’s partition table is essential for spotting this kind of early bootkit interception. Take a look at the partition table in Figure 8-4, where each 16-byte (10h) line is a partition table entry.


Figure 8-4: Partition table of the MBR

As you can see, the table has two entries—the top two lines—which implies there are only two partitions on the disk. The first partition entry starts at the address 0x7DBE; its very first byte shows that this partition is active, so the MBR code should load and execute its VBR, which is the first sector of that partition. The byte at offset 0x7DC2 describes the type of the partition—that is, the particular filesystem type that should be expected there by the OS, by the bootloader itself, or by other low-level disk access code. In this case, 0x07 corresponds to Microsoft’s NTFS. (For more information on partition types, see “The Windows Boot Process” on page 60.)

Next, the DWORD at 0x7DC6 (offset 8 within the partition table entry) indicates that the partition starts at offset 0x800 from the beginning of the hard drive; this offset is counted in sectors. The last DWORD of the entry specifies the partition’s size in sectors (0x32000). Table 8-2 details the particular example in Figure 8-4. In the Beginning offset and Partition size columns, the actual values are provided in sectors, with bytes in parentheses.

Table 8-2: MBR Partition Table Contents

Partition index   Is active   Type          Beginning offset, sectors (bytes)   Partition size, sectors (bytes)
0                 True        NTFS (0x07)   0x800 (0x100000)                    0x32000 (0x6400000)
1                 False       NTFS (0x07)   0x32800 (0x6500000)                 0x4FCD000 (0x9F9A00000)
2                 N/A         N/A           N/A                                 N/A
3                 N/A         N/A           N/A                                 N/A

The reconstructed partition table indicates where you should look next in your analysis of the boot sequence. Namely, it tells you where the VBR is. The coordinates of the VBR are stored in the Beginning offset column of the active partition entry. In this case, the VBR is located at offset 0x100000 (in bytes) from the beginning of the hard drive, which is the place to look in order to continue your analysis.
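Reading the table by hand is error-prone, so a short parser helps. The sketch below is an illustration, not the book’s tooling: parse_partition_table is a hypothetical name, a sector size of 512 bytes is assumed, and the test image is synthetic, built from the values in Table 8-2 with the CHS fields left as zeroed placeholders.

```python
import struct

PARTITION_TABLE_OFFSET = 0x1BE
SECTOR_SIZE = 512  # assumed sector size
# status, CHS of first sector, type, CHS of last sector, starting LBA, size in sectors
ENTRY = struct.Struct("<B3sB3sII")


def parse_partition_table(mbr):
    """Return the non-empty entries of the four-slot MBR partition table."""
    entries = []
    for index in range(4):
        status, _chs_first, ptype, _chs_last, lba_start, num_sectors = ENTRY.unpack_from(
            mbr, PARTITION_TABLE_OFFSET + index * ENTRY.size)
        if ptype == 0:  # type 0 marks an unused slot
            continue
        entries.append({
            "index": index,
            "active": status == 0x80,
            "type": ptype,
            "lba_start": lba_start,
            "num_sectors": num_sectors,
            "byte_offset": lba_start * SECTOR_SIZE,
        })
    return entries


# Synthetic MBR carrying the two entries from Table 8-2
mbr = bytearray(512)
ENTRY.pack_into(mbr, 0x1BE, 0x80, b"\0\0\0", 0x07, b"\0\0\0", 0x800, 0x32000)
ENTRY.pack_into(mbr, 0x1CE, 0x00, b"\0\0\0", 0x07, b"\0\0\0", 0x32800, 0x4FCD000)
mbr[510:512] = b"\x55\xAA"

entries = parse_partition_table(bytes(mbr))
for e in entries:
    print(e["index"], e["active"], hex(e["type"]), hex(e["lba_start"]), hex(e["byte_offset"]))
```

The active entry’s byte_offset is where the VBR should be read from in a raw disk image.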

VBR Analysis Techniques

In this section, we’ll consider VBR static analysis approaches using IDA and focus on an essential VBR concept called the BIOS parameter block (BPB), which plays an important role in the boot process and bootkit infection. The VBR is also a common target of bootkits, as we explained briefly in Chapter 7. In Chapter 12, we’ll discuss in more detail the Gapz bootkit, which infects the VBR in order to persist on the infected system. The Rovnix bootkit, discussed in Chapter 11, also makes use of the VBR to infect a system.

You should load the VBR into the disassembler in essentially the same way you loaded the MBR, since it’s also executed in real mode. Load the VBR file, vbr_sample_ch8.bin, from the samples directory for Chapter 8 as a binary module at 0:7C00h and in 16-bit disassembly mode.

Analyzing the IPL

The main purpose of the VBR is to locate the Initial Program Loader (IPL) and to read it into RAM. The location of the IPL on the hard drive is specified in the BIOS_PARAMETER_BLOCK_NTFS structure, which we discussed in Chapter 5. Stored directly in the VBR, BIOS_PARAMETER_BLOCK_NTFS contains a number of fields that define the geometry of the NTFS volume, such as the number of bytes per sector, the number of sectors per cluster, and the location of the master file table.

The HiddenSectors field, which stores the number of sectors from the beginning of the hard drive to the beginning of the NTFS volume, defines the actual location of the IPL. The VBR assumes that the NTFS volume begins with the VBR, immediately followed by the IPL. So the VBR code loads the IPL by fetching the contents of the HiddenSectors field, incrementing the fetched value by 1, and then reading 0x2000 bytes—which corresponds to 16 sectors—from the calculated offset. Once the IPL is loaded from disk, the VBR code transfers control to it.

Listing 8-9 shows a part of the BIOS parameter block structure in our example.

seg000:000B bpb     dw 200h      ; SectorSize
seg000:000D         db 8         ; SectorsPerCluster
seg000:000E         db 3 dup(0)  ; reserved
seg000:0011         dw 0         ; RootDirectoryIndex
seg000:0013         dw 0         ; NumberOfSectorsFAT
seg000:0015         db 0F8h      ; MediaId
seg000:0016         db 2 dup(0)  ; Reserved2
seg000:0018         dw 3Fh       ; SectorsPerTrack
seg000:001A         dw 0FFh      ; NumberOfHeads
seg000:001C         dd 800h      ; HiddenSectors

Listing 8-9: The BIOS parameter block of the VBR

The value of HiddenSectors is 0x800, which corresponds to the beginning offset of the active partition on the disk in Table 8-2. This shows that the IPL is located at sector 0x801, counted from the beginning of the disk. Bootkits use this information to intercept control during the boot process. The Gapz bootkit, for example, modifies the contents of the HiddenSectors field so that, instead of a legitimate IPL, the VBR code reads and executes the malicious IPL. Rovnix, on the other hand, uses another strategy: it modifies the legitimate IPL’s code. Both manipulations intercept control at the early boot of the system.
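The IPL calculation can be reproduced in a few lines of Python, which also makes it easy to flag a HiddenSectors value that disagrees with the partition table. This is a sketch under assumptions: ipl_location is a hypothetical helper, the field layout covers only the part of the BPB shown in Listing 8-9 (offsets 0x0B through 0x1F of the VBR), and the test VBR is synthetic.

```python
import struct

BPB_OFFSET = 0x0B
# SectorSize, SectorsPerCluster, reserved[3], RootDirectoryIndex, NumberOfSectorsFAT,
# MediaId, Reserved2[2], SectorsPerTrack, NumberOfHeads, HiddenSectors
BPB = struct.Struct("<HB3sHHB2sHHI")


def ipl_location(vbr):
    """Return (IPL sector index, IPL byte offset), both from the start of the disk."""
    fields = BPB.unpack_from(vbr, BPB_OFFSET)
    sector_size, hidden_sectors = fields[0], fields[9]
    ipl_sector = hidden_sectors + 1  # the IPL immediately follows the VBR
    return ipl_sector, ipl_sector * sector_size


# Synthetic VBR carrying the values from Listing 8-9
vbr = bytearray(512)
BPB.pack_into(vbr, BPB_OFFSET, 0x200, 8, b"\0\0\0", 0, 0, 0xF8, b"\0\0", 0x3F, 0xFF, 0x800)
ipl_sector, ipl_offset = ipl_location(bytes(vbr))
print(hex(ipl_sector), hex(ipl_offset))
```

If the sector computed here does not equal the active partition’s starting sector plus 1, the HiddenSectors field deserves a closer look.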

Evaluating Other Bootkit Components

Once the IPL receives control, it loads bootmgr, which is stored in the filesystem of the volume. After this, other bootkit components, such as malicious boot loaders and kernel-mode drivers, may kick in. A full analysis of these modules is beyond the scope of this chapter, but we’ll briefly outline some approaches.

Malicious Boot Loaders

Malicious boot loaders constitute an important part of bootkits. Their main purposes are to survive through the CPU’s execution mode switching, bypass OS security checks (such as driver signature enforcement), and load malicious kernel-mode drivers. They implement functionality that cannot fit in the MBR and the VBR due to their size limitations, and they’re stored separately on the hard drive. Bootkits store their boot loaders in hidden storage areas located either at the end of the hard drive, where there is usually some unused disk space, or in free disk space between partitions, if there is any.

A malicious boot loader may contain different code to be executed in different processor execution modes:

16-bit real mode Interrupt 13h hooking functionality

32-bit protected mode Bypass OS security checks for 32-bit OS version

64-bit protected mode (long mode) Bypass OS security checks for 64-bit OS version

But the IDA Pro disassembler can’t keep code disassembled in different modes in a single IDA database, so you’ll need to maintain different versions of the IDA Pro database for different execution modes.

Kernel-Mode Drivers

In most cases, the kernel-mode drivers that bootkits load are valid PE images. They implement rootkit functionality that allows malware to avoid detection by security software and provides covert communication channels, among other things. Modern bootkits usually contain two versions of the kernel-mode driver, compiled for the x86 and x64 platforms. You may analyze these modules using conventional approaches for static analysis of executable images. IDA Pro does a decent job of loading such executables, and it provides a lot of supplemental tools and information for their analysis. In the next section, however, we’ll discuss how to use IDA Pro’s features to automate the analysis of bootkits by preprocessing them as IDA loads them.

Advanced IDA Pro Usage: Writing a Custom MBR Loader

One of the most striking features of the IDA Pro disassembler is the breadth of its support for various file formats and processor architectures. To achieve this, the functionality for loading particular types of executables is implemented in special modules called loaders. By default, IDA Pro contains a number of loaders, covering the most frequent types of executables, such as PE (Windows), ELF (Linux), Mach-O (macOS), and firmware image formats. You can obtain the list of available loaders by inspecting the contents of your $IDADIR\loaders directory, where $IDADIR is the installation directory of the disassembler. The files within this directory are the loaders, and their names correspond to platforms and their binary formats. The file extensions have the following meanings:

ldw Binary implementation of a loader for the 32-bit version of IDA Pro

l64 Binary implementation of a loader for the 64-bit version of IDA Pro

py Python implementation of a loader for both versions of IDA Pro

At the time of writing, IDA Pro provides no loader for the MBR or VBR by default, which is why you have to instruct IDA to load the MBR or VBR as a binary module. This section shows you how to write a custom Python-based MBR loader for IDA Pro that loads the MBR in the 16-bit disassembler mode at the address 0x7C00 and parses the partition table.

Understanding loader.hpp

The place to start is the loader.hpp file, which is provided with the IDA Pro SDK and contains a lot of useful information related to loading executables in the disassembler. It defines structures and types to use, lists prototypes of the callback routines, and describes the parameters they take. Here is the list of the callbacks that should be implemented in a loader, according to loader.hpp:

accept_file This routine checks whether the file being loaded is of a supported format.

load_file This routine does the actual work of loading the file into the disassembler—that is, parsing the file format and mapping the file’s content into the newly created database.

save_file This is an optional routine that, if implemented, produces an executable from the disassembly upon executing the File▸Produce File▸Create EXE File command in the menu.

move_segm This is an optional routine that, if implemented, is executed when a user moves a segment within the database. It is mostly used when there is relocation information in the image that the user should take into account when moving a segment. Due to the MBR’s lack of relocations, we can skip this routine here, but we couldn’t if we were to write a loader for PE or ELF binaries.

init_loader_options This is an optional routine that, if implemented, asks a user for additional parameters for loading a particular file type, once the user chooses a loader. We can skip this routine as well, because we have no special options to add.

Now let’s take a look at the actual implementation of these routines in our custom MBR loader.

Implementing accept_file

In the accept_file routine, shown in Listing 8-10, we check whether the file in question is a Master Boot Record.

import os

def accept_file(li, n):
    # check size of the file: an MBR occupies at least one 512-byte sector
    file_size = li.size()
    if file_size < 512:
        return 0

    # check MBR signature: bytes 0x55, 0xAA at offsets 510 and 511
    li.seek(510, os.SEEK_SET)
    mbr_sign = li.read(2)
    if mbr_sign != b'\x55\xAA':
        return 0

    # all the checks are passed
    return 'MBR'

Listing 8-10: The accept_file implementation

The MBR format is rather simple, so the following are the only indicators we need to perform this check:

File size The file should be at least 512 bytes, which corresponds to the minimum size of a hard drive sector.

MBR signature A valid MBR should end with the bytes 0x55 and 0xAA, which together form the little-endian 16-bit value 0xAA55.

If the conditions are met and the file is recognized as an MBR, the code returns a string with the name of the loader; if the file is not an MBR, the code returns 0.
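The same two checks can be tried outside IDA Pro on any file-like object. The following is a minimal sketch under that assumption; the function name looks_like_mbr and the in-memory sample sectors are hypothetical and are not part of the loader itself. It also shows how the on-disk byte order relates to the value 0xAA55.

```python
import io
import struct

SECTOR_SIZE = 512

def looks_like_mbr(stream):
    """Return True if the stream passes the size and signature checks."""
    stream.seek(0, io.SEEK_END)
    if stream.tell() < SECTOR_SIZE:
        return False
    stream.seek(SECTOR_SIZE - 2, io.SEEK_SET)
    signature = stream.read(2)
    # On disk the bytes are 0x55 followed by 0xAA; read as a
    # little-endian 16-bit word they form the value 0xAA55.
    return struct.unpack("<H", signature)[0] == 0xAA55

# Hypothetical in-memory sectors used only for demonstration.
good = io.BytesIO(b"\x00" * 510 + b"\x55\xAA")
bad = io.BytesIO(b"\x00" * 512)
print(looks_like_mbr(good), looks_like_mbr(bad))
```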

Implementing load_file

Once accept_file returns a nonzero value, IDA Pro attempts to load the file by executing the load_file routine, which is implemented in your loader. This routine needs to perform the following steps:

  1. Read the whole file into a buffer.
  2. Create and initialize a new memory segment, into which the script will load the MBR contents.
  3. Set the very beginning of the MBR as an entry point for the disassembly.
  4. Parse the partition table contained in the MBR.

The load_file implementation is shown in Listing 8-11.

import os
import idaapi

def load_file(li, neflags, format):
    # Select the PC processor module
    idaapi.set_processor_type("metapc",
                              idaapi.SETPROC_ALL | idaapi.SETPROC_FATAL)

    # read MBR into buffer
    li.seek(0, os.SEEK_SET)
    buf = li.read(li.size())

    mbr_start = 0x7C00       # beginning of the segment
    mbr_size = len(buf)      # size of the segment
    mbr_end  = mbr_start + mbr_size

    # Create the segment
    seg = idaapi.segment_t()
    seg.startEA = mbr_start
    seg.endEA   = mbr_end
    seg.bitness = 0 # 16-bit
    idaapi.add_segm_ex(seg, "seg0", "CODE", 0)

    # Copy the bytes
    idaapi.mem2base(buf, mbr_start, mbr_end)

    # add entry point
    idaapi.add_entry(mbr_start, mbr_start, "start", 1)

    # parse partition table, located at offset 0x1BE within the MBR
    struct_id = add_struct_def()
    struct_size = idaapi.get_struc_size(struct_id)
    idaapi.doStruct(mbr_start + 0x1BE, struct_size, struct_id)

    # report success to IDA
    return 1

Listing 8-11: The load_file implementation

First, set the CPU type to metapc, which corresponds to the generic PC family, instructing IDA to disassemble the binary as IBM PC opcodes. Then read the MBR into a buffer and create a memory segment by calling the segment_t API. This call allocates an empty structure, seg, describing the segment to create. Then populate its fields with the segment’s actual values. Set the starting address of the segment to 0x7C00, as you did in “Loading the MBR into IDA Pro” on page 96, and set its size to the corresponding size of the MBR. Also tell IDA that the new segment will be a 16-bit segment by setting the bitness flag of the structure to 0; note that 1 corresponds to 32-bit segments and 2 corresponds to 64-bit segments. Then, by calling the add_segm_ex API, add a new segment to the disassembly database. The add_segm_ex API takes these parameters: a structure describing the segment to create; the segment name (seg0); the segment class CODE; and flags, which is left at 0. Following this call, copy the MBR contents into the newly created segment and add an entry point indicator.

Next, add automatic parsing of the partition table present in the MBR by calling the doStruct API with these parameters: the address of the beginning of the partition table, the table size in bytes, and the identifier of the structure you want the table to be cast to. The add_struct_def routine implemented in our loader creates this structure. It imports the structures defining the partition table, PARTITION_TABLE_ENTRY, into the database.
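The addresses involved are easy to check by hand. The following sketch works through the arithmetic, assuming a single 512-byte MBR sector and the classic layout of four 16-byte partition entries; the constant names are illustrative only and don’t appear in the loader.

```python
MBR_LOAD_ADDRESS = 0x7C00        # where the BIOS places the MBR
SECTOR_SIZE = 512                # assumed size of the MBR file
PARTITION_TABLE_OFFSET = 0x1BE   # offset of the table inside the MBR
PARTITION_ENTRY_SIZE = 16
PARTITION_ENTRY_COUNT = 4

segment_end = MBR_LOAD_ADDRESS + SECTOR_SIZE
table_address = MBR_LOAD_ADDRESS + PARTITION_TABLE_OFFSET
table_size = PARTITION_ENTRY_SIZE * PARTITION_ENTRY_COUNT

print(hex(segment_end))     # 0x7e00
print(hex(table_address))   # 0x7dbe
print(table_size)           # 64
```

The table ends at 0x7DBE + 64 = 0x7DFE, which is exactly where the two-byte MBR signature begins.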

Creating the Partition Table Structure

Listing 8-12 defines the add_struct_def routine, which creates the PARTITION_TABLE_ENTRY structure.

from idc import AddStrucEx, AddStrucMember, FF_BYTE, FF_DWRD, FF_STRU

def add_struct_def():
    # add structure PARTITION_TABLE_ENTRY to IDA types
    sid_partition_entry = AddStrucEx(-1, "PARTITION_TABLE_ENTRY", 0)
    # add fields to the structure (name, offset, type, typeid, size)
    AddStrucMember(sid_partition_entry, "status", 0, FF_BYTE, -1, 1)
    AddStrucMember(sid_partition_entry, "chsFirst", 1, FF_BYTE, -1, 3)
    AddStrucMember(sid_partition_entry, "type", 4, FF_BYTE, -1, 1)
    AddStrucMember(sid_partition_entry, "chsLast", 5, FF_BYTE, -1, 3)
    AddStrucMember(sid_partition_entry, "lbaStart", 8, FF_DWRD, -1, 4)
    AddStrucMember(sid_partition_entry, "size", 12, FF_DWRD, -1, 4)

    # add structure PARTITION_TABLE to IDA types:
    # four 16-byte PARTITION_TABLE_ENTRY records, 64 bytes in total
    sid_table = AddStrucEx(-1, "PARTITION_TABLE", 0)
    AddStrucMember(sid_table, "partitions", 0, FF_STRU,
                   sid_partition_entry, 64)

    return sid_table

Listing 8-12: Importing data structures into the disassembly database
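To see what this layout means for the raw bytes, you can decode the same fields outside IDA Pro with Python’s struct module. This is a minimal sketch that mirrors the field offsets and sizes of PARTITION_TABLE_ENTRY; the names PartitionEntry and parse_partition_table are hypothetical and aren’t part of the loader.

```python
import struct
from collections import namedtuple

PartitionEntry = namedtuple(
    "PartitionEntry", "status chs_first type chs_last lba_start size")

# <: little-endian, no padding; B: 1 byte; 3s: 3 raw bytes; I: 4-byte unsigned
ENTRY_FORMAT = "<B3sB3sII"
ENTRY_SIZE = struct.calcsize(ENTRY_FORMAT)   # 16 bytes per entry
TABLE_OFFSET = 0x1BE                         # table offset inside the MBR

def parse_partition_table(mbr):
    """Decode the four partition entries of a 512-byte MBR."""
    entries = []
    for i in range(4):
        start = TABLE_OFFSET + i * ENTRY_SIZE
        fields = struct.unpack(ENTRY_FORMAT, mbr[start:start + ENTRY_SIZE])
        entries.append(PartitionEntry(*fields))
    return entries
```

Passing the 512 bytes of an extracted MBR to parse_partition_table returns four entries whose lba_start and size fields are counted in sectors.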

Once your loader module is finished, copy it into the $IDADIR\loaders directory as an mbr.py file. When a user attempts to load an MBR into the disassembler, the dialog in Figure 8-5 appears, confirming that your loader has successfully recognized the MBR image. Clicking OK executes the load_file routine implemented in your loader in order to apply the previously described customizations to the loaded file.

NOTE

When you’re developing custom loaders for IDA Pro, bugs in the script implementation may cause IDA Pro to crash. If this happens, simply remove the loader script from the loaders directory and restart the disassembler.

In this section, you’ve seen a small sample of the disassembler’s extension development capabilities. For a more complete reference on IDA Pro extension development, refer to The IDA Pro Book (No Starch Press, 2011) by Chris Eagle.

Figure 8-5: Choosing the custom MBR loader

Conclusion

In this chapter, we described a few simple steps for static analysis of the MBR and the VBR. You can easily extend the examples in this chapter to any code running in the preboot environment. You also saw that the IDA Pro disassembler provides a number of unique features that make it a handy tool for performing static analysis.

On the other hand, static analysis has its limitations—mainly related to the inability to see the code at work and observe how it manipulates the data. In many cases, static analysis can’t provide answers to all the questions a reverse engineer may have. In such situations, it’s important to examine the actual execution of the code to better understand its functionality or to obtain some information that may have been missing in the static context, such as encryption keys. This brings us to dynamic analysis, the methods and tools for which we’ll discuss in the next chapter.

Exercises

Complete the following exercises to get a better grasp of the material in this chapter. You’ll need to download a disk image from https://nostarch.com/rootkits/. The required tools for this exercise are the IDA Pro disassembler and a Python interpreter.

  1. Extract the MBR from the image by reading its first 512 bytes and saving them in a file named mbr.mbr. Load the extracted MBR into the IDA Pro disassembler. Examine and describe the code at the entry point.
  2. Identify code that decrypts the MBR. What kind of encryption is being used? Find the key used to decrypt the MBR.
  3. Write a Python script to decrypt the rest of the MBR code and execute it. Use the code in Listing 8-2 as a reference.
  4. To be able to load additional code from disk, the MBR code allocates a memory buffer. Where is the code allocating that buffer located? How many bytes of memory does the code allocate? Where is the pointer to the allocated buffer stored?
  5. After the memory buffer is allocated, the MBR code attempts to load additional code from disk. At which offset in which sectors does the MBR code start reading these sectors? How many sectors does it read?
  6. It appears that the data loaded from the disk is encrypted. Identify the MBR code that decrypts the read sectors. What is the address at which this MBR code will be loaded?
  7. Extract the encrypted sectors from the disk image by reading the number of bytes identified in exercise 4, starting at the offset you found, and saving them in the file stage2.mbr.
  8. Implement a Python script for decrypting the extracted sectors and execute it. Load the decrypted data into the disassembler (in the same way as the MBR) and examine its output.
  9. Identify the partition table in the MBR. How many partitions are there? Which one is active? Where on the image are these partitions located?
  10. Extract the VBR of the active partition from the image by reading its first 512 bytes and saving it in a vbr.vbr file. Load the extracted VBR into IDA Pro. Examine and describe the code at the entry point.
  11. What is the value stored in the HiddenSectors field of the BIOS parameter block in the VBR? At which offset is the IPL code located? Examine the VBR code and determine the size of the IPL (that is, how many bytes of the IPL are read).
  12. Extract the IPL code from the disk image by reading and saving it into an ipl.vbr file. Load the extracted IPL into IDA Pro. Find the location of the entry point in the IPL. Examine and describe the code at the entry point.
  13. Develop a custom VBR loader for IDA Pro that automatically parses the BIOS parameter block. Use the structure BIOS_PARAMETER_BLOCK_NTFS defined in Chapter 5.
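Several of these exercises begin by copying a range of sectors out of the disk image. One possible helper for that mechanical step is sketched below, assuming 512-byte sectors; the function name extract_sectors and its parameters are hypothetical, and the sector numbers and counts are for you to determine from your analysis.

```python
SECTOR_SIZE = 512   # assumed sector size of the disk image

def extract_sectors(image_path, out_path, first_sector=0, count=1):
    """Copy `count` sectors starting at `first_sector` into out_path."""
    with open(image_path, "rb") as image:
        image.seek(first_sector * SECTOR_SIZE)
        data = image.read(count * SECTOR_SIZE)
    if len(data) != count * SECTOR_SIZE:
        raise ValueError("image is too short for the requested range")
    with open(out_path, "wb") as out:
        out.write(data)
    return len(data)
```

For exercise 1, for example, calling extract_sectors with the path to your downloaded image and "mbr.mbr" as the output path saves the first sector of the image.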