This chapter introduces the basic concepts of bootkit static analysis with IDA Pro. There are several ways to approach reversing bootkits, and covering all the existing approaches would require a book of its own. We focus on the IDA Pro disassembler, because it provides unique features that enable the static analysis of bootkits.
Statically analyzing bootkits is radically different from reverse engineering in most conventional application environments, because crucial parts of a bootkit execute in a preboot environment. For example, a typical Windows application relies on standard Windows libraries and is expected to call standard library functions known to reverse-engineering tools like Hex-Rays IDA Pro. We can deduce a lot about an application by the functions it calls; the same is true about Linux applications versus POSIX system calls. But the preboot environment lacks these hints, so the tools for preboot analysis need additional features to compensate for this missing information. Fortunately, these features are available in IDA Pro, and this chapter explains how to use them.
As discussed in Chapter 7, a bootkit consists of several closely connected modules: the Master Boot Record (MBR) or Volume Boot Record (VBR) infector, a malicious boot loader, and kernel-mode drivers, among others. We’ll restrict the discussion in this chapter to the analysis of a bootkit MBR and a legitimate operating system VBR, which you can use as a model for reversing any code that executes in the preboot environment. You can download the MBR and VBR you’ll use here from the book’s downloadable resources. At the end of the chapter, we discuss how to deal with other bootkit components, such as the malicious boot loader and kernel-mode drivers. If you haven’t already worked through Chapter 7, you should do so now.
First, we’ll show you how to get started with bootkit analysis; you’ll learn which options to use in IDA Pro in order to load the code into the disassembler, the API used in the preboot environment, how control is transferred between different modules, and which IDA features may simplify their reversal. Then you’ll learn how to develop a custom loader for IDA Pro in order to automate your reversing tasks. Finally, we provide a set of exercises designed to help you further explore bootkit static analysis. You can download the materials for this chapter from https://nostarch.com/rootkits/.
First, we’ll analyze a bootkit MBR in the IDA Pro disassembler. The MBR we use in this chapter is similar to the one the TDL4 bootkit creates (see Chapter 7). The TDL4 MBR is a good example because it implements traditional bootkit functionality, but its code is easy to disassemble and understand. We based the VBR example in this chapter on legitimate code from an actual Microsoft Windows volume.
In the following sections, you’ll load the MBR into IDA Pro and analyze the MBR code at its entry point. Then, you’ll decrypt the code and examine how the MBR manages memory.
The first step in the static analysis of the bootkit MBR is to load the MBR code into IDA. Because the MBR isn’t a conventional executable and has no dedicated loader, you need to load it as a binary module. IDA Pro will simply load the MBR into its memory as a single contiguous segment just as the BIOS does, without performing any extra processing. You only need to provide the starting memory address for this segment.
Load the binary file by opening it via IDA Pro. When IDA Pro first loads the MBR, it displays a message offering various options, as shown in Figure 8-1.
Figure 8-1: The IDA Pro dialog displayed when loading the MBR
You can accept the defaults for most of the parameters, but you need to enter a value into the Loading offset field ➊, which specifies where in memory to load the module. This value should always be 0x7C00—the fixed address where the MBR is loaded by the BIOS boot code. Once you’ve entered this offset, click OK. IDA Pro loads the module, then gives you the option to disassemble the module either in 16-bit or 32-bit mode, as shown in Figure 8-2.
Figure 8-2: IDA Pro dialog asking you which disassembly mode to choose
For this example, choose No. This directs IDA to disassemble the MBR as 16-bit real-mode code, which is the way the actual CPU decodes it at the very beginning of the boot process.
Because IDA Pro stores the results of disassembly in a database file with the extension idb, we’ll refer to the results of its disassembly as a database from now on. IDA uses this database to collect all of the code annotations you provide through your GUI actions and IDA scripts. You can think of the database as the implicit argument to all IDA script functions, which represents the current state of your hard-won reverse-engineering knowledge about the binary on which IDA can act.
If you don’t have any experience with databases, don’t worry: IDA’s interfaces are designed so that you don’t need to know the database internals. Understanding how IDA represents what it learns about code, however, does help a lot.
When loaded by the BIOS at boot, the MBR—now modified by the infecting bootkit—is executed from its first byte. We specified its loading address to IDA’s disassembler as 0:7C00h, which is where the BIOS loads it. Listing 8-1 shows the first few bytes of the loaded MBR image.
seg000:7C00 ; Segment type: Pure code
seg000:7C00 seg000 segment byte public 'CODE' use16
seg000:7C00 assume cs:seg000
seg000:7C00 ;org 7C00h
seg000:7C00 assume es:nothing, ss:nothing, ds:nothing, fs:nothing, gs:nothing
seg000:7C00 xor ax, ax
seg000:7C02 ➊ mov ss, ax
seg000:7C04 mov sp, 7C00h
seg000:7C07 mov es, ax
seg000:7C09 mov ds, ax
seg000:7C0B sti
seg000:7C0C pusha
seg000:7C0D mov cx, 0CFh
seg000:7C10 mov bp, 7C19h
seg000:7C13
seg000:7C13 loc_7C13: ; CODE XREF: seg000:7C17
seg000:7C13 ➋ ror byte ptr [bp+0], cl
seg000:7C16 inc bp
seg000:7C17 loop loc_7C13
seg000:7C17 ; ---------------------------------------------------------------------------
seg000:7C19 encrypted_code db 44h, 85h, 1Dh, 0C7h, 1Ch, 0B8h, 26h, 4, 8, 68h, 62h
seg000:7C19 ➌ db 40h, 0Eh, 83h, 0Ch, 0A3h, 0B1h, 1Fh, 96h, 84h, 0F5h
Listing 8-1: Entry point of the MBR
Early on we see the initialization stub ➊ that sets up the stack segment selector ss, stack pointer sp, and segment selector registers es and ds in order to access memory and execute subroutines. Following the initialization stub is a decryption routine ➋, which deciphers the rest of the MBR ➌ by rotating the bits—byte by byte—with an ror instruction, then passes control to the decrypted code. The size of the encrypted blob is given in the cx register, and the bp register points to the blob. This ad hoc encryption is intended to hamper static analysis and avoid detection by security software. It also presents us with our first obstacle, because we now need to extract the actual code to proceed with the analysis.
To continue our analysis of an encrypted MBR, we need to decrypt the code. Thanks to the IDA scripting engine, you can easily accomplish this task with the Python script in Listing 8-2.
➊ import idaapi
# beginning of the encrypted code and its size in memory
start_ea = 0x7C19
encr_size = 0xCF
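➋ for ix in range(encr_size):
➌     byte_to_decr = idaapi.get_byte(start_ea + ix)
      # ror takes its count from cl, which counts down from encr_size
      to_rotate = (encr_size - ix) % 8
      # mask to 8 bits so the rotated value stays a single byte
      byte_decr = ((byte_to_decr >> to_rotate) | (byte_to_decr << (8 - to_rotate))) & 0xFF
➍     idaapi.patch_byte(start_ea + ix, byte_decr)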
Listing 8-2: Python script to decrypt the MBR code
First, we import the idaapi package ➊, which contains the IDA API library. Then we loop through and decrypt the encrypted bytes ➋. To fetch a byte from the disassembly segment, we use the get_byte API ➌, which takes the address of the byte to read as its only parameter. Once it’s decrypted, we write the byte back to the disassembly region ➍ using the patch_byte API, which takes the address of the byte to modify and the value to write there. You can execute the script by choosing File▸Script from the IDA menu or by pressing ALT-F7.
NOTE
This script doesn’t modify the actual image of the MBR but rather its representation in IDA, that is, IDA’s idea of what the loaded code will look like when it’s ready to run. Before making any modifications to the disassembled code, you should create a backup of the current version of the IDA database. That way, if the script modifying the MBR code contains bugs and distorts the code, you’ll be able to easily recover its most recent version.
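The same transformation can also be tried outside IDA on a raw copy of the sector. The following is a minimal sketch in plain Python, not part of the bootkit or of IDA; the helper names ror8 and decrypt_mbr are ours, and the blob offset (0x19) and size (0xCF) are taken from Listing 8-1.

```python
def ror8(value, count):
    """Rotate an 8-bit value right by count bits, as x86 ror does on a byte."""
    count %= 8
    return ((value >> count) | (value << (8 - count))) & 0xFF


def decrypt_mbr(image, offset=0x19, size=0xCF):
    """Return a copy of image with the encrypted blob decrypted.

    offset is the blob's position within the sector (0x7C19 - 0x7C00) and
    size is the initial value of cx; both are read off Listing 8-1.
    """
    out = bytearray(image)
    for ix in range(size):
        # cl holds the loop counter, which counts down from size to 1
        out[offset + ix] = ror8(out[offset + ix], size - ix)
    return bytes(out)


# The first nine encrypted bytes from Listing 8-1, zero-padded to a full blob
known = bytes([0x44, 0x85, 0x1D, 0xC7, 0x1C, 0xB8, 0x26, 0x04, 0x08])
image = bytes(0x19) + known + bytes(0xCF - len(known))
decrypted = decrypt_mbr(image)
print(decrypted[0x19:0x19 + 9].hex())  # prints 8816e87c832e130410
```

The bytes 88 16 E8 7C and 83 2E 13 04 10 decode to a mov of dl into memory at 7CE8h followed by sub word ptr [413h], 10h, which is a quick sanity check that the rotation counts are right.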
Having decrypted the code, let’s proceed with analyzing it. If you look through the decrypted code, you’ll find the instructions shown in Listing 8-3. These instructions initialize the malicious code by storing the MBR input parameters and allocating memory.
seg000:7C19 ➊ mov ds:drive_no, dl
seg000:7C1D ➋ sub word ptr ds:413h, 10h
seg000:7C22 mov ax, ds:413h
seg000:7C25 shl ax, 6
seg000:7C28 ➌ mov ds:buffer_segm, ax
Listing 8-3: Memory allocation in the preboot environment
The assembly instruction at ➊ stores the contents of the dl register into memory at an offset from the ds segment. From our experience analyzing this kind of code, we can guess that the dl register contains the number of the hard drive from which the MBR is being executed; annotate this offset as a variable called drive_no. IDA Pro records this annotation in the database and shows it in the listing. When performing I/O operations, you can use this integer index to distinguish between different disks available to the system. You’ll use this variable in the BIOS disk service in the next section.
Similarly, Listing 8-3 shows the annotation buffer_segm ➌ for the offset where the code allocates a buffer. IDA Pro helpfully propagates these annotations to other code that uses the same variables.
At ➋, we see a memory allocation. In the preboot environment, there is no memory manager in the sense of modern operating systems, such as the OS logic backing malloc() calls. Instead, the BIOS maintains the number of kilobytes of available memory in a word—a 16-bit value in x86 architecture—located at the address 0:413h. In order to allocate X KB of memory, we subtract X from the total size of available memory, a value stored in the word at 0:413h, as shown in Figure 8-3.
Figure 8-3: Memory management in a preboot environment
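In Listing 8-3, the code allocates a 16 KB buffer by subtracting 10h from the total amount of available memory. It then shifts the remaining kilobyte count left by 6 bits to convert it into a real-mode segment value, which marks the start of the buffer and is stored in the variable buffer_segm ➌. The MBR then uses the allocated buffer to store data read from the hard drive.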
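The arithmetic in Listing 8-3 can be sketched in a few lines of Python. This is only an illustration: the function name allocate_kb is ours, and the starting value of 640 KB for the word at 0:413h is an assumed example, not a value taken from the sample.

```python
def allocate_kb(free_kb, request_kb):
    """Mimic the preboot allocation: shrink the free-memory counter and
    return (remaining KB, real-mode segment of the allocated buffer)."""
    remaining_kb = free_kb - request_kb
    if remaining_kb < 0:
        raise ValueError("not enough conventional memory")
    # KB -> bytes is * 1024, bytes -> segment is / 16, so KB -> segment is << 6
    segment = (remaining_kb << 6) & 0xFFFF
    return remaining_kb, segment


# Assumed example: the BIOS reports 640 KB free and the code subtracts 10h
remaining, buffer_segm = allocate_kb(640, 0x10)
print(remaining, hex(buffer_segm), hex(buffer_segm << 4))
# prints: 624 0x9c00 0x9c000
```

Under this assumption the buffer occupies the top 16 KB of conventional memory, starting at linear address 0x9C000.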
Another unique aspect of the preboot environment is the BIOS disk service, an API used to communicate with a hard drive. This API is particularly interesting in the context of bootkit analysis for two reasons. First, bootkits use it to read data from the hard drive, so it’s important to be familiar with the API’s most frequently used commands in order to understand bootkit code. Also, this API is itself a frequent target of bootkits. In the most common scenario, a bootkit hooks the API to patch legitimate modules that are read from the hard drive by other code during the boot process.
The BIOS disk service is accessible via an INT 13h instruction. In order to perform I/O operations, software passes I/O parameters through the processor registers and executes the INT 13h instruction, which transfers control to the appropriate handler. The I/O operation code, or identifier, is passed in the ah register—the higher-order part of the ax register. The register dl is used to pass the index of the disk in question. The processor’s carry flag (CF) is used to indicate whether an error has occurred during execution of the service: if CF is set to 1, an error has occurred and the detailed error code is returned in the ah register. This BIOS convention for passing arguments to a function predates the modern OS system call conventions; if it seems convoluted to you, remember that this is where the idea of uniform system call interfaces originated.
This INT 13h interrupt is an entry point to the BIOS disk service, and it allows software in the preboot environment to perform basic I/O operations on disk devices, like hard drives, floppy drives, and CD-ROMs, as shown in Table 8-1.
Table 8-1: The INT 13h Commands
Operation code | Operation description
2h | Read sectors into memory
3h | Write disk sectors
8h | Get drive parameters
41h | Extensions installation check
42h | Extended read
43h | Extended write
48h | Extended get drive parameters
The operations in Table 8-1 are split into two groups: the first group (with codes 41h, 42h, 43h, and 48h) comprises the extended operations, and the second group (with codes 2h, 3h, and 8h) consists of the legacy operations.
The only difference between the groups is that the extended operations can use an addressing scheme based on logical block addressing (LBA), whereas the legacy operations rely solely on a legacy Cylinder Head Sector (CHS)–based addressing scheme. In the case of the LBA-based scheme, sectors are enumerated linearly on the disk, beginning with index 0, whereas in the CHS-based scheme, each sector is addressed using the tuple (c,h,s), where c is the cylinder number, h is the head number, and s is the number of the sector. Although bootkits may use either group, almost all modern hardware supports the LBA-based addressing scheme.
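The relationship between the two addressing schemes can be made concrete with the standard conversion formulas. The sketch below is ours; the geometry of 255 heads and 63 sectors per track is an assumed example, and real drives may report different values.

```python
def chs_to_lba(c, h, s, heads, sectors_per_track):
    """Convert a (cylinder, head, sector) tuple to a linear block address.
    Sector numbers in CHS start at 1, while LBA starts at 0."""
    return (c * heads + h) * sectors_per_track + (s - 1)


def lba_to_chs(lba, heads, sectors_per_track):
    """Convert a linear block address back to a (c, h, s) tuple."""
    c, rem = divmod(lba, heads * sectors_per_track)
    h, s0 = divmod(rem, sectors_per_track)
    return c, h, s0 + 1


# Assumed geometry: 255 heads, 63 sectors per track
print(lba_to_chs(0x800, 255, 63))        # prints (0, 32, 33)
print(chs_to_lba(0, 32, 33, 255, 63))    # prints 2048
```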
As you continue looking at the MBR code that follows the 16 KB memory allocation, you should see the execution of an INT 13h instruction, as shown in Listing 8-4.
seg000:7C2B ➊ mov ah, 48h
seg000:7C2D ➋ mov si, 7CF9h
seg000:7C30 mov ds:drive_param.bResultSize, 1Eh
seg000:7C36 int 13h ; DISK - IBM/MS Extension
➌ ; GET DRIVE PARAMETERS
; (DL - drive, DS:SI - buffer)
Listing 8-4: Obtaining drive parameters via the BIOS disk service
The small size of the MBR (512 bytes) restricts the functionality of the code that can be implemented within it. For this reason, the bootkit loads additional code to execute, called a malicious boot loader, which is placed in hidden storage at the end of the hard drive. To obtain the coordinates of the hidden storage on the disk, the MBR code uses the extended “get drive parameters” operation (operation code 48h in Table 8-1), which returns information about the hard drive’s size and geometry. This information allows the bootkit to compute the offset at which the additional code is located on the hard drive.
In Listing 8-4, you can see an automatically generated comment from IDA Pro for the instruction INT 13h ➌. During code analysis, IDA Pro identifies parameters passed to the BIOS disk service handler call and generates a comment with the name of the requested disk I/O operation and the register names used to pass parameters to the BIOS handler. This MBR code executes INT 13h with parameter 48h ➊. Upon execution, this routine fills a special structure called EXTENDED_GET_PARAMS that provides the drive parameters. The address of this structure is stored in the si register ➋.
The EXTENDED_GET_PARAMS structure is provided in Listing 8-5.
typedef struct _EXTENDED_GET_PARAMS {
WORD bResultSize; // Size of the result
WORD InfoFlags; // Information flags
DWORD CylNumber; // Number of physical cylinders on drive
DWORD HeadNumber; // Number of physical heads on drive
DWORD SectorsPerTrack; // Number of sectors per track
➊ QWORD TotalSectors; // Total number of sectors on drive
➋ WORD BytesPerSector; // Bytes per sector
} EXTENDED_GET_PARAMS, *PEXTENDED_GET_PARAMS;
Listing 8-5: The EXTENDED_GET_PARAMS structure layout
The only fields the bootkit actually looks at in the returned structure are the number of sectors on the hard drive ➊ and the size of the disk sector in bytes ➋. The bootkit computes the total size of the hard drive in bytes by multiplying these two values, then uses the result to locate the hidden storage at the end of the drive.
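To see how the structure maps onto raw bytes, here is a small sketch that unpacks a result buffer laid out as in Listing 8-5 and computes the drive size. The helper names and the sample values are ours and purely illustrative; they are not taken from a real drive.

```python
import struct

# Little-endian layout of Listing 8-5: WORD, WORD, DWORD, DWORD, DWORD, QWORD, WORD
EXT_PARAMS = struct.Struct("<HHIIIQH")


def parse_drive_params(buf):
    """Unpack the fields of EXTENDED_GET_PARAMS from a result buffer."""
    names = ("bResultSize", "InfoFlags", "CylNumber", "HeadNumber",
             "SectorsPerTrack", "TotalSectors", "BytesPerSector")
    return dict(zip(names, EXT_PARAMS.unpack_from(buf)))


def drive_size_bytes(buf):
    """Total drive size: number of sectors times bytes per sector."""
    params = parse_drive_params(buf)
    return params["TotalSectors"] * params["BytesPerSector"]


# Hypothetical 1Eh-byte result buffer: 0x5000000 sectors of 512 bytes each
sample = EXT_PARAMS.pack(0x1E, 0, 0, 0, 0, 0x5000000, 512) + bytes(4)
print(hex(drive_size_bytes(sample)))  # prints 0xa00000000
```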
Once the bootkit has obtained the hard drive parameters and calculated the offset of the hidden storage, the bootkit MBR code reads this hidden data from the disk with the extended read operation of the BIOS disk service. This data is the next-stage malicious boot loader intended to bypass OS security checks and load a malicious kernel-mode driver. Listing 8-6 shows the code that reads it into RAM.
seg000:7C4C read_loop: ; CODE XREF: seg000:7C5Dj
seg000:7C4C ➊ call read_sector
seg000:7C4F mov si, 7D1Dh
seg000:7C52 mov cx, ds:word_7D1B
seg000:7C56 rep movsb
seg000:7C58 mov ax, ds:word_7D19
seg000:7C5B test ax, ax
seg000:7C5D jnz short read_loop
seg000:7C5F popa
seg000:7C60 ➋ jmp far boot_loader
Listing 8-6: Code for loading an additional malicious boot loader from the disk
In the read_loop, this code repeatedly reads sectors from the hard drive using the routine read_sector ➊ and stores them in the previously allocated memory buffer. Then the code transfers control to this malicious boot loader by executing a jmp far instruction ➋.
Looking at the code of the read_sector routine, in Listing 8-7 you can see the usage of INT 13h with the parameter 42h, which corresponds to the extended read operation.
seg000:7C65 read_sector proc near
seg000:7C65 pusha
seg000:7C66 ➊ mov ds:disk_address_packet.PacketSize, 10h
seg000:7C6B ➋ mov byte ptr ds:disk_address_packet.SectorsToTransfer, 1
seg000:7C70 push cs
seg000:7C71 pop word ptr ds:disk_address_packet.TargetBuffer+2
seg000:7C75 ➌ mov word ptr ds:disk_address_packet.TargetBuffer, 7D17h
seg000:7C7B push large [dword ptr ds:drive_param.TotalSectors_l]
seg000:7C80 ➍ pop large [ds:disk_address_packet.StartLBA_l]
seg000:7C85 push large [dword ptr ds:drive_param.TotalSectors_h]
seg000:7C8A ➎ pop large [ds:disk_address_packet.StartLBA_h]
seg000:7C8F inc eax
seg000:7C91 sub ds:disk_address_packet.StartLBA_l, eax
seg000:7C96 sbb ds:disk_address_packet.StartLBA_h, 0
seg000:7C9C mov ah, 42h
seg000:7C9E ➏ mov si, 7CE9h
seg000:7CA1 mov dl, ds:drive_no
seg000:7CA5 ➐ int 13h ; DISK - IBM/MS Extension
; EXTENDED READ
; (DL - drive, DS:SI - disk address packet)
seg000:7CA7 popa
seg000:7CA8 retn
seg000:7CA8 read_sector endp
Listing 8-7: Reading sectors from the disk
Before executing INT 13h ➐, the bootkit code initializes the DISK_ADDRESS_PACKET structure with the proper parameters, including the size of the structure ➊, the number of sectors to transfer ➋, the address of the buffer to store the result ➌, and the addresses of the sectors to read ➍ ➎. This structure’s address is provided to the INT 13h handler via the ds and si registers ➏. Note the manual annotation of the structure’s offsets; IDA picks them up and propagates them. The BIOS disk service uses DISK_ADDRESS_PACKET to uniquely identify which sectors to read from the hard drive. The complete layout of the structure of DISK_ADDRESS_PACKET, with comments, is provided in Listing 8-8.
typedef struct _DISK_ADDRESS_PACKET {
BYTE PacketSize; // Size of the structure
BYTE Reserved;
WORD SectorsToTransfer; // Number of sectors to read/write
DWORD TargetBuffer; // segment:offset of the data buffer
QWORD StartLBA; // LBA address of the starting sector
} DISK_ADDRESS_PACKET, *PDISK_ADDRESS_PACKET;
Listing 8-8: The DISK_ADDRESS_PACKET structure layout
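As a rough illustration of the layout in Listing 8-8, the following sketch packs a disk address packet in Python. The function name build_dap and the starting LBA are hypothetical; the buffer address 0000:7D17h mirrors the values the code in Listing 8-7 writes into the TargetBuffer field.

```python
import struct

# BYTE size, BYTE reserved, WORD sector count,
# DWORD buffer (stored as offset WORD then segment WORD), QWORD start LBA
DAP = struct.Struct("<BBHHHQ")


def build_dap(sectors, buf_segment, buf_offset, start_lba):
    """Pack a DISK_ADDRESS_PACKET for the INT 13h extended read/write."""
    return DAP.pack(DAP.size, 0, sectors, buf_offset, buf_segment, start_lba)


# One sector into 0000:7D17h; the LBA here is a made-up example value
packet = build_dap(1, 0x0000, 0x7D17, 0x12345)
print(len(packet), packet.hex())
```

The first byte of the packed result is 10h, which matches the PacketSize value the bootkit sets at ➊ in Listing 8-7.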
Once the boot loader is read into the memory buffer, the bootkit executes it.
At this point, we have finished our analysis of the MBR code and we’ll proceed to dissecting another essential part of the MBR: the partition table. You can download the complete version of the disassembled and commented malicious MBR at https://nostarch.com/rootkits/.
The MBR partition table is a common target of bootkits because the data it contains—although limited—plays a crucial part in the boot process’s logic. Introduced in Chapter 5, the partition table is located at the offset 0x1BE in the MBR and consists of four entries, each 0x10 bytes in size. It lists the partitions available on the hard drive, describes their type and location, and specifies where the MBR code should transfer control when it’s done. Usually, the sole purpose of legitimate MBR code is to scan this table for the active partition—that is, the partition marked with the appropriate bit flag and containing the VBR—and load it. You might be able to intercept this execution flow at the very early boot stage by simply manipulating the information contained in the table, without modifying the MBR code itself; the Olmasco bootkit, which we’ll discuss in Chapter 10, implements this method.
This illustrates an important principle of bootkit and rootkit design: if you can manipulate some data surreptitiously enough to bend the control flow, then that approach is preferred to patching the code. This saves the malware programmer the effort of testing new, altered code—a good example of code reuse promoting reliability!
Complex data structures like an MBR or VBR notoriously afford attackers many opportunities to treat them as a kind of bytecode and to treat the native code that consumes the data as a virtual machine programmed through the input data. The language-theoretic security (LangSec, http://langsec.org/) approach explains why this is the case.
Being able to read and understand the MBR’s partition table is essential for spotting this kind of early bootkit interception. Take a look at the partition table in Figure 8-4, where each 16-byte (10h) line is a partition table entry.
Figure 8-4: Partition table of the MBR
As you can see, the table has two entries—the top two lines—which implies there are only two partitions on the disk. The first partition entry starts at the address 0x7DBE; its very first byte ➊ shows that this partition is active, so the MBR code should load and execute its VBR, which is the first sector of that partition. The byte at offset 0x7DC2 ➋ describes the type of the partition—that is, the particular filesystem type that should be expected there by the OS, by the bootloader itself, or by other low-level disk access code. In this case, 0x07 corresponds to Microsoft’s NTFS. (For more information on partition types, see “The Windows Boot Process” on page 60.)
Next, the DWORD at 0x7DC6 ➌ in the partition table entry indicates that the partition starts at offset 0x800 from the beginning of the hard drive; this offset is counted in sectors. The last DWORD ➍ of the entry specifies the partition’s size in sectors (0x32000). Table 8-2 details the particular example in Figure 8-4. In the Beginning offset and Partition size columns, the actual values are provided in sectors, with bytes in parentheses.
Table 8-2: MBR Partition Table Contents
Partition index | Is active | Type | Beginning offset, sectors (bytes) | Partition size, sectors (bytes)
0 | True | NTFS (0x07) | 0x800 (0x100000) | 0x32000 (0x6400000)
1 | False | NTFS (0x07) | 0x32800 (0x6500000) | 0x4FCD000 (0x9F9A00000)
2 | N/A | N/A | N/A | N/A
3 | N/A | N/A | N/A | N/A
The reconstructed partition table indicates where you should look next in your analysis of the boot sequence. Namely, it tells you where the VBR is. The coordinates of the VBR are stored in the Beginning offset column of the primary partition entry. In this case, the VBR is located at an offset 0x100000 bytes from the beginning of the hard drive, which is the place to look in order to continue your analysis.
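Reconstructing the table by hand is easy to automate. The sketch below parses the four entries from a raw 512-byte MBR image; the function and field names are ours, and the sample image is synthetic, built from the values in Table 8-2 rather than read from the book's sample file.

```python
import struct

PARTITION_TABLE_OFFSET = 0x1BE
# status, CHS of first sector, type, CHS of last sector, start LBA, sector count
ENTRY = struct.Struct("<B3sB3sII")


def parse_partition_table(mbr, sector_size=512):
    """Return the non-empty partition entries of a raw MBR image."""
    entries = []
    for index in range(4):
        status, _first, ptype, _last, start, count = ENTRY.unpack_from(
            mbr, PARTITION_TABLE_OFFSET + index * ENTRY.size)
        if ptype == 0:  # type 0 marks an unused entry
            continue
        entries.append({
            "index": index,
            "active": status == 0x80,
            "type": ptype,
            "start_bytes": start * sector_size,
            "size_bytes": count * sector_size,
        })
    return entries


def _entry(status, ptype, start, count):
    """Build one synthetic 16-byte entry with zeroed CHS fields."""
    return ENTRY.pack(status, bytes(3), ptype, bytes(3), start, count)


# Synthetic MBR carrying the two entries from Table 8-2
sample = bytearray(512)
sample[0x1BE:0x1CE] = _entry(0x80, 0x07, 0x800, 0x32000)
sample[0x1CE:0x1DE] = _entry(0x00, 0x07, 0x32800, 0x4FCD000)
sample[510:512] = b"\x55\xAA"

for part in parse_partition_table(bytes(sample)):
    print(part["index"], part["active"], hex(part["type"]),
          hex(part["start_bytes"]), hex(part["size_bytes"]))
```

For the active entry, start_bytes comes out as 0x100000, which is the location of the VBR discussed above.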
In this section, we’ll consider VBR static analysis approaches using IDA and focus on an essential VBR concept called the BIOS parameter block (BPB), which plays an important role in the boot process and bootkit infection. The VBR is also a common target of bootkits, as we explained briefly in Chapter 7. In Chapter 12, we’ll discuss in more detail the Gapz bootkit, which infects the VBR in order to persist on the infected system. The Rovnix bootkit, discussed in Chapter 11, also makes use of the VBR to infect a system.
You should load the VBR into the disassembler in essentially the same way you loaded the MBR, since it’s also executed in real mode. Load the VBR file, vbr_sample_ch8.bin, from the samples directory for Chapter 8 as a binary module at 0:7C00h and in 16-bit disassembly mode.
The main purpose of the VBR is to locate the Initial Program Loader (IPL) and to read it into RAM. The location of the IPL on the hard drive is specified in the BIOS_PARAMETER_BLOCK_NTFS structure, which we discussed in Chapter 5. Stored directly in the VBR, BIOS_PARAMETER_BLOCK_NTFS contains a number of fields that define the geometry of the NTFS volume, such as the number of bytes per sector, the number of sectors per cluster, and the location of the master file table.
The HiddenSectors field, which stores the number of sectors from the beginning of the hard drive to the beginning of the NTFS volume, defines the actual location of the IPL. The VBR assumes that the NTFS volume begins with the VBR, immediately followed by the IPL. So the VBR code loads the IPL by fetching the contents of the HiddenSectors field, incrementing the fetched value by 1, and then reading 0x2000 bytes—which corresponds to 16 sectors—from the calculated offset. Once the IPL is loaded from disk, the VBR code transfers control to it.
Listing 8-9 shows a part of the BIOS parameter block structure in our example.
seg000:000B bpb dw 200h ; SectorSize
seg000:000D db 8 ; SectorsPerCluster
seg000:000E db 3 dup(0) ; reserved
seg000:0011 dw 0 ; RootDirectoryIndex
seg000:0013 dw 0 ; NumberOfSectorsFAT
seg000:0015 db 0F8h ; MediaId
seg000:0016 db 2 dup(0) ; Reserved2
seg000:0018 dw 3Fh ; SectorsPerTrack
seg000:001A dw 0FFh ; NumberOfHeads
seg000:001C dd 800h ; HiddenSectors ➊
Listing 8-9: The BIOS parameter block of the VBR
The value of HiddenSectors ➊ is 0x800, which corresponds to the beginning offset of the active partition on the disk in Table 8-2. This shows that the IPL is located at sector 0x801, counted from the beginning of the disk. Bootkits use this information to intercept control during the boot process. The Gapz bootkit, for example, modifies the contents of the HiddenSectors field so that, instead of a legitimate IPL, the VBR code reads and executes the malicious IPL. Rovnix, on the other hand, uses another strategy: it modifies the legitimate IPL’s code. Both manipulations intercept control at the early boot of the system.
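The calculation the VBR performs can be sketched as follows. The function name locate_ipl is ours, the field offsets come from Listing 8-9, and the sample VBR is a synthetic buffer holding only the two fields we need.

```python
import struct

SECTOR_SIZE_OFFSET = 0x0B     # WORD SectorSize in the BPB
HIDDEN_SECTORS_OFFSET = 0x1C  # DWORD HiddenSectors in the BPB
IPL_SIZE = 0x2000             # number of bytes the VBR reads for the IPL


def locate_ipl(vbr):
    """Return (first IPL sector, byte offset on disk, sectors to read)."""
    (sector_size,) = struct.unpack_from("<H", vbr, SECTOR_SIZE_OFFSET)
    (hidden,) = struct.unpack_from("<I", vbr, HIDDEN_SECTORS_OFFSET)
    first_sector = hidden + 1  # the IPL immediately follows the VBR
    return first_sector, first_sector * sector_size, IPL_SIZE // sector_size


# Synthetic VBR with the values from Listing 8-9
vbr = bytearray(512)
struct.pack_into("<H", vbr, SECTOR_SIZE_OFFSET, 0x200)
struct.pack_into("<I", vbr, HIDDEN_SECTORS_OFFSET, 0x800)

sector, offset, count = locate_ipl(bytes(vbr))
print(hex(sector), hex(offset), count)  # prints 0x801 0x100200 16
```

Comparing HiddenSectors against the starting sector of the corresponding partition table entry is one simple consistency check; a mismatch may hint at the kind of manipulation described above, although this sketch does not attempt to detect it.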
Once the IPL receives control, it loads bootmgr, which is stored in the filesystem of the volume. After this, other bootkit components, such as malicious boot loaders and kernel-mode drivers, may kick in. A full analysis of these modules is beyond the scope of this chapter, but we’ll briefly outline some approaches.
Malicious boot loaders constitute an important part of bootkits. Their main purposes are to survive through the CPU’s execution mode switching, bypass OS security checks (such as driver signature enforcement), and load malicious kernel-mode drivers. They implement functionality that cannot fit in the MBR and the VBR due to their size limitations, and they’re stored separately on the hard drive. Bootkits store their boot loaders in hidden storage areas located either at the end of the hard drive, where there is usually some unused disk space, or in free disk space between partitions, if there is any.
A malicious boot loader may contain different code to be executed in different processor execution modes:
16-bit real mode Interrupt 13h hooking functionality
32-bit protected mode Bypass OS security checks for 32-bit OS version
64-bit protected mode (long mode) Bypass OS security checks for 64-bit OS version
But the IDA Pro disassembler can’t keep code disassembled in different modes in a single IDA database, so you’ll need to maintain different versions of the IDA Pro database for different execution modes.
In most cases, the kernel-mode drivers that bootkits load are valid PE images. They implement rootkit functionality that allows malware to avoid detection by security software and provides covert communication channels, among other things. Modern bootkits usually contain two versions of the kernel-mode driver, compiled for the x86 and x64 platforms. You may analyze these modules using conventional approaches for static analysis of executable images. IDA Pro does a decent job of loading such executables, and it provides a lot of supplemental tools and information for their analysis. However, we’ll discuss how to instead use IDA Pro’s features to automate the analysis of bootkits by preprocessing them as IDA loads them.
One of the most striking features of the IDA Pro disassembler is the breadth of its support for various file formats and processor architectures. To achieve this, the functionality for loading particular types of executables is implemented in special modules called loaders. By default, IDA Pro contains a number of loaders, covering the most frequent types of executables, such as PE (Windows), ELF (Linux), Mach-O (macOS), and firmware image formats. You can obtain the list of available loaders by inspecting the contents of your $IDADIR\loaders directory, where $IDADIR is the installation directory of the disassembler. The files within this directory are the loaders, and their names correspond to platforms and their binary formats. The file extensions have the following meanings:
ldw Binary implementation of a loader for the 32-bit version of IDA Pro
l64 Binary implementation of a loader for the 64-bit version of IDA Pro
py Python implementation of a loader for both versions of IDA Pro
By default, no loader is available for the MBR or VBR at the time of writing this chapter, which is why you have to instruct IDA to load the MBR or VBR as a binary module. This section shows you how to write a custom Python-based MBR loader for IDA Pro that loads the MBR in the 16-bit disassembler mode at the address 0x7C00 and parses the partition table.
The place to start is the loader.hpp file, which is provided with the IDA Pro SDK and contains a lot of useful information related to loading executables in the disassembler. It defines structures and types to use, lists prototypes of the callback routines, and describes the parameters they take. Here is the list of the callbacks that should be implemented in a loader, according to loader.hpp:
accept_file This routine checks whether the file being loaded is of a supported format.
load_file This routine does the actual work of loading the file into the disassembler—that is, parsing the file format and mapping the file’s content into the newly created database.
save_file This is an optional routine that, if implemented, produces an executable from the disassembly upon executing the File▸Produce File▸Create EXE File command in the menu.
move_segm This is an optional routine that, if implemented, is executed when a user moves a segment within the database. It is mostly used when there is relocation information in the image that the user should take into account when moving a segment. Due to the MBR’s lack of relocations, we can skip this routine here, but we couldn’t if we were to write a loader for PE or ELF binaries.
init_loader_options This is an optional routine that, if implemented, asks a user for additional parameters for loading a particular file type, once the user chooses a loader. We can skip this routine as well, because we have no special options to add.
Now let’s take a look at the actual implementation of these routines in our custom MBR loader.
In the accept_file routine, shown in Listing 8-10, we check whether the file in question is a Master Boot Record.
def accept_file(li, n):
    # check size of the file
    file_size = li.size()
    if file_size < 512:
➊       return 0
    # check MBR signature: bytes 0x55, 0xAA at offset 510
    # (the os module must be imported at the top of the loader)
    li.seek(510, os.SEEK_SET)
    mbr_sign = li.read(2)
    if mbr_sign != b'\x55\xAA':
➋       return 0
    # all the checks are passed
➌   return 'MBR'
Listing 8-10: The accept_file implementation
The MBR format is rather simple, so the following are the only indicators we need to perform this check:
File size The file should be at least 512 bytes, which corresponds to the minimum size of a hard drive sector.
MBR signature A valid MBR should end with the bytes 0x55 and 0xAA, which read as the little-endian word 0xAA55.
If the conditions are met and the file is recognized as an MBR, the code returns a string with the name of the loader ➌; if the file is not an MBR, the code returns 0 ➊ ➋.
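Once accept_file returns a nonzero value, IDA Pro attempts to load the file by executing the load_file routine, which is implemented in your loader. This routine needs to perform the following steps: read the target file into a buffer, create and initialize a new memory segment into which the MBR contents are loaded, set the beginning of the MBR as the entry point of the disassembly, and parse the partition table contained in the MBR.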
The load_file implementation is shown in Listing 8-11.
def load_file(li, neflags, format):
    # Select the PC processor module
    idaapi.set_processor_type("metapc", idaapi.SETPROC_ALL | idaapi.SETPROC_FATAL)  # ➊
    # read MBR into buffer
    li.seek(0, os.SEEK_SET)  # ➋
    buf = li.read(li.size())
    mbr_start = 0x7C00              # beginning of the segment
    mbr_size = len(buf)             # size of the segment
    mbr_end = mbr_start + mbr_size  # end of the segment (exclusive)
    # Create the segment
    seg = idaapi.segment_t()  # ➌
    seg.startEA = mbr_start
    seg.endEA = mbr_end
    seg.bitness = 0  # 16-bit
    idaapi.add_segm_ex(seg, "seg0", "CODE", 0)  # ➍
    # Copy the bytes
    idaapi.mem2base(buf, mbr_start, mbr_end)  # ➎
    # add entry point
    idaapi.add_entry(mbr_start, mbr_start, "start", 1)
    # parse partition table
    struct_id = add_struct_def()  # ➏
    struct_size = idaapi.get_struc_size(struct_id)
    idaapi.doStruct(mbr_start + 0x1BE, struct_size, struct_id)  # ➐
    # report success to IDA
    return 1
Listing 8-11: The load_file implementation
First, set the CPU type to metapc ➊, which corresponds to the generic PC family and instructs IDA to disassemble the binary as IBM PC opcodes. Then read the MBR into a buffer ➋ and create a segment descriptor by calling the segment_t constructor ➌. This call allocates an empty structure, seg, describing the segment to create; populate it with the segment’s parameters. Set the starting address of the segment to 0x7C00, as you did in “Loading the MBR into IDA Pro” on page 96, and set its end address to the starting address plus the size of the MBR. Also tell IDA that the new segment is a 16-bit segment by setting the bitness field of the structure to 0; note that 1 corresponds to 32-bit segments and 2 corresponds to 64-bit segments. Then, by calling the add_segm_ex API ➍, add the new segment to the disassembly database. The add_segm_ex API takes these parameters: a structure describing the segment to create; the segment name (seg0); the segment class (CODE); and flags, which are left at 0. Following this call, copy the MBR contents into the newly created segment ➎ and add an entry point indicator.
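The address arithmetic behind the segment setup can be checked with plain Python. This is only an illustrative sketch: segment_bounds and offset_to_address are hypothetical helper names that do not exist in IDA, and the constants are the values used in this section.

```python
MBR_LOAD_ADDRESS = 0x7C00        # address at which the MBR segment starts
PARTITION_TABLE_OFFSET = 0x1BE   # offset of the partition table in the MBR
SIGNATURE_OFFSET = 510           # offset of the 2-byte MBR signature


def segment_bounds(image_size, base=MBR_LOAD_ADDRESS):
    """Return (start, end) of the segment; the end address is exclusive."""
    return base, base + image_size


def offset_to_address(file_offset, base=MBR_LOAD_ADDRESS):
    """Translate an offset in the MBR file to its address in the segment."""
    return base + file_offset


start, end = segment_bounds(512)
print(hex(start), hex(end))                            # 0x7c00 0x7e00
print(hex(offset_to_address(PARTITION_TABLE_OFFSET)))  # 0x7dbe
print(hex(offset_to_address(SIGNATURE_OFFSET)))        # 0x7dfe
```

For a 512-byte image, the segment therefore spans 0x7C00 through 0x7DFF, and the partition table begins at 0x7DBE.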
Next, add automatic parsing of the partition table present in the MBR by calling the doStruct API ➐ with these parameters: the address of the beginning of the partition table (offset 0x1BE from the start of the MBR), the table size in bytes, and the identifier of the structure you want the table to be cast to. The add_struct_def routine ➏ implemented in our loader creates this structure: it imports PARTITION_TABLE_ENTRY, which describes a single partition table entry, and PARTITION_TABLE, which holds the entries, into the database, and it returns the identifier of PARTITION_TABLE.
Listing 8-12 defines the add_struct_def routine.
def add_struct_def():
    # add structure PARTITION_TABLE_ENTRY to IDA types
    sid_partition_entry = AddStrucEx(-1, "PARTITION_TABLE_ENTRY", 0)
    # add fields to the structure: name, offset, type flag, type id, size in bytes
    AddStrucMember(sid_partition_entry, "status", 0, FF_BYTE, -1, 1)
    AddStrucMember(sid_partition_entry, "chsFirst", 1, FF_BYTE, -1, 3)
    AddStrucMember(sid_partition_entry, "type", 4, FF_BYTE, -1, 1)
    AddStrucMember(sid_partition_entry, "chsLast", 5, FF_BYTE, -1, 3)
    AddStrucMember(sid_partition_entry, "lbaStart", 8, FF_DWRD, -1, 4)
    AddStrucMember(sid_partition_entry, "size", 12, FF_DWRD, -1, 4)
    # add structure PARTITION_TABLE to IDA types: four 16-byte entries
    sid_table = AddStrucEx(-1, "PARTITION_TABLE", 0)
    AddStrucMember(sid_table, "partitions", 0, FF_STRU, sid_partition_entry, 64)
    return sid_table
Listing 8-12: Importing data structures into the disassembly database
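To see what the PARTITION_TABLE_ENTRY layout describes, the same 16-byte fields can be decoded outside IDA with Python’s struct module. This is a sketch under stated assumptions rather than loader code: parse_partition_table and PartitionEntry are hypothetical names, the field names mirror the listing, and the sample MBR is synthetic data built only for illustration.

```python
import struct
from collections import namedtuple

PARTITION_TABLE_OFFSET = 0x1BE   # offset of the partition table in the MBR
ENTRY_FORMAT = "<B3sB3sII"       # status, chsFirst, type, chsLast, lbaStart, size
ENTRY_SIZE = struct.calcsize(ENTRY_FORMAT)  # 16 bytes, little-endian, no padding

PartitionEntry = namedtuple(
    "PartitionEntry", "status chsFirst type chsLast lbaStart size")


def parse_partition_table(mbr):
    """Decode the four partition table entries of a 512-byte MBR buffer."""
    entries = []
    for index in range(4):
        offset = PARTITION_TABLE_OFFSET + index * ENTRY_SIZE
        fields = struct.unpack_from(ENTRY_FORMAT, mbr, offset)
        entries.append(PartitionEntry(*fields))
    return entries


# usage: a synthetic MBR whose first entry is an active partition of type 0x07
mbr = bytearray(512)
struct.pack_into(ENTRY_FORMAT, mbr, PARTITION_TABLE_OFFSET,
                 0x80, b"\x20\x21\x00", 0x07, b"\xfe\xff\xff", 2048, 204800)
mbr[510:512] = b"\x55\xAA"

table = parse_partition_table(bytes(mbr))
first = table[0]
print(hex(first.status), hex(first.type), first.lbaStart, first.size)
```

The four entries occupy 64 bytes in total, which matches the size passed for the partitions member of PARTITION_TABLE.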
Once your loader module is finished, copy it into the $IDADIR\loaders directory as an mbr.py file. When a user attempts to load an MBR into the disassembler, the dialog in Figure 8-5 appears, confirming that your loader has successfully recognized the MBR image. Clicking OK executes the load_file routine implemented in your loader in order to apply the previously described customizations to the loaded file.
NOTE
When you’re developing custom loaders for IDA Pro, bugs in the script implementation may cause IDA Pro to crash. If this happens, simply remove the loader script from the loaders directory and restart the disassembler.
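One way to reduce the risk of such crashes is to exercise the file-recognition logic in a regular Python interpreter before copying the script into the loaders directory. The sketch below assumes Python 3 and uses a hypothetical FakeLoaderInput class that mimics only the size, seek, and read methods used in this section; it is not the object IDA actually passes to a loader, and the accept_file shown here is a Python 3 restatement of the same checks.

```python
import io
import os


class FakeLoaderInput:
    """Hypothetical stand-in for the `li` object that IDA passes to a loader.

    It implements only the three methods used by the listings in this section.
    """

    def __init__(self, data):
        self._stream = io.BytesIO(data)
        self._size = len(data)

    def size(self):
        return self._size

    def seek(self, pos, whence=os.SEEK_SET):
        return self._stream.seek(pos, whence)

    def read(self, size):
        return self._stream.read(size)


def accept_file(li, n):
    """Same size and signature checks as the loader, written for bytes."""
    if li.size() < 512:
        return 0
    li.seek(510, os.SEEK_SET)
    if li.read(2) != b"\x55\xAA":
        return 0
    return "MBR"


# usage: one buffer with a valid signature and one without
good = FakeLoaderInput(bytes(510) + b"\x55\xAA")
bad = FakeLoaderInput(bytes(512))
print(accept_file(good, 0), accept_file(bad, 0))
```

Routines that call into idaapi, such as load_file, still have to be tested inside IDA; this approach covers only the logic that depends on the file contents.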
In this section, you’ve seen a small sample of the disassembler’s extension development capabilities. For a more complete reference on IDA Pro extension development, refer to The IDA Pro Book (No Starch Press, 2011) by Chris Eagle.
Figure 8-5: Choosing the custom MBR loader
In this chapter, we described a few simple steps for static analysis of the MBR and the VBR. You can easily extend the examples in this chapter to any code running in the preboot environment. You also saw that the IDA Pro disassembler provides a number of unique features that make it a handy tool for performing static analysis.
On the other hand, static analysis has its limitations—mainly related to the inability to see the code at work and observe how it manipulates the data. In many cases, static analysis can’t provide answers to all the questions a reverse engineer may have. In such situations, it’s important to examine the actual execution of the code to better understand its functionality or to obtain some information that may have been missing in the static context, such as encryption keys. This brings us to dynamic analysis, the methods and tools for which we’ll discuss in the next chapter.
Complete the following exercises to get a better grasp of the material in this chapter. You’ll need to download a disk image from https://nostarch.com/rootkits/. The required tools for these exercises are the IDA Pro disassembler and a Python interpreter.