A. Mohanta, A. Saldanha, Malware Analysis and Detection Engineering, https://doi.org/10.1007/978-1-4842-6193-4_4

4. Virtual Memory and the Portable Executable (PE) File

Abhijit Mohanta¹ and Anoop Saldanha²

(1) Independent Cybersecurity Consultant, Bhubaneswar, Odisha, India

(2) Independent Cybersecurity Consultant, Mangalore, Karnataka, India

A process is defined as a program under execution. Whether a program is clean or malicious, it needs to execute as a process to carry out its desired intention. In this chapter, we go through the various steps involved in loading a program as a process. We also explore the various components of a process and understand important concepts like virtual memory, which is a memory-related facility that is abstracted by the operating system (OS) for all processes running on the system. We also dissect the PE file format, which is used by all executable files in Windows, and explore how its various headers and fields are used by the OS to load the PE executable program as a process. We also cover other types of PE files, such as DLLs, and explore how they are loaded and used by programs.

Process Creation

Let us explore how a program is turned into a process by the OS. As an example, let us use Sample-4-1 from the samples repository. Sample-4-1 is an executable file that has been compiled from the source code shown in Listing 4-1. As the code shows, it is a very basic C program that prints Hello World by calling the printf() function, after which it goes into an idle infinite while loop.

/****** Sample-4-1.c ******/
#include <stdio.h>

int main()
{
    printf("Hello World!");
    while(1); // infinite while loop
    return 0;
}

Listing 4-1

Simple Hello World C Program Compiled into Sample-4-1 in Our Samples Repo

To execute this program, you can start by renaming the file and adding an extension of .exe, after which it should now be named Sample-4-1.exe. Please note that all samples in this book in the sample repository don’t have the file extension for safety reasons, and some of the exercises might need you to add an extension suffix. Also, you need to make sure you have extension hiding disabled to add an extension suffix, as explained in Chapter 2.

Executing the Program

Our program is now ready for execution. Now start the Windows Task Manager and go to the Processes tab. Make a visual note of all the processes running on the system, and make sure there is no process called Sample-4-1.exe running yet. You can now double-click Sample-4-1.exe in your folder to execute it as a process. Now go back to the Task Manager and check the Processes tab again. In Figure 4-1, you see a new process called Sample-4-1.exe in the list of processes in the Task Manager.

../images/491809_1_En_4_Chapter/491809_1_En_4_Fig1_HTML.jpg — Figure 4-1
Windows Task Manager shows our *Sample-4-1.exe* process under execution

The preceding process was not created until we double-clicked the .exe file in the folder. Do note that without the .exe file extension, double-clicking the file wouldn’t launch it as a process because there would be no file association, as we learned in Chapter 3. The Sample-4-1.exe file is a program, and after double-clicking it, the OS created a process out of this program, as you can now see in the Task Manager. Let’s now dig into more details of our process using Process Hacker, a tool that serves as an advanced Task Manager. Alternatively, you can also use Process Explorer (another advanced task manager) to dissect processes running on your system.

Note

To double-click and run a program as a process, the program should have an .exe extension. To add an extension to a file, you must Disable Extension Hiding in Windows, as explained in Chapter 2.

Exploring the Process with Process Hacker

While Task Manager is a decent program for getting a list of processes running on the system, it is wildly inadequate if you want to investigate the details of a process, especially from a malware analysis perspective. We use Process Hacker, an advanced task manager that we introduced in Chapter 2, as seen in Figure 4-2.

../images/491809_1_En_4_Chapter/491809_1_En_4_Fig2_HTML.jpg — Figure 4-2
Process Hacker tool

Each process has a name, which is the name of the program from which it was created. The process name is not unique in the list of processes. Two or more processes can have the same name without conflict. Similarly, multiple processes from the same program file can be created, meaning multiple processes may not only have the same name but the same program executable path.

To uniquely identify a process, each process is given a unique ID called the process ID, or PID, by the OS. The PID is assigned by the OS when the process is created and cannot be predicted from the program itself, so it usually changes each time the program is executed, even on the very same system.

Hovering the mouse over a process in Process Hacker displays the corresponding executable name, path, PID, and other information. To investigate a process more minutely, you can double-click a process, which should open a new Properties window, as seen in Figure 4-3. There are several tabs in the Properties window. A few of the important tabs are General, Threads, Tokens, Modules, Memory, and Handles.

../images/491809_1_En_4_Chapter/491809_1_En_4_Fig3_HTML.jpg — Figure 4-3
Properties window of a process by Process Hacker, with the General tab in view

As seen in Figure 4-3, the General tab displays information about how, when, and who has started the process.

Figure 4-3 shows the name and PID of the parent of our sample process as explorer.exe and 2636, respectively. How did explorer.exe end up as the parent of our process? We browse our folders and files on our Windows system using a graphical user interface. This user interface is rendered by Windows Explorer or File Browser, represented by the process explorer.exe. Using File Browser provided by explorer.exe, we double-clicked the Sample-4-1.exe program earlier to create a process out of it, thereby establishing explorer.exe as the parent of our Sample-4-1.exe process.

Other entries in the General tab to keep an eye on are Command Line and Current Directory. The Command Line field shows the command-line parameters provided to the process. While analyzing malware samples, it is important to keep an eye on the Command Line field since some malware needs specific command-line arguments to operate and exhibit malicious behavior. Without those command-line options, the malware may not function as intended, basically fooling analysts and anti-malware products. The Current Directory field shows the path to the root or base directory from which the process operates.

In upcoming sections and chapters, we explore other aspects of a process, such as virtual memory, handles, mutexes, and threads, and investigate them via the various tabs and options provided by Process Hacker. You should familiarize yourself with an important tool called Process Explorer (installed in Chapter 2), which operates similarly to Process Hacker. Getting comfortable with these tools is important for malware analysis, and we encourage you to play with these tools as much as possible.

Note

While dynamically analyzing malware, it’s very important to keep an eye on various processes started on the system, their parent processes, the command line used, and the path of the executable program.

Virtual Memory

Since the inception of computing, hardware limitations have often been a hindrance to building cost-effective portable computers. To overcome some of these constraints, computer scientists have often invented software-based solutions to simulate the actual hardware. This section talks about one such solution called virtual memory that has been implemented to abstract and simulate physical memory (RAM). Fritz-Rudolf Güntsch invented the concept of virtual memory, and it has been implemented in all modern operating systems today.

Virtual memory is a complex topic, and to get a better understanding of it, we recommend any OS book. In this section, we simplify this topic and explain it from a malware analysis perspective and reverse engineering workflow.

Program execution involves three main components of a computer: the CPU, RAM (random-access memory, a.k.a. physical memory), and the hard disk. A program is stored on the hard disk, but for the CPU to execute the code instructions in the program, the OS first loads the program into RAM, thereby creating a process. The CPU picks up the instructions of the program from the RAM and executes these instructions, as illustrated by Figure 4-4.

../images/491809_1_En_4_Chapter/491809_1_En_4_Fig4_HTML.jpg — Figure 4-4
CPU executing a process after loading its program into RAM from hard disk

RAM is inexpensive today, but in the earlier days, it was expensive compared to a hard disk. Early computers were not meant for daily use by the common man, and they ran a limited number of processes compared to now, so a limited amount of RAM could serve the purpose. But as processing needs evolved, computers were pushed to execute many more complex processes, thus pushing the requirement for larger-capacity RAM. But RAM was very expensive and limited, especially when compared to the hard disk. Virtual memory came to the rescue.

Virtual memory creates an illusion to a process that there is a huge amount of RAM available exclusively to it, without having to share it with any other processes on the system, as illustrated by Figure 4-5. At the back end of this illusion, the virtual memory algorithm reserves space on the inexpensive hard disk to use it as an extended part of RAM. On Linux, this extended space of the hard disk is called swap space , and on Windows, it’s called a page file .

../images/491809_1_En_4_Chapter/491809_1_En_4_Fig5_HTML.jpg — Figure 4-5
Virtual memory giving an illusion of more memory than what is physically available

Each process can see a fixed amount of memory or rather virtual memory, which is assigned to it by the OS, irrespective of the actual physical size of the RAM. As seen in Figure 4-5, though the system has 1 GB of physical memory, the OS gives the process 4 GB of exclusive virtual memory.

With Windows, for a 32-bit operating system, 4 GB of virtual memory is assigned to each process. It does not matter if the size of RAM is even 512 MB or 1 GB or 2 GB. If there are 10 or 100 processes, each of them is assigned 4 GB of virtual memory, and all of them can execute in parallel without interfering with each other’s memory, as illustrated by Figure 4-6.

../images/491809_1_En_4_Chapter/491809_1_En_4_Fig6_HTML.jpg — Figure 4-6
The same fixed amount of virtual memory made available to all processes

In the next section, we go into the details on how virtual memory is implemented in the background and explore concepts like pages that make this whole virtual memory concept possible.

Addressability

Virtual memory, just like physical memory or the RAM, is addressable (i.e., every byte in the memory of the process has an address). An address in virtual memory is called a virtual address, and an address in physical memory or RAM is called a physical address. With 4 GB of virtual memory, the address starts at 0 and ends at 4294967295 (2^32 – 1). But while dealing with various tools in malware analysis and reverse engineering, the address is represented in hex. With 32 bits used for the address, the first byte is at 0x00000000 (i.e., 0), and the last byte is at 0xFFFFFFFF (i.e., 4294967295, or 2^32 – 1). Physical memory is similarly addressable, but it is not of much importance since we always deal with virtual memory during the analysis process.
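The arithmetic behind these bounds can be checked with a short C sketch. The helper name last_address() is our own and purely illustrative; it returns the highest byte address reachable with a given number of address bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: highest byte address reachable with `bits`
 * address bits, i.e. 2^bits - 1. Valid for 1 <= bits <= 64. */
uint64_t last_address(unsigned bits)
{
    assert(bits >= 1 && bits <= 64);
    if (bits == 64)
        return UINT64_MAX;            /* avoid shifting by the full width */
    return ((uint64_t)1 << bits) - 1;
}
```

Calling last_address(32) yields 0xFFFFFFFF, which is the last virtual address of a process on 32-bit Windows.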

Memory Pages

The OS divides the virtual memory of a process into small contiguous memory chunks called pages. The size of a page is determined by the OS and is based on the processor architecture, but typically, the default page size is 4 KB (i.e., 4096 bytes).

Pages apply not only to virtual memory but also to physical memory. Physical memory is likewise split into page-sized chunks, which are called frames. Think of frames as buckets provided by physical memory that pages of process virtual memory can occupy.
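As a minimal sketch of how an address relates to a page, the following C helpers split a 32-bit virtual address into a page number and an offset within that page, assuming the typical 4 KB page size. The names DEMO_PAGE_SIZE, page_number(), page_offset(), and pages_needed() are our own and are not part of any Windows API.

```c
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096u   /* assumed page size: 4 KB */

/* Index of the page that contains virtual address `va`. */
uint32_t page_number(uint32_t va)
{
    return va / DEMO_PAGE_SIZE;
}

/* Byte offset of `va` from the start of its page. */
uint32_t page_offset(uint32_t va)
{
    return va % DEMO_PAGE_SIZE;
}

/* Number of pages needed to hold `size` bytes, rounding up. */
uint32_t pages_needed(uint32_t size)
{
    return size / DEMO_PAGE_SIZE + (size % DEMO_PAGE_SIZE != 0);
}
```

For instance, the address 0x00401234 lies in page 0x401 at offset 0x234, and a 4097-byte region needs two pages even though the second page is almost empty.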

To understand how a process’s memory translates to pages, let’s take the example of a program, which is made up of both data and instructions, which are present as part of the program PE file on disk. When the OS loads it as a process, the data and instructions from the program are transferred into memory by splitting it into several pages.

For example, consider a system with 20 bytes of available physical RAM and an OS that uses a page size of 10 bytes. Now let’s assume your process needs 20 bytes of virtual memory to hold all its instructions and data, for which the OS assigns the process two pages in virtual memory. Figure 4-7 illustrates this. As seen in the figure, the process needs and uses 20 bytes of memory. The OS assigns it 20 bytes of virtual memory by splitting it into two pages of 10 bytes each. The figure also shows that the pages in the virtual memory of the process have a one-to-one mapping with the frames they occupy in physical memory, but this may not always be the case, as you learn next.

../images/491809_1_En_4_Chapter/491809_1_En_4_Fig7_HTML.jpg — Figure 4-7
The memory of a process split and stored in pages/frames in virtual/physical memory

Demand Paging

Continuing from where we left off in the previous section, we have 20 bytes of physical RAM available on the system and Process1 using 20 bytes of virtual memory, which in turn ends up taking 20 actual bytes of physical memory, as seen in Figure 4-7. There is a one-to-one mapping between the amount of virtual memory used by Process1 and the physical memory available on the system. In comes a new process, Process2, which requests 10 bytes of virtual memory. The OS can assign one virtual memory page of 10 bytes to this process, but all the physical memory is occupied by frames of Process1. How would this new process run with no free physical memory available on the system? In comes demand paging and the page table.

Demand paging solves this issue by swapping. At any point in time, a process running on the system may not need all its pages to be physically present in the RAM’s frames. The pages that are currently not needed by the process sit idle in physical memory, wasting costly physical memory. Demand paging targets these idle, currently unused pages of processes in physical memory and swaps them out to the hard disk, freeing up the frames in physical memory to be used by the pages of other processes that need them. This is illustrated in Figure 4-8, where the unused page in Process1 is swapped out by demand paging from the physical memory to the hard disk, and the active page of Process2 now occupies the vacated frame in physical memory.

../images/491809_1_En_4_Chapter/491809_1_En_4_Fig8_HTML.jpg — Figure 4-8
Demand paging allowing pages from multiple processes to use RAM simultaneously

But what happens if Process1 needs Page2 again in physical memory? Since Page2 is not currently mapped to any frame in RAM, the access triggers a page fault. The OS handles the fault by swapping another idle page, from the same or another process, out of RAM, and then swapping Page2 of Process1 back into RAM from the hard disk. Generally, if a page in RAM is not used for a long time, it can be swapped out to the hard disk to free up its frame for other processes that need it.
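The swap-out and swap-in cycle can be sketched as a toy simulation in C. This is only an illustration under our own assumptions: RAM has just two frames, every page of every process is identified by a made-up integer page_id, and "not used for a long time" is approximated by evicting the least recently used page. The replacement policy of a real OS is considerably more involved.

```c
#include <stdbool.h>
#include <stddef.h>

#define NUM_FRAMES 2   /* tiny RAM: room for two pages only */

struct frame {
    bool     used;      /* does this frame currently hold a page? */
    int      page_id;   /* illustrative identifier of the resident page */
    unsigned last_use;  /* logical time of the most recent access */
};

struct ram {
    struct frame frames[NUM_FRAMES];
    unsigned     clock;   /* advances by one on every access */
    unsigned     faults;  /* page faults seen so far */
};

/* Access page `page_id`; returns true if the access caused a page fault. */
bool access_page(struct ram *ram, int page_id)
{
    size_t victim = 0;

    ram->clock++;

    /* Hit: the page is already resident, so just refresh its timestamp. */
    for (size_t i = 0; i < NUM_FRAMES; i++) {
        if (ram->frames[i].used && ram->frames[i].page_id == page_id) {
            ram->frames[i].last_use = ram->clock;
            return false;
        }
    }

    /* Miss: prefer a free frame, otherwise pick the least recently used one. */
    for (size_t i = 0; i < NUM_FRAMES; i++) {
        if (!ram->frames[i].used) {
            victim = i;
            break;
        }
        if (ram->frames[i].last_use < ram->frames[victim].last_use)
            victim = i;
    }

    /* "Swap in" the requested page; the previous occupant counts as swapped out. */
    ram->frames[victim].used = true;
    ram->frames[victim].page_id = page_id;
    ram->frames[victim].last_use = ram->clock;
    ram->faults++;
    return true;
}
```

Starting from a zero-initialized struct ram, accessing pages 1, 2, 1, 3 in that order faults on the first access to each page, and the access to page 3 evicts page 2 because page 1 was touched more recently.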

Page Table

Virtual memory is an abstract memory presented by the OS, and the same is true of a virtual address. The CPU runs the instructions of a process and accesses its data using virtual addresses, but these virtual addresses need to be converted to actual physical addresses, since the physical address is what is ultimately used to access RAM. To translate the virtual address of a process into the actual physical address in physical memory, the OS uses a page table.

A Page Table is a table that maps a virtual address into an actual physical address on the RAM. The OS maintains a separate page table for each process running on the system. To illustrate, let’s look at Figure 4-9. We have two processes, each of which has a page table of its own that maps its pages in their virtual memory to frames in physical memory. As the page table for Process1 shows, its PAGE1 is currently loaded in the physical memory at FRAME1, but its PAGE2 entry is shown as INVALID, indicating that it is swapped out to the hard disk. Similarly, the Process2 page table indicates that PAGE2 and PAGE3 are loaded in physical memory at FRAME1 and FRAME3, respectively, while its PAGE1 is swapped out to the hard disk.

../images/491809_1_En_4_Chapter/491809_1_En_4_Fig9_HTML.jpg — Figure 4-9
Page table to map pages in virtual memory to frames on physical memory
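The lookup described above can be sketched in C as follows. This is a deliberately simplified, single-level table under our own assumptions (4 KB pages, a flat array indexed by page number); struct pte and translate() are illustrative names, and real hardware page tables are multi-level and carry many more bits per entry.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096u   /* assumed page size: 4 KB */

/* One page-table entry: is the page in RAM, and if so, in which frame? */
struct pte {
    bool     present;  /* false models the INVALID (swapped-out) state */
    uint32_t frame;    /* frame index, meaningful only when present */
};

/* Translate virtual address `va` using `table`, which has `entries` entries.
 * On success, store the physical address in *pa and return true.
 * Return false when the page is outside the table or not present;
 * this is the situation in which a real system raises a page fault. */
bool translate(const struct pte *table, size_t entries,
               uint32_t va, uint32_t *pa)
{
    uint32_t page   = va / DEMO_PAGE_SIZE;
    uint32_t offset = va % DEMO_PAGE_SIZE;

    if (page >= entries || !table[page].present)
        return false;

    *pa = table[page].frame * DEMO_PAGE_SIZE + offset;
    return true;
}
```

With a two-entry table whose first page maps to frame 1 and whose second page is marked not present, an address in the first page translates to the same offset inside frame 1, while any address in the second page fails to translate.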

Division of Virtual Memory Address Space

You already saw that the virtual memory of a process is split into pages. Windows splits the address range of each process’s virtual memory into two areas: the user space and the kernel space. In 32-bit Windows offering 4 GB of virtual memory per process, the total addressable range is 0x00000000 to 0xFFFFFFFF. This total range is split into user space and kernel space memory by the OS. By default, the range 0x00000000 to 0x7FFFFFFF is assigned to user space and the range 0x80000000 to 0xFFFFFFFF is assigned to kernel space.

As shown in Figure 4-10, the kernel space is common to all the processes, but the user space is separate for each process. This means that the code or data that lies in the user space is different for each process, but it is common in the kernel space for all processes. Both the user space and kernel space are split into pages.

../images/491809_1_En_4_Chapter/491809_1_En_4_Fig10_HTML.jpg — Figure 4-10
Division of virtual memory of a process into User and Kernel Space
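The default 32-bit split can be expressed as a pair of tiny C predicates. The names USER_SPACE_END, is_user_space(), and is_kernel_space() are our own, and the boundary is the default one quoted above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Last user-space address in the default 32-bit Windows layout. */
#define USER_SPACE_END 0x7FFFFFFFu

/* True if `va` falls in the per-process user-space range. */
bool is_user_space(uint32_t va)
{
    return va <= USER_SPACE_END;
}

/* True if `va` falls in the kernel-space range shared by all processes. */
bool is_kernel_space(uint32_t va)
{
    return va > USER_SPACE_END;
}
```

A check like this is handy when reading addresses in a debugger or a memory dump: an address such as 0x00400000 belongs to the process itself, whereas 0x80000000 and above belongs to the kernel.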

Inspecting Pages Using Process Hacker

Resources in virtual memory are split across multiple pages, including code and data. A page can have various properties, including having specific protection (permissions) and type (state). Some of the best tools to visualize pages and view their properties are Process Hacker and Process Explorer. Using Process Hacker, you can view the virtual memory structure of a process, in the Memory tab of the Properties window of a process.

You can now use Sample-4-1 from the same repository, add the .exe extension to it, and create a process out of it by double-clicking Sample-4-1.exe. Opening Process Hacker now shows the process running. Open the Properties window of the process by double-clicking the process Sample-4-1.exe in Process Hacker. You can now click the Memory tab, as we can see in Figure 4-11, which displays the memory layout/structure of the process.

../images/491809_1_En_4_Chapter/491809_1_En_4_Fig11_HTML.jpg — Figure 4-11
Visualization of a process’ memory and its various properties by Process Hacker

Process Hacker groups pages of the same type into memory blocks. It also displays the size of a memory block. You can expand the memory block, as shown in Figure 4-12, which shows the grouping of pages into submemory blocks based on protection.

../images/491809_1_En_4_Chapter/491809_1_En_4_Fig12_HTML.jpg — Figure 4-12
The Type of a page as shown by Process Hacker

Note

The virtual memory of the process that we are looking at in Process Hacker covers only the user-space address range. Process Hacker does not display the kernel-space part of the address range. We use a Windows 7 32-bit OS. You see a different address space range with a 64-bit OS.

Types of Pages

Various types of data are stored in pages, and as a result, pages can be categorized based on the type of data they store. There are three types of pages: private, image, and mapped. The following briefly describes these page types.

Private pages: These pages are exclusive to the process and are not shared with any other process. For example, the pages holding a process stack, Process Environment Block (PEB), or Thread Environment Block (TEB) are all exclusive to a process and are defined as private pages. Also, pages allocated by calling the VirtualAlloc() API are private and are primarily used by packers and malware to hold their decompressed data and code. Private pages are important for us as you learn to dissect a malware process using its memory in later chapters.
Image pages: These pages contain the modules of the main executable and the DLLs.
Mapped pages: Sometimes, files or parts of files on the disk need to be mapped into virtual memory for use by the process. The pages that contain such data maps are called Mapped. The process can modify the contents of the file by directly modifying the mapped contents in memory. An alternate use of mapped pages is when a part of its memory needs to be shared with other processes on the system.

Using Process Hacker, we can view the types of pages in the Memory tab, as seen in Figure 4-12.

States of a Page

A page in virtual memory—whether mapped, private, or image—may or may not have physical memory allocated for it. The state of a page is what tells if the page has physical memory allocated for it or not. A page can be in any of committed, reserved, or free states. The following list briefly describes these page states.

Reserved: A reserved page has virtual memory allocated in the process but doesn’t have a corresponding physical memory allocated for it.
Committed: A committed page is an extension of a reserved page: in addition to the virtual memory allocated in the process, it also has physical memory allocated for it.
Free: Free pages are address ranges for pages in virtual memory that are not assigned or made available to the process yet.

Using Process Hacker, you can view the state of pages in the Memory tab, as seen in Figure 4-13.

../images/491809_1_En_4_Chapter/491809_1_En_4_Fig13_HTML.jpg — Figure 4-13
The State of a page as shown by Process Hacker

Page Permissions

Pages can contain code as well as data. Some pages contain code that needs to be executed by the CPU. Other pages contain data that the code wants to read. Sometimes the process wants to write some data into the page. Based on the needs of the page, it is granted permissions. Pages can have read, write, and/or execute permissions. The Protection column in the Memory tab in Process Hacker shows the permissions of the pages, as seen in Figure 4-11. Process Hacker uses the letters R, W, and X to indicate if a page has read, write, and execute permissions. The following describes these permissions.

Read: The contents of the page can be read, but you can’t write into this page, nor can any instructions be executed from this page.
Write: The contents of the page can be read, and the page can also be written into.
Execute: Most likely, the page contains the code/instructions, and they can be executed.

A page that has execute permission does not indicate that it contains only code or instructions that need to be executed. It can contain nonexecutable data, as well.

A page can have a combination of permissions: R, RW, RWX, and RX. The program and the OS decide the page permissions of a region in memory. For example, the stack and the heap of a process are meant to store data only and should not contain executable code, and hence the pages for these two should only have RW permissions. But sometimes exploits use the stack to execute malicious code and hence give the stack execute privileges as well, making it RWX. To avoid such attacks, Microsoft introduced Data Execution Prevention (DEP) to ensure that the pages in a stack do not have execute permission.

Note

The minute OS-related details are needed by malware analysts, reverse engineers, and detection engineers who write malware analysis and detection tools. Often page properties, permissions, and so forth are used in identifying injected and unpacked code in malware scanning and forensic tools, as you will learn in later chapters.

Strings in Virtual Memory

The memory of a process has a lot of data that is consumed by a process during execution. Some of the data in memory are human-readable strings like URLs, domain names, IP addresses, file names, names of tools, and so forth. You can view the data present in the various pages by double-clicking a memory block in Process Hacker’s Memory tab.

You can return to the Sample-4-1.exe process in Process Hacker from where you left off in the previous sections and double-click a memory block to view its contents, as illustrated by Figure 4-14. Do note that you can only see the contents of pages that are in the committed state. To verify this, you can look for a memory block that is in a reserved state or is listed as free, and double-click it to watch Process Hacker throw an error describing how you can’t edit the memory block because it is not committed.

../images/491809_1_En_4_Chapter/491809_1_En_4_Fig14_HTML.jpg — Figure 4-14
Viewing the contents of a Memory Block using Process Hacker

Figure 4-14 shows the memory block. The first column is the offset of the data from the start address of the memory block. The second column displays the data in hex form, and the third column shows the printable ASCII characters, otherwise known as strings.

But searching for strings that way is cumbersome. Process Hacker provides a shortcut for you to list and view all the strings in the entire virtual memory address space of the process. To do so, you can click the Strings button at the top right of the Memory tab of the Properties window, as seen in Figure 4-14, which opens the options shown in Figure 4-15.

../images/491809_1_En_4_Chapter/491809_1_En_4_Fig15_HTML.jpg — Figure 4-15
The Strings option in Process Hacker

As seen in Figure 4-15, you have the option to select the type of pages from which it should display the strings, after which it displays all the strings for your selected options, as seen in Figure 4-16.

../images/491809_1_En_4_Chapter/491809_1_En_4_Fig16_HTML.jpg — Figure 4-16
The strings displayed by Process Hacker for private and image pages

It also provides a Filter option, as seen in Figure 4-16, with which you can filter and display only the strings matching a particular pattern or regular expression. We recommend that you play around with these various options to view the contents of a process’s memory and its strings, as this forms the foundation for a lot of our malware analysis process in later chapters.
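To make the idea of strings in memory concrete, here is a minimal C sketch of how a tool could pull printable ASCII runs out of a raw memory buffer. It is not how Process Hacker is implemented; next_string() and is_printable() are hypothetical names, and the sketch handles single-byte ASCII only, not wide-character strings.

```c
#include <stdbool.h>
#include <stddef.h>

/* Printable ASCII range: space (0x20) through tilde (0x7E). */
static bool is_printable(unsigned char c)
{
    return c >= 0x20 && c <= 0x7E;
}

/* Find the next run of at least `min_len` printable bytes at or after *pos.
 * On success, store the run's offset in *start and its length in *len,
 * advance *pos past the run, and return true. Return false at end of buffer. */
bool next_string(const unsigned char *buf, size_t size, size_t min_len,
                 size_t *pos, size_t *start, size_t *len)
{
    size_t i = *pos;

    while (i < size) {
        size_t begin = i;

        while (i < size && is_printable(buf[i]))
            i++;                      /* consume one printable run */

        if (i > begin && i - begin >= min_len) {
            *start = begin;
            *len   = i - begin;
            *pos   = i;
            return true;
        }
        if (i == begin)
            i++;                      /* skip a single non-printable byte */
    }

    *pos = size;
    return false;
}
```

Calling next_string() in a loop with a minimum length of, say, four characters walks through every candidate string in the buffer; short runs are skipped because they are usually noise rather than meaningful text.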

Using Virtual Memory Against Malware

Virtual memory provides extensive information for malware detection. You learn in Chapter 7 that encrypted or packed malware files need to decode themselves into the virtual memory at some point in time in their execution phase. Tapping into the virtual memory of a running malware can get you the decrypted malware code and data without much effort.

The virtual memory now with the decrypted malware code and data can contain important strings related to malware artifacts like malware name, hacker name, target destinations, URLs, IP addresses, and so forth. Many of these artifacts provide an easy way to detect and classify malware, as you will learn in the chapters on dynamic analysis and malware classification.

Many times, malware does not execute completely due to unsuitable environments, or the malware has suspected that it is being analyzed, or other reasons. In that case, strings can be helpful sometimes to conclude that the sample is malware, without having to spend time on reverse-engineering the sample.

One can also identify if code has been unpacked (more on this in Chapter 7) or injected (more on this in Chapter 10) by malware using the permissions/protections of memory blocks. Usually, malware that injects code allocates executable memory using various APIs, and this memory ends up as private pages with read, write, and execute (RWX) protection, which is a strong indicator of code injection or unpacking.
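The RWX heuristic can be sketched in C over a simplified description of a memory region. Everything here is illustrative: the enums mirror the Type, State, and Protection columns shown by Process Hacker rather than the actual Windows API constants, and requiring the region to be committed is our own added assumption, since only committed pages hold content.

```c
#include <stdbool.h>

/* Illustrative values only; these are not the Windows API constants. */
enum page_type  { TYPE_PRIVATE, TYPE_IMAGE, TYPE_MAPPED };
enum page_state { STATE_FREE, STATE_RESERVED, STATE_COMMITTED };
enum { PERM_R = 1 << 0, PERM_W = 1 << 1, PERM_X = 1 << 2 };

/* Simplified description of one memory block of a process. */
struct region {
    enum page_type  type;
    enum page_state state;
    unsigned        perms;   /* bitwise OR of PERM_R, PERM_W, PERM_X */
};

/* Heuristic: private, committed memory that is readable, writable, and
 * executable is a candidate for injected or unpacked code. */
bool looks_injected(const struct region *r)
{
    const unsigned rwx = PERM_R | PERM_W | PERM_X;

    return r->type == TYPE_PRIVATE
        && r->state == STATE_COMMITTED
        && (r->perms & rwx) == rwx;
}
```

A hit from a check like this is an indicator and not a verdict, since some legitimate programs also allocate RWX memory; it is best treated as a pointer to memory blocks that deserve a closer look.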

So far, we have looked at a process and its properties in its virtual memory. In the next section, let’s go through the PE file format used by executable programs that are the source of these processes, and how they contain various fields and information that helps the OS loader create a process and set up its virtual memory.

Portable Executable File

At the start of this chapter, we showed you a listing for C code, which we compiled to generate an .exe program available as Sample-4-1. Running this program file created a process for it and loaded it into memory, as visible in Process Hacker. But who loaded this program file from the disk into memory, turning it into a process? We explained that it was the OS, but the specific component of the OS that did it is called the Windows loader. But how does the Windows loader know how to load a program as a process, the size of virtual memory it needs, where in the program file the code and the data exist, and where in virtual memory to copy this code and data?

In Chapter 3, you learned that every file has a file format. So does an executable file on Windows, called the PE file format. The PE file format defines various headers that define the structure of the file, its code, its data, and the various resources that it needs. It also contains various fields that inform how much of virtual memory it needs when it is spawned into a process and where in its process’s memory to copy its various code, data, and resources. The PE file format is a huge structure with a large number of fields.

Let us now examine Sample-4-1 from the samples repository. The first step is to determine the file format using TriD (refer to Chapter 3), which shows that it is an executable PE file. Let us now open this file using the hex editor in Notepad++, as seen in Figure 4-17.

../images/491809_1_En_4_Chapter/491809_1_En_4_Fig17_HTML.jpg — Figure 4-17
The contents of a PE file as seen in a Notepad++ hex-editor

../images/491809_1_En_4_Chapter/491809_1_En_4_Fig18_HTML.jpg — Figure 4-18
High-level structure of a PE file: its headers and sections

The first two characters that strike our attention are MZ. We learned in Chapter 3 that these are magic bytes that identify the file as a Windows executable. MZ refers to Mark Zbikowski, who introduced the MS-DOS executable file format. A Windows executable still begins with this MS-DOS header for backward compatibility. The Windows executable format is called Portable Executable, and such a file is called a PE file. PE files can further be subgrouped as .exe, .dll, and .sys files, but we need to look deeper into the PE file contents and headers to determine which subgroup a file belongs to.
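Checking for the magic bytes is simple enough to sketch in a few lines of C. The function name has_mz_magic() is our own; it only tests the first two bytes of a buffer, which is a necessary but not sufficient condition for a file to be a valid PE file.

```c
#include <stdbool.h>
#include <stddef.h>

/* True if the buffer starts with the two magic bytes 'M' 'Z' (0x4D 0x5A). */
bool has_mz_magic(const unsigned char *data, size_t size)
{
    return size >= 2 && data[0] == 'M' && data[1] == 'Z';
}
```

In practice, the buffer would hold the first bytes read from the file on disk; a fuller check would go on to parse the headers that follow, which is what the tools in the next sections do for us.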

But digging deeper into the PE file to figure out its various details in a simple hex editor is a tedious task. There are tools available that parse PE files and display their inner headers and structure. In the next few sections, we go through the PE file format using tools like CFF Explorer and look at the various fields that we encounter in the malware analysis and reverse engineering process.

Exploring Windows Executable

The PE file has two components: the headers and the sections. The headers are meant to store meta-information, and the sections are meant to store the code, data, and the resources needed by the code to execute. Some of the meta-information stored by the headers includes the date, time, and version of the PE file. The headers also contain pointers/offsets into the sections where the code and the data are located.

To dissect the contents of the PE file, we use the tool CFF Explorer (see Chapter 2). There are alternate tools as well, PEView and StudPE being the most popular ones. You can use a tool that you feel comfortable with. In this book, we use CFF Explorer.

We use Sample-4-1 from the samples repository for the exercise here, which you now open using CFF Explorer, as shown in Figure 4-19.

../images/491809_1_En_4_Chapter/491809_1_En_4_Fig19_HTML.jpg — Figure 4-19
*Sample-4-1* PE file opened using CFF Explorer

Note

Opening an executable program (PE file) in CFF Explorer does not create a process for the sample program. CFF Explorer only reads the contents of the PE file and displays its structure and contents.

CFF Explorer is user-friendly and self-explanatory. The PE file has several headers and subheaders. Headers have several fields in them which either contain the data itself or an address/offset of some data present in some other header field or section. The left side of Figure 4-19 displays the headers in a tree view; that is, you can see Dos Header and then Nt Headers, which has a subtree with two other headers: File Header and Optional Header, and so on. If you click any of the headers on the left side, you can see the corresponding fields and their values under that header, shown on the right in Figure 4-20.

../images/491809_1_En_4_Chapter/491809_1_En_4_Fig20_HTML.jpg — Figure 4-20
Dos Header fields of *Sample-4-1* PE file shown using CFF Explorer

Figure 4-20 shows the DOS Header of Sample-4-1. Note that we have trimmed the figure to show only part of the output. CFF Explorer lists the fields of the DOS Header in a tabular view on the right-hand side. All the numerical values are in hex. Here is a list of the important columns and their purpose.

Member displays the name of the field. In Figure 4-20, the first field is e_magic, whose value is the magic bytes that identify the PE file format. The e_magic field is the same field that holds the MZ magic bytes at the start of the file, as shown in Figure 4-17 and Chapter 3.
Offset states the distance of the field, in bytes, from the start of the file. The e_magic field holds the value MZ, which is the first two bytes of the file (i.e., located at the very beginning of the file). Hence, its offset is 0 (00000000).
Size tells the size of the field’s value (shown in the next column). The e_magic field has a size of a word, which is 2 bytes.
Value contains the value of the field. The value can be the data itself, or it can be the offset to the location in virtual memory that contains the actual data (we explain this in the “Relative Virtual Address” section). The value can be numerical or a string. Numerical data can be an offset, a size, or a representation of some data. An example of a string is the value of e_magic, which is 5A4D. This is the hex encoding of the ASCII string MZ with the byte order reversed, which reads as ZM. (In the next section, we talk about why CFF Explorer displays it as 5A4D, i.e., ZM, instead of 4D5A, i.e., MZ.)

Endianness

Let us look at the same e_magic field from Figure 4-20, which is the first field of the PE file and holds the first two bytes of the file. The value of this field is listed as 5A4D. But if you open the file using Notepad++ Hex Editor, you see the first two bytes as 4D5A (i.e., the bytes are reversed). Why is CFF Explorer showing it in the reverse order? It is because of a concept called endianness, which describes the order in which the bytes of a multibyte value are stored. Data can be stored in little-endian or big-endian format. In a PE file targeted to run on Windows, the field values are stored in little-endian format. In little-endian, the least significant byte of a field has the lowest address. In big-endian, the most significant byte of a field occupies the lowest address.

The PE file format in Windows follows the little-endian scheme for storing various values in its fields. The value of the e_magic field is shown as 5A4D (ZM), but the actual bytes in the file are 4D5A (MZ), where the byte 4D has the lower address/offset in the file (i.e., offset 0) and the byte 5A is at offset 1, as we can see in Figure 4-17. CFF Explorer reads these two bytes as a little-endian number, so the byte at offset 1 becomes the most significant byte, which is why the displayed value appears swapped.
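The byte swap described above can be reproduced in a few lines. The following is a minimal sketch in Python (our choice of language for illustration; it is not part of the samples repository) that interprets the two bytes MZ first as a little-endian and then as a big-endian 16-bit number using the standard struct module.

```python
import struct

# The first two bytes of a PE file exactly as they sit on disk:
# 0x4D ('M') at offset 0 and 0x5A ('Z') at offset 1.
raw = bytes([0x4D, 0x5A])

# "<H" reads a little-endian unsigned 16-bit value: the byte at the
# lowest offset is the least significant byte.
little = struct.unpack("<H", raw)[0]

# ">H" reads the same two bytes as a big-endian value: the byte at the
# lowest offset is the most significant byte.
big = struct.unpack(">H", raw)[0]

print(raw.decode("ascii"))  # MZ
print(f"{little:04X}")      # 5A4D, the value a little-endian parser displays
print(f"{big:04X}")         # 4D5A, the order seen in a hex editor
```

The bytes in the file never change; only the interpretation does.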

Endianness is a well-documented topic in computer science. You can find many resources that describe it in more detail. We recommend that you thoroughly understand how endianness works, and as an exercise, play around with some of the other header fields in CFF Explorer and Notepad++ Hex Editor to see how the data is represented in the file in comparison to how it’s displayed to you.

Image Base

When the Windows loader creates a process, it copies a PE file and its sections from the disk into the process’s virtual memory. But first, it needs to allocate space in virtual memory. How does it know at what location it should allocate this space? The location comes from the ImageBase field in the PE file under Optional Header, as seen in Figure 4-21.

../images/491809_1_En_4_Chapter/491809_1_En_4_Fig21_HTML.jpg — Figure 4-21
ImageBase field under Optional Header of a PE file

In parallel, you can run Sample-4-1.exe from the samples repository by adding the .exe extension to it and double-clicking it, just like you did in the previous sections. You can now go to the Memory tab for this process in Process Hacker and locate the memory range/blocks at which the Sample-4-1.exe PE file is loaded into virtual memory. You can easily locate this range in Process Hacker because it labels the memory blocks that hold PE files by name, as seen in Figure 4-22.

../images/491809_1_En_4_Chapter/491809_1_En_4_Fig22_HTML.jpg — Figure 4-22
Locating the memory block and image base of *Sample-4-1.exe* PE file in memory

As seen in Figure 4-22, the starting address of the memory block for the Sample-4-1.exe PE file in virtual memory is 0x400000, which matches the value of the ImageBase field for the same file in Figure 4-21.

The Catch

There is a catch to what we explained in the previous section. The Windows loader uses the value of the virtual address in the ImageBase field as a recommendation for the starting address at which it should allocate space to load the PE file. But why is this a recommendation? Why can’t the Windows loader always allocate memory starting at this address?

If the memory at that address in the process’s virtual address space is already occupied by other contents, then the loader can’t use it to load the PE file. It can’t relocate the existing data into another address location and put the PE file at the image base location.

Instead, it finds another empty chunk of memory blocks, allocates space there, and copies the PE file and its content into it, resulting in a different image base for the PE file’s contents.
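The decision described above can be sketched as a small simulation. This is only an illustration of the idea, not how the Windows loader is implemented: the function names, the list of occupied regions, and the list of fallback candidate addresses are all hypothetical.

```python
def overlaps(start, size, region):
    """Return True if [start, start + size) intersects the (start, size) region."""
    region_start, region_size = region
    return start < region_start + region_size and region_start < start + size


def choose_image_base(preferred, image_size, occupied, candidates):
    """Pick the preferred ImageBase if it is free, else the first free candidate.

    occupied   -- hypothetical list of (start, size) ranges already in use
    candidates -- hypothetical fallback addresses to try, in order
    """
    for base in [preferred] + list(candidates):
        if not any(overlaps(base, image_size, region) for region in occupied):
            return base
    raise MemoryError("no free range large enough for the image")


# Nothing is in the way, so the recommendation in the header is honored.
print(hex(choose_image_base(0x400000, 0x10000, [], [0x500000])))  # 0x400000

# Something already occupies part of the preferred range, so the image
# ends up at a different base address.
busy = [(0x3F0000, 0x20000)]
print(hex(choose_image_base(0x400000, 0x10000, busy, [0x500000])))  # 0x500000
```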

Relative Virtual Address (RVA)

We just learned that the PE file is loaded into virtual memory at the image base. The PE file contains various fields that point to other data and fields at various addresses within its virtual memory. Using absolute addresses for this works only if the actual image base of the loaded PE file in virtual memory is the same as the image base that the PE file recommends in its Optional Header ImageBase field. If the image base were fixed, fields could reference the addresses they need directly. For example, if the image base in the header is 0x400000, a field in the headers could point to an address in its virtual memory by directly using an address like 0x400020.

But then we learned of the catch in the previous section. The ImageBase value in the Optional Header is a recommendation. Though it holds a value of 0x400000, the loader might load the entire file starting at an image base address of 0x500000. This breaks all those fields in the PE file that directly use an absolute address like 0x400020. How is this problem solved? Relative virtual addresses (RVAs) come to the rescue.

With RVA, every reference to an address in virtual memory is an offset from the actual image base at which the PE file is loaded in virtual memory. For example, if the loader loads the PE file at a virtual memory address starting at 0x500000, and a field/value in the PE file intends to reference data at address 0x500020, it achieves this by using 0x20 as the value of the field in the PE file, which is the offset from the actual image base. To figure out the true address, all the process and the loader must do is add this offset 0x20 to the actual image base 0x500000 to get the real address 0x500020.

Let’s see RVA in action. You can open Sample-4-1 using CFF Explorer as in the previous sections. As seen in Figure 4-23, the field AddressOfEntryPoint under Optional Header is meant to hold the address of the first code instruction the CPU executes in the process. But as you can see, the address is not a full absolute address like 0x401040. Instead, it is an RVA, 0x1040, which means its real address in virtual memory is actual image base + 0x1040. Assuming the actual image base is 0x400000, the effective AddressOfEntryPoint is 0x401040.
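The arithmetic is simple enough to express as two helper functions. The names rva_to_va and va_to_rva are ours, chosen for illustration; the numbers are the ones used in this section.

```python
def rva_to_va(image_base: int, rva: int) -> int:
    """Turn an RVA into an absolute virtual address for a given load address."""
    return image_base + rva


def va_to_rva(image_base: int, va: int) -> int:
    """Turn an absolute virtual address back into an offset from the image base."""
    if va < image_base:
        raise ValueError("address lies below the image base")
    return va - image_base


# The same RVA resolves correctly no matter where the image was loaded.
print(hex(rva_to_va(0x500000, 0x20)))  # 0x500020
print(hex(rva_to_va(0x400000, 0x20)))  # 0x400020
print(hex(va_to_rva(0x500000, 0x500020)))  # 0x20
```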

../images/491809_1_En_4_Chapter/491809_1_En_4_Fig23_HTML.jpg — Figure 4-23
The RVA value held in AddressOfEntryPoint field under Optional Header

To verify this, let’s have the OllyDbg debugger start the Sample-4-1 process. OllyDbg is a debugger that loads a program into memory, thereby creating a process, and then waits until it breaks/stops at the first instruction the CPU executes in the process. To do this, open OllyDbg, point it to the Sample-4-1 file on disk, and let it stop/break. As you can see in Figure 4-24, OllyDbg stops at an instruction whose address is 0x401040, which is 0x400000 + 0x1040.

../images/491809_1_En_4_Chapter/491809_1_En_4_Fig24_HTML.jpg — Figure 4-24
OllyDbg breaks/stops at the first instruction *Sample-4-1.exe* executes

You can verify that 0x400000 is the actual image base of the PE file module in the virtual memory of Sample-4-1.exe by using Process Hacker, as seen in Figure 4-25.

../images/491809_1_En_4_Chapter/491809_1_En_4_Fig25_HTML.jpg — Figure 4-25
The actual image base of *Sample-4-1.exe* PE file in its memory

Important PE Headers and Fields

There are three main headers in the PE file: the DOS header, the NT headers, and the section headers. These headers may have subheaders under them. All the headers have multiple fields in them that describe various properties. Let’s now go through the various header fields defined by the PE file format and understand their properties and the type of value held in them. To investigate the fields in this section, we use Sample-4-1 from the samples repository. You can load this sample into CFF Explorer and investigate the various fields as we go through them in the following sections.

DOS Header

The DOS header starts with the e_magic field, which contains the DOS signature or magic bytes 4D5A, otherwise known as MZ. If you scroll down the list of fields, you find the e_lfanew field, which is the offset from the start of the file to the start of the PE header.
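As a rough sketch of how a parser reads these two fields, the following Python snippet checks e_magic and extracts e_lfanew. It relies on the standard PE layout, in which the DOS header is 0x40 bytes long and e_lfanew is a 4-byte little-endian value at offset 0x3C. The function name and the synthetic buffer (with a made-up e_lfanew of 0x80) are assumptions for illustration; to try it on a real file, pass in the first 64 bytes read from disk.

```python
import struct

DOS_HEADER_SIZE = 0x40   # size of the DOS header in bytes
E_LFANEW_OFFSET = 0x3C   # offset of e_lfanew inside the DOS header


def parse_dos_header(data: bytes):
    """Return (e_magic, e_lfanew) from the start of a PE file."""
    if len(data) < DOS_HEADER_SIZE:
        raise ValueError("too short to contain a DOS header")
    e_magic = struct.unpack_from("<H", data, 0)[0]
    if e_magic != 0x5A4D:  # bytes 4D 5A ("MZ") read as little-endian
        raise ValueError("missing MZ magic bytes")
    e_lfanew = struct.unpack_from("<I", data, E_LFANEW_OFFSET)[0]
    return e_magic, e_lfanew


# Synthetic header for demonstration only: MZ plus a hypothetical e_lfanew.
fake_header = bytearray(DOS_HEADER_SIZE)
fake_header[0:2] = b"MZ"
struct.pack_into("<I", fake_header, E_LFANEW_OFFSET, 0x80)

magic, pe_offset = parse_dos_header(bytes(fake_header))
print(f"e_magic={magic:04X} e_lfanew={pe_offset:08X}")
```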

NT Headers/PE Header

A PE header is also called an NT header or a COFF header and is displayed as an NT header in CFF Explorer. The NT header is further split into the file header and optional header.

Signature

The NT headers begin with the Signature field, a 4-byte field that holds the ASCII characters PE followed by two zero bytes (50 45 00 00 in hex), as seen in Figure 4-26. Since the PE file uses the little-endian format to hold data, CFF Explorer reverses the order of the bytes and displays the value as 0x00004550.

../images/491809_1_En_4_Chapter/491809_1_En_4_Fig26_HTML.jpg — Figure 4-26
The Nt Headers Signature field for *Sample-4-1.exe* shown by CFF Explorer
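A signature check can be sketched as follows, assuming the standard layout in which the signature is the four bytes P, E, 0, 0 located at the file offset given by e_lfanew. The function names and the synthetic buffer are hypothetical.

```python
import struct

PE_SIGNATURE = b"PE\x00\x00"  # the 4 signature bytes as they appear on disk


def has_pe_signature(data: bytes, e_lfanew: int) -> bool:
    """Check for the PE signature at the offset taken from the DOS header."""
    return data[e_lfanew:e_lfanew + 4] == PE_SIGNATURE


def signature_as_dword(signature: bytes) -> int:
    """Interpret the 4 signature bytes as a little-endian 32-bit value."""
    return struct.unpack("<I", signature)[0]


# Synthetic buffer for demonstration: a signature placed at offset 0x80.
fake_file = bytearray(0x84)
fake_file[0x80:0x84] = PE_SIGNATURE

print(has_pe_signature(bytes(fake_file), 0x80))     # True
print(f"{signature_as_dword(PE_SIGNATURE):08X}")    # 00004550
```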

File Header

File Header has seven fields, but the fields that we investigate are Machine, NumberOfSections, and Characteristics.

Machine

The CPU or the processor is the main component of a computer that executes instructions. Based on the needs of the device, various types of processors have been developed, each with its own features and instruction format/set that they understand. Some of the popular ones that are available today are x86 (Intel i386), x64 (AMD64), ARM, and MIPS.

The Machine field holds the value that indicates which processor type this PE file is meant to run on. For Sample-4-1, it holds the value 0x014C, which indicates the Intel i386 processor type. If you click the Meaning value, you can see the various processor/machine types available and modify it, as seen in Figure 4-27.

../images/491809_1_En_4_Chapter/491809_1_En_4_Fig27_HTML.jpg — Figure 4-27
The processor/machine type that the PE file should run on

Modifying it to a wrong type results in a failure to create a process when you double-click the file. As an exercise, you can try this out by setting a different type (like ARM), saving the file, and then trying to execute the sample program.

NumberOfSections

The NumberOfSections field holds the number of sections present in a PE file. Sections store various kinds of data and information in an executable, including the code and the data. Sometimes viruses, file infectors, and packers (see Chapter 7) modify clean programs by adding new sections with malicious code and data. When they do this, they also need to manipulate this field to reflect the newly added sections.

Characteristics

The Characteristics field holds a 2-byte bit field value, which represents some properties of the PE file. CFF Explorer displays a human-readable version of the properties when you click Click here in the Meaning column of this field, as seen in Figure 4-28.

../images/491809_1_En_4_Chapter/491809_1_En_4_Fig28_HTML.jpg — Figure 4-28
The Characteristics field visualization provided by CFF Explorer

As you can see in Figure 4-28, you can change the properties of the PE file by selecting/deselecting the various checkboxes. This field describes many important properties. The following are some of the important ones.

File is executable: Indicates that the file is a PE executable file
File is a DLL: File is a dynamic link library (we talk about this later)
32-bit word machine: Indicates that the PE file targets a machine with a 32-bit word size (i.e., it is a 32-bit executable file)
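The three File Header fields discussed above can be pulled out of the raw 20-byte header with a single unpack call. The sketch below assumes the standard field order (Machine, NumberOfSections, TimeDateStamp, PointerToSymbolTable, NumberOfSymbols, SizeOfOptionalHeader, Characteristics) and decodes only the three Characteristics bits listed above. The function name and the example header values (three sections, characteristics 0x0102) are made up for illustration.

```python
import struct

MACHINE_NAMES = {0x014C: "Intel i386", 0x8664: "AMD64"}

# Characteristics bits for the three properties described in the text.
CHARACTERISTIC_FLAGS = {
    0x0002: "File is executable",
    0x0100: "32-bit word machine",
    0x2000: "File is a DLL",
}


def parse_file_header(header: bytes) -> dict:
    """Decode Machine, NumberOfSections, and Characteristics from a File Header."""
    (machine, number_of_sections, _timestamp, _symbol_table,
     _symbol_count, _optional_size, characteristics) = struct.unpack_from(
        "<HHIIIHH", header, 0)
    flags = [name for bit, name in CHARACTERISTIC_FLAGS.items()
             if characteristics & bit]
    return {
        "machine": MACHINE_NAMES.get(machine, f"unknown (0x{machine:04X})"),
        "number_of_sections": number_of_sections,
        "flags": flags,
    }


# Hypothetical header: i386, 3 sections, executable + 32-bit flags set.
example = struct.pack("<HHIIIHH", 0x014C, 3, 0, 0, 0, 0xE0, 0x0102)
print(parse_file_header(example))
```

Because Characteristics is a bit field, each property is tested with a bitwise AND rather than compared for equality.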

Optional Header

Despite its name, the optional header is not optional for executable files, and it is important. The Windows loader refers to the many fields in this header to copy and map the PE file into the process’s memory. The two most important fields are AddressOfEntryPoint and ImageBase.

Data Directories

Data directories contain the size and RVA to locations in memory that contain important data/tables/directories, as seen in Figure 4-29. Some of these tables contain information that is used by the loader while loading the PE file in memory. Some other tables contain information that is used and referenced by the code instructions as they are executing.

../images/491809_1_En_4_Chapter/491809_1_En_4_Fig29_HTML.jpg — Figure 4-29
The Data Directories in a PE file which hold the RVA and Size of the directory

There is a total of 16 entries under Data Directories. If a table doesn’t exist in memory, the RVA and Size fields for that table entry are 0, as seen for Export Directory in Figure 4-29. The actual directories are in one of the sections, and CFF Explorer also indicates the section in which the directory is located, as seen for import directory, which it says is in the section data.

We go through some of these directories in later sections.
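To make the layout concrete, here is a minimal sketch that walks the 16 data directory entries, each of which is a pair of 4-byte little-endian values (RVA, then Size). The directory names follow the standard order defined by the PE format. The function names and the example blob, in which only the import directory is populated with made-up values, are assumptions for illustration.

```python
import struct

DIRECTORY_NAMES = [
    "Export", "Import", "Resource", "Exception", "Security",
    "Base Relocation", "Debug", "Architecture", "Global Ptr", "TLS",
    "Load Config", "Bound Import", "IAT", "Delay Import",
    "CLR Runtime", "Reserved",
]
ENTRY_SIZE = 8  # 4-byte RVA followed by 4-byte Size


def parse_data_directories(blob: bytes) -> dict:
    """Map each directory name to its (rva, size) pair."""
    if len(blob) < len(DIRECTORY_NAMES) * ENTRY_SIZE:
        raise ValueError("need 16 entries of 8 bytes each")
    return {
        name: struct.unpack_from("<II", blob, index * ENTRY_SIZE)
        for index, name in enumerate(DIRECTORY_NAMES)
    }


def present_directories(entries: dict) -> list:
    """List the directories that exist, i.e., whose RVA and Size are nonzero."""
    return [name for name, (rva, size) in entries.items() if rva and size]


# Hypothetical table: everything empty except the import directory.
blob = bytearray(16 * ENTRY_SIZE)
struct.pack_into("<II", blob, 1 * ENTRY_SIZE, 0xB000, 0x28)

entries = parse_data_directories(bytes(blob))
print(entries["Export"])             # (0, 0)
print(present_directories(entries))  # ['Import']
```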

Section Data and Section Headers

Section data, or simply sections, contains code; data referenced by the import tables, export tables, and other tables; embedded resources like images and icons; and, in the case of malware, secondary payloads, and so forth. The RVAs in some of the header fields we saw in the earlier sections point to data in these very same sections.

All the section data is loaded into virtual memory by the loader. The section header contains information on how the section data is laid out on disk in the PE file and in virtual memory, including the size that should be allocated to it, the memory page permissions, and the section names. The loader uses this information from the section headers to allocate the right amount of virtual memory, assign the right memory permissions (check the section page permissions), and copy the contents of the section data into memory.

The section headers contain the following fields, as shown by CFF Explorer.

Name

The Name field contains the section name. The name of a section is decided by a compiler/linker and packers and any other program that generates these PE files. Sections can have names like .text that usually contains code instructions, .data that usually contains data/variables referenced by the code, and .rsrc that usually contains resources like images, icons, thumbnails, and secondary payloads in case of malware.

But an important point is the names can be misleading. Just because the name says .data, it doesn’t mean it only contains data and no code. It can contain just code or both code and data or anything else for that matter. The names are just suggestions on what it might contain, but it shouldn’t be taken at face value. In fact, for a lot of malware, you may not find sections with names like .text and .data. The packers used by both malware and clean software can use any name of their choice for their sections. You can refer to Table 7-1 in Chapter 7 for the list of section names used by popular packers.

Virtual Address

The Virtual Address field holds the RVA at which the section is placed in virtual memory. To get the actual virtual address, we add it to the actual image base of the PE file in virtual memory.

Raw Size

The Raw Size field holds the size of the section data in the PE file on the disk.

Raw Address

The Raw Address field holds the offset from the start of the PE file to the location where the section data is located.

Characteristics

Sections can have many characteristics or rather properties. In CFF Explorer, if you right-click a section header row and select the Change Section Flags option, it shows the characteristics of the section in human-readable form, as seen in Figure 4-30 and Figure 4-31.

../images/491809_1_En_4_Chapter/491809_1_En_4_Fig30_HTML.jpg — Figure 4-30
Right-click a section header row in CFF Explorer to see section characteristics

../images/491809_1_En_4_Chapter/491809_1_En_4_Fig31_HTML.jpg — Figure 4-31
The section characteristics in human-readable form displayed by CFF tool

One of the most important characteristics of a section is its permissions. But what are permissions? Pages in virtual memory have permissions. The permissions for the pages in memory that contain the loaded section data are obtained and set by the Windows loader from the permissions specified in the Characteristics field in the PE file on disk, as seen in Figure 4-31. As you can see, the section permissions in the PE file are specified as Is Executable, Is Readable, and Is Writeable, which correspond to the execute, read, and write permissions used by pages in virtual memory, as shown in Figure 4-11, Figure 4-12, and Figure 4-13.
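The three permission bits can be decoded from the section Characteristics value with simple bit tests. The sketch below uses the standard flag values from the PE format (0x20000000 for execute, 0x40000000 for read, 0x80000000 for write). The function name and the two example values, which are typical of a code section and a writable data section, are ours.

```python
MEM_EXECUTE = 0x20000000  # Is Executable
MEM_READ = 0x40000000     # Is Readable
MEM_WRITE = 0x80000000    # Is Writeable


def section_permissions(characteristics: int) -> str:
    """Render the permission bits of a section as a three-letter string."""
    readable = "R" if characteristics & MEM_READ else "-"
    writable = "W" if characteristics & MEM_WRITE else "-"
    executable = "X" if characteristics & MEM_EXECUTE else "-"
    return readable + writable + executable


print(section_permissions(0x60000020))  # R-X, typical for a code section
print(section_permissions(0xC0000040))  # RW-, typical for a writable data section
```

A section that is both writable and executable (RWX) is worth a second look during analysis, since packers often need that combination.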

The section data can be viewed if you select and click any of the section rows in the section header, as seen in Figure 4-32. This is the data that is copied by the Windows loader from the PE file’s contents into the process’s virtual memory.

../images/491809_1_En_4_Chapter/491809_1_En_4_Fig32_HTML.jpg — Figure 4-32
The section data displayed by CFF Explorer

Windows Loader: Section Data—Virtual Memory

The Windows loader reads the data in the section from the disk file, as seen in Figure 4-32, using its Raw Address and Raw Size fields and then copies it into virtual memory. The section data in the file on the disk is at offset 0x200 from the start of the PE file and is 0x200 bytes in size. But to what address in memory does the Windows loader copy the section data, and how much space should it allocate in virtual memory in the first place?

You might think the second question has an easy answer. The loader just needs to allocate raw size bytes in virtual memory because that’s how much of the section data is present on disk in the file. No! The size it needs to allocate for the section is given by the Virtual Size field, as seen in Figure 4-32. But where in memory should it allocate this space? It allocates it at the address suggested by the Virtual Address field, as seen in Figure 4-32, which is an RVA. It means the actual address at which it allocates this memory is image base + virtual address.

Let’s verify this. From Figure 4-32, you know the RVA at which the .text section is loaded is 0x1000, meaning the actual virtual address is image base + 0x1000. You can run Sample-4-1.exe as you have done in previous sections, as seen in Figure 4-33. As you can see, the actual image base is 0x400000. Go to 0x401000 and check the contents of this location. As you can see in Figure 4-33 and Figure 4-32, the section data is the same, indicating the loader loaded the section data at this location in the virtual memory, as suggested by the various fields in the section header.

../images/491809_1_En_4_Chapter/491809_1_En_4_Fig33_HTML.jpg — Figure 4-33
Section data loaded into virtual memory
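The copy performed by the loader can be simulated in a few lines. This is a simplified sketch, not the real loader: it ignores page alignment and permissions, and the function name, the fake file contents, and the virtual size of 0x300 are assumptions for illustration. The raw address 0x200, raw size 0x200, RVA 0x1000, and image base 0x400000 are the values used in this section.

```python
def map_section(file_bytes, raw_address, raw_size,
                virtual_address, virtual_size, image_base):
    """Return (start address, contents) of a section as laid out in memory."""
    start = image_base + virtual_address      # RVA -> actual virtual address
    memory = bytearray(virtual_size)          # allocation starts zero-filled
    count = min(raw_size, virtual_size)       # never copy past the allocation
    chunk = file_bytes[raw_address:raw_address + count]
    memory[:len(chunk)] = chunk               # copy section data from the file
    return start, bytes(memory)


# Fake file: 0x200 bytes of headers, then 0x200 bytes of section data.
fake_file = bytes(0x200) + b"\xC3" * 0x200

start, contents = map_section(fake_file, 0x200, 0x200, 0x1000, 0x300, 0x400000)
print(hex(start))           # 0x401000
print(hex(len(contents)))   # 0x300
```

In this example the virtual size is larger than the raw size, so the tail of the allocation stays zero-filled. This is why allocating only raw size bytes would not be enough.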

Dynamic-Link Library (DLL)

Take the example of a sample C program, Program1, on the left side of Figure 4-34, which has a main() function that relies on the HelperFunction() function. Take another sample C program, Program2, as seen on the right in Figure 4-34. It also relies on HelperFunction(), which is a replica of the one in Program1.

../images/491809_1_En_4_Chapter/491809_1_En_4_Fig34_HTML.jpg — Figure 4-34
Two C programs, each of which defines a HelperFunction() that is an exact replica of the other

What you have is the same function, HelperFunction(), defined and used by both programs. Why the duplication? Can’t we share this HelperFunction() between both programs?

This is exactly where DLLs come in. DLLs, or dynamic-link libraries, hold these functions, more commonly called APIs (application programming interfaces), that can be shared and used by other programs, as shown in Figure 4-35.

../images/491809_1_En_4_Chapter/491809_1_En_4_Fig35_HTML.jpg — Figure 4-35
DLLs now hold the common code that can be shared by all programs

A DLL is available as a file on Windows with the .dll extension. A DLL file also uses the PE file format to describe its structure and content, just like a Windows executable file with the .exe extension. Like an EXE file, a DLL file also holds executable code and instructions. But whereas double-clicking an EXE file launches it as a process, double-clicking a DLL file does not. This is because a DLL file can’t be used independently and can only be used in combination with an EXE file. A DLL file is a dependency of an EXE file. Without an EXE file using it, you can’t make use of any APIs it defines.

One of the easiest ways to identify a file as a DLL is by using file identification tools like TriD and the file command. Another way is by using the Characteristics field in the PE file format. If the file is a DLL, the Characteristics field holds a value that indicates this property, and CFF Explorer shows this. As an exercise, you can load the DLL file Sample-4-2 using CFF Explorer, as illustrated in Figure 4-36, where CFF Explorer shows that it is a DLL with the File is a DLL checkbox.

../images/491809_1_En_4_Chapter/491809_1_En_4_Fig36_HTML.jpg — Figure 4-36
The characteristics of *Sample-4-2* show that it is a DLL

Dependencies and Import Tables

A DLL is just another PE file, as we just learned, and just like an executable PE file, a DLL is loaded into memory. We also learned that an executable file depends on DLLs for their APIs. When the Windows loader loads an executable PE file, it loads all its DLL dependencies into memory first. The loader obtains the list of DLL dependencies for a PE file from the import directory (also called an import table). As an exercise, open the import directory for Sample-4-1. We go into detail about the import directory in a short while, but it lists that Sample-4-1 depends on msvcrt.dll, as seen in Figure 4-37.

../images/491809_1_En_4_Chapter/491809_1_En_4_Fig37_HTML.jpg — Figure 4-37
The DLL dependency of *Sample-4-1* as listed by the import directory in CFF tool

You can now run Sample-4-1.exe, and then using Process Hacker, open the Modules tab for this process. A module is any PE file in memory. As you can see in Figure 4-38, msvcrt.dll is present in the list of modules, indicating that this DLL has been loaded into memory.

../images/491809_1_En_4_Chapter/491809_1_En_4_Fig38_HTML.jpg — Figure 4-38
The DLL dependency msvcrt.dll loaded into the memory of *Sample-4-1.exe* by the loader

We can reconfirm that msvcrt.dll is indeed loaded into memory by going to the Memory tab and searching for the memory blocks that hold its PE file, as seen in Figure 4-39.

../images/491809_1_En_4_Chapter/491809_1_En_4_Fig39_HTML.jpg — Figure 4-39
The memory blocks holding the DLL dependency msvcrt.dll of *Sample-4-1*

Dependency Chaining

One of the things you might have noticed in the Modules tab of Figure 4-38 is that a lot of modules/DLLs are loaded into Sample-4-1.exe by the loader. But the import directory in Figure 4-37 for this sample lists msvcrt.dll as the only dependency. Why is the loader loading all these extra DLLs? It is due to dependency chaining. Sample-4-1.exe depends on msvcrt.dll. But msvcrt.dll, being just another PE file, also depends on other DLLs. Those DLLs depend on other DLLs, all of which form a chain, and all the DLLs in this chain are loaded by the Windows loader. Figure 4-40 shows the DLL dependencies of msvcrt.dll.

../images/491809_1_En_4_Chapter/491809_1_En_4_Fig40_HTML.jpg — Figure 4-40
DLL dependencies of msvcrt.dll as seen in its import directory

To view the dependency chain of a PE file, you can use the Dependency Walker option in CFF Explorer, as seen in Figure 4-41, which shows the same for Sample-4-2.

../images/491809_1_En_4_Chapter/491809_1_En_4_Fig41_HTML.jpg — Figure 4-41
Dependency Walker in CFF tool showing the DLL dependencies of *Sample-4-2*
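Dependency chaining amounts to walking a graph of import directories. The following sketch collects every module reachable from a starting PE file. The import map is purely illustrative: apart from Sample-4-1.exe depending on msvcrt.dll, the listed dependencies are assumptions, and the real ones vary between Windows versions. The function name is ours as well.

```python
def collect_modules(root, imports):
    """Return every module reachable from root by following import directories."""
    found = [root]
    queue = [root]
    while queue:
        module = queue.pop(0)
        for dependency in imports.get(module, []):
            if dependency not in found:   # each module is loaded only once
                found.append(dependency)
                queue.append(dependency)
    return found


# Hypothetical import directories, keyed by module name.
imports = {
    "Sample-4-1.exe": ["msvcrt.dll"],
    "msvcrt.dll": ["kernel32.dll", "ntdll.dll"],
    "kernel32.dll": ["ntdll.dll"],
    "ntdll.dll": [],
}

print(collect_modules("Sample-4-1.exe", imports))
```

Although the executable lists a single dependency, the walk ends with four modules, which mirrors what the Modules tab in Process Hacker shows on a larger scale.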

Exports

A DLL consists of APIs that can be used by other executable programs. But how do you obtain the list of API names made available by a DLL? For this purpose, the DLL uses the Export Directory. As an exercise, you can open the DLL file Sample-4-2 in CFF Explorer and open its Export Directory. As seen in Figure 4-42, Sample-4-2 exports two APIs/functions: HelperFunction1() and HelperFunction2().

../images/491809_1_En_4_Chapter/491809_1_En_4_Fig42_HTML.jpg — Figure 4-42
The export directory of a DLL consists of the list of APIs made available by the DLL

The Export Directory also holds the RVAs of the exported functions, which are listed as 0x1070 and 0x1090. The actual absolute virtual address of each function in memory is the image base of this DLL file + RVA. For example, if the image base of this DLL is 0x800000, the addresses of these functions in memory are 0x801070 and 0x801090.

Import Address Table

We learned in the section on DLLs that a PE file depends on APIs from other DLLs. The code/instructions of a PE file want to reference and call the APIs in these DLLs. But how does the code obtain the address of these APIs in memory? The problem is solved by what is called an IAT (import address table). An IAT is a table or an array in memory that holds the addresses of all the APIs that are used by a PE file. This is illustrated in Figure 4-43.

../images/491809_1_En_4_Chapter/491809_1_En_4_Fig43_HTML.jpg — Figure 4-43
The IAT table referenced by code to resolve the address of exported APIs

Let’s now use Sample-4-3 and Sample-4-2 from the samples repository as an exercise. Add the .exe extension to Sample-4-3 and the .dll extension to Sample-4-2. Open Sample-4-2.dll in CFF Explorer to observe that it exports two APIs, HelperFunction1() and HelperFunction2(), as seen in Figure 4-44.

../images/491809_1_En_4_Chapter/491809_1_En_4_Fig44_HTML.jpg — Figure 4-44
The exported APIs from *Sample-4-2.dll* as shown by CFF Explorer

Observing the import directory for Sample-4-3.exe in CFF Explorer shows that it imports only the HelperFunction2() API from Sample-4-2.dll, as seen in Figure 4-45.

../images/491809_1_En_4_Chapter/491809_1_En_4_Fig45_HTML.jpg — Figure 4-45
The APIs imported by *Sample-4-3.exe* from *Sample-4-2.dll*

Now run Sample-4-3.exe. The loader loads Sample-4-3.exe into memory and runs it, but it also loads Sample-4-2.dll into Sample-4-3.exe’s memory since it is a dependency. According to Figure 4-44, the address of HelperFunction2() in memory is the image base + RVA of 0x1090. As seen in Figure 4-46, the actual image base of Sample-4-2.dll in memory is 0x10000000, making the effective virtual address of HelperFunction2() 0x10001090.

../images/491809_1_En_4_Chapter/491809_1_En_4_Fig46_HTML.jpg — Figure 4-46
The image base of *Sample-4-2.dll* in memory

../images/491809_1_En_4_Chapter/491809_1_En_4_Fig47_HTML.jpg — Figure 4-47
The IAT of *Sample-4-3.exe* holds the address of the HelperFunction2() API

Let’s switch back to Figure 4-45. The IAT for Sample-4-3.exe, which holds the addresses of the APIs that it imports from Sample-4-2.dll, is located at the RVA of 0xB0F4. Combining this with its image base of 0x400000 from Figure 4-46 gives an effective address of 0x40B0F4. Checking the contents of this address in memory using Process Hacker shows that it does hold the address of the HelperFunction2() API from Sample-4-2.dll (i.e., 0x10001090), as seen in Figure 4-47.
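The resolution step can be sketched as follows: for every imported API name, the export RVA in the DLL is looked up and added to the DLL’s actual image base, and the result is written into the IAT. This is a simplified model, not the loader’s real implementation. The function name is ours, and pairing HelperFunction1() with the RVA 0x1070 is an assumption based on the two RVAs listed earlier.

```python
import struct


def build_iat(imported_names, dll_exports, dll_image_base):
    """Resolve each imported API name to its absolute address in memory."""
    return [dll_image_base + dll_exports[name] for name in imported_names]


# Export RVAs of Sample-4-2.dll (the HelperFunction1 pairing is assumed).
exports = {"HelperFunction1": 0x1070, "HelperFunction2": 0x1090}

# Sample-4-3.exe imports only HelperFunction2; the DLL sits at 0x10000000.
iat = build_iat(["HelperFunction2"], exports, 0x10000000)
print(hex(iat[0]))  # 0x10001090

# The 4 bytes stored in the IAT slot at 0x40B0F4, in little-endian order,
# as they would appear when viewing that memory in a hex view.
slot_bytes = struct.pack("<I", iat[0])
print(slot_bytes.hex())  # 90100010
```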

Why is learning about the IAT important? The IAT is commonly misused by malware to hijack API calls made by clean software. Malware does this by replacing the addresses of genuine APIs in the IAT of a process with addresses of its own code, basically redirecting all the API calls made by the process to its own malicious code. You learn more about this in Chapter 10 and Chapter 11, where we cover API hooking and rootkits.

Summary

Windows internals is a vast topic that can’t be covered in a few chapters. There are dedicated books covering this topic, including the well-known Windows Internals series by Mark E. Russinovich. We have covered various OS internals topics in the book with relevance to malware analysis, reverse engineering, and detection engineering. In this chapter, we covered how the Windows loader takes a program from the disk and converts it into a process. We explored tools like Process Hacker and Process Explorer, which we used to dissect the various process properties. We learned about virtual memory and how it works internally, covering concepts like paging, page tables, and demand paging.

We also covered the PE file format and its various fields and how the loader uses its fields to map it into virtual memory and execute it. We also covered DLLs that are widely used on Windows for implementing APIs and used by malware authors as a carrier of maliciousness. We covered import tables, export tables, and IAT that links an executable PE file and DLLs.
