In this chapter, we are going to cover the core fundamentals that you need to know to analyze 32-bit or 64-bit malware on the Windows platform. We will cover the Windows Portable Executable file header (PE header) and look at how it can help us to answer different incident handling and threat intelligence questions.
We will also walk through the concepts and basics of static and dynamic analysis, including processes and threads, the process creation flow, and WOW64 processes. Finally, we will cover process debugging, including setting breakpoints and altering the program’s execution.
This chapter will help you to perform basic static and dynamic analyses of malware samples by explaining the theory and equipping you with practical knowledge. By doing this, you will learn about the tools needed for malware analysis.
In this chapter, we will cover the following topics:
When you start to perform basic static analysis on a file, your first valuable source of information will be the PE header. The PE header is a structure that any executable Windows file follows.
It contains various information, such as supported systems, the memory layouts of sections that contain code and data (such as strings, images, and so on), and various metadata, helping the system load and execute a file properly.
In this section, we will explore the PE header structure and learn how to analyze a PE file and read its information.
The portable executable structure was able to solve multiple issues that appeared in previous structures, such as MZ for MS-DOS executables. It represents a complete design for any executable file. Some of the features of the PE structure are as follows:
Now, let’s talk about what PE’s structure looks like.
In this section, we will dive deeper into the structure of a typical executable file on a Windows operating system. This structure is used by Microsoft to represent multiple files, such as applications or libraries in the Windows operating system, across multiple types of devices, such as PCs, tablets, and mobile devices.
Early on, Windows and MS-DOS co-existed, and both had executable files with the same extension, .exe. So, each Windows application had to start with a small DOS application that printed the message "This program cannot be run in DOS mode" (or a similar one). This way, when a Windows application is executed in a DOS environment, the small DOS application at its start runs and prints this message, prompting the user to run the program in Windows instead. The following diagram shows the high-level structure of the PE file header, with the DOS program’s MZ Header at the start:
Figure 3.1 – Example PE structure
This DOS header starts with the MZ magic value and ends with a field called e_lfanew, which points to the start of the portable executable header, or PE header.
The PE header starts with two letters, PE (followed by two null bytes), and continues with two important headers: the file header and the optional header. All the additional structures are then referenced from the data directory array.
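To make the layout described above more concrete, here is a minimal Python sketch that follows e_lfanew from the DOS header to the PE signature and then reads the file header fields. The build_demo_header helper and the values it writes are hypothetical and exist only so that the example is self-contained; a real sample would be read from disk instead.

```python
import struct

def build_demo_header() -> bytes:
    """Build a synthetic header blob for illustration (not a runnable executable)."""
    dos_header = bytearray(0x40)
    dos_header[0:2] = b"MZ"                          # DOS magic value
    struct.pack_into("<I", dos_header, 0x3C, 0x80)   # e_lfanew: offset of the PE signature
    dos_stub = bytes(0x80 - len(dos_header))         # placeholder for the small DOS program
    signature = b"PE\x00\x00"
    file_header = struct.pack(
        "<HHIIIHH",
        0x014C,      # Machine (x86)
        3,           # NumberOfSections
        0x5F000000,  # TimeDateStamp
        0, 0,        # PointerToSymbolTable, NumberOfSymbols
        0x00E0,      # SizeOfOptionalHeader
        0x0102,      # Characteristics
    )
    return bytes(dos_header) + dos_stub + signature + file_header

def parse_file_header(data: bytes) -> dict:
    """Locate the PE signature through e_lfanew and decode the 20-byte file header."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ magic value")
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    machine, sections, timestamp, _, _, opt_size, characteristics = struct.unpack_from(
        "<HHIIIHH", data, e_lfanew + 4
    )
    return {
        "e_lfanew": e_lfanew,
        "machine": machine,
        "number_of_sections": sections,
        "time_date_stamp": timestamp,
        "size_of_optional_header": opt_size,
        "characteristics": characteristics,
    }

info = parse_file_header(build_demo_header())
```

In practice, dedicated parsers handle many more edge cases (truncated files, malformed offsets), so treat this only as an illustration of how the fields are chained together.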
Some of the most important values from this header are as follows:
Figure 3.2 – File header explained
The highlighted values are as follows:
Now, let’s talk about the optional header.
Following the file header, the optional header comes with much more information, as shown here:
Figure 3.3 – Optional header explained
Here are some of the most important values in this header:
The optional header ends with a list of data directories.
The data directory array points to a list of other structures that might be included in the executable and are not necessarily present in every application.
It includes 16 entries that follow the following format:
The data directory includes many different values; not all of them are that important for malware analysis. Some of the most important entries to mention are as follows:
Following the data directories, there is a section table.
After the 16 entries of the data directory array, there’s the section table. Each entry in the section table represents a section of the PE file. The number of sections in total is the number stored in the NumberOfSections field in FileHeader.
Here is an example of it:
Figure 3.4 – Example of a section table
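As a rough illustration of how such a table can be read programmatically, the following Python sketch decodes consecutive 40-byte section entries. The make_section helper and the sample values are hypothetical; in a real file, the table offset would be derived from the headers and the entry count from NumberOfSections.

```python
import struct

# Name[8], VirtualSize, VirtualAddress, SizeOfRawData, PointerToRawData,
# PointerToRelocations, PointerToLinenumbers, NumberOfRelocations,
# NumberOfLinenumbers, Characteristics
SECTION_FORMAT = "<8sIIIIIIHHI"
SECTION_SIZE = struct.calcsize(SECTION_FORMAT)  # 40 bytes per entry

def make_section(name: bytes, vsize: int, vaddr: int, raw_size: int,
                 raw_ptr: int, characteristics: int) -> bytes:
    """Hypothetical helper that packs one section entry for the demo."""
    return struct.pack(SECTION_FORMAT, name, vsize, vaddr, raw_size, raw_ptr,
                       0, 0, 0, 0, characteristics)

def parse_section_table(data: bytes, offset: int, count: int) -> list:
    """Decode `count` section entries starting at `offset`."""
    sections = []
    for index in range(count):
        fields = struct.unpack_from(SECTION_FORMAT, data, offset + index * SECTION_SIZE)
        name, vsize, vaddr, raw_size, raw_ptr, _, _, _, _, characteristics = fields
        sections.append({
            "name": name.rstrip(b"\x00").decode("ascii", errors="replace"),
            "virtual_size": vsize,
            "virtual_address": vaddr,
            "raw_size": raw_size,
            "raw_pointer": raw_ptr,
            "characteristics": characteristics,
        })
    return sections

table = (make_section(b".text", 0x500, 0x1000, 0x600, 0x400, 0x60000020)
         + make_section(b".data", 0x1000, 0x2000, 0x200, 0xA00, 0xC0000040))
sections = parse_section_table(table, 0, 2)
```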
These fields are used for the following purposes:
Now, let’s talk about the Rich header.
This is a much lesser-known part of the MZ-PE header. It is located between the small DOS program, which prints the "This program cannot be run in DOS mode" string, and the PE header, as shown in the following screenshot:
Figure 3.5 – Raw Rich header
Unlike other header structures, it is meant to be read backward, starting from the location of the Rich magic value. The value following this marker is a custom checksum calculated over the DOS and Rich headers, which also serves as the XOR key with which the actual content of this header is encrypted. Once decrypted, it will contain various information about the software that was used to compile the program. The very first field, once decrypted, will be the DanS marker:
Figure 3.6 – Parsed Rich header in the PE-Bear tool
This information can help researchers identify software that was used to create malware to choose the right tools for analysis and actor attribution.
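The decryption described above can be sketched in a few lines of Python. This is a simplified illustration that works on a synthetic blob: the key and the entries are made up, and the assumed layout (three padding values after DanS, followed by pairs of an identifier and a usage count, where the identifier combines a product ID and a build number) reflects the commonly documented reverse-engineered format rather than any official specification.

```python
import struct

DANS = int.from_bytes(b"DanS", "little")

def decode_rich_header(data: bytes) -> list:
    """Return (product_id, build, count) tuples from an XOR-encrypted Rich header."""
    end = data.find(b"Rich")
    if end < 0:
        raise ValueError("Rich marker not found")
    (key,) = struct.unpack_from("<I", data, end + 4)  # value right after the marker
    dwords = []
    pos = end - 4
    while pos >= 0:                                    # walk backward from the marker
        (value,) = struct.unpack_from("<I", data, pos)
        plain = value ^ key
        if plain == DANS:
            break
        dwords.append(plain)
        pos -= 4
    else:
        raise ValueError("DanS marker not found")
    dwords.reverse()
    body = dwords[3:]                                  # assumed: three padding DWORDs
    return [(comp_id >> 16, comp_id & 0xFFFF, count)
            for comp_id, count in zip(body[0::2], body[1::2])]

# Synthetic demo data (hypothetical key and entries)
demo_key = 0x12345678
plain_values = [DANS, 0, 0, 0, (0x0104 << 16) | 0x7809, 5, (0x0105 << 16) | 0x7809, 12]
demo_blob = (bytes(0x80)
             + b"".join(struct.pack("<I", v ^ demo_key) for v in plain_values)
             + b"Rich" + struct.pack("<I", demo_key))
entries = decode_rich_header(demo_blob)
```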
As you can see, the PE structure is a treasure trove for malware analysts since it provides lots of invaluable information about both the malicious functionality and the attackers who created it.
At this point, you may be thinking that all x64 PE files’ fields take 8 bytes compared to 4 bytes in x86 PE files. But the truth is that the PE+ header is very similar to the good old PE header with very few changes, as follows:
Now that we know what the PE header is, let’s talk about various tools that may help us extract and visualize this information.
Once we become familiar with the PE format, we need to be able to parse different PE files (for example, .exe files) and read their header values. Luckily, we don’t have to do this ourselves in a hex editor; there are lots of different tools that can help us read PE header information easily. The most well-known free tools for this are as follows:
Figure 3.7 – CFF Explorer UI
Figure 3.8 – PEiD UI
In the next section, we will further our knowledge and explore the nitty-gritty of static and dynamic linking.
In this section, we will cover the code libraries that were introduced to speed up the software development process, avoid code duplication, and improve the cooperation between different teams within companies producing software.
These libraries are a known target for malware families as they can easily be injected into the memory of different applications and impersonate them to disguise their malicious activities.
First of all, let’s talk about the different ways libraries can be used.
With the increasing number of applications on different operating systems, developers found that there was a lot of code reuse and the same logic being rewritten over and over again to support certain functionalities in their programs. Because of that, the invention of code libraries came in handy. Let’s take a look at the following diagram:
Figure 3.9 – Static linking from compilation to loading
Code libraries (.lib files) include lots of functions to be copied to your program when required, so there is no need to reinvent the wheel and rewrite these functions again (for example, the code for mathematical operations such as sin or cos for any application that deals with mathematical equations). This is done by a program called a linker, whose job is to put all the required functions (groups of instructions) together and produce a single self-contained executable file as a result. This approach is called static linking.
Statically linked libraries lead to having the same code copied over and over again inside each program that may need it, which, in turn, leads to the loss of hard disk space and increases the size of the executable files.
In modern operating systems such as Windows and Linux, there are hundreds of libraries, and each contains thousands of functions for UIs, graphics, 3D, internet communications, and more. Because of that, static linking proved to be limiting. To mitigate this issue, dynamic linking emerged. The whole process is displayed in the following diagram:
Figure 3.10 – Dynamic linking from compilation to loading
Instead of storing the code inside each executable, any needed library is loaded next to each application in the same virtual memory so that this application can directly call the required functions. These libraries are named dynamic link libraries (DLLs), as shown in the preceding diagram. Let’s cover them in greater detail.
A DLL is a complete PE file that includes all the necessary headers, sections, and, most importantly, the export table.
The export table includes all the functions that this library exports. Not all library functions are exported as some of them are for internal use. However, the functions that are exported can be accessed through their names or ordinal numbers (index numbers). These are called application programming interfaces (APIs).
Windows provides lots of libraries for developers who are creating programs for Windows to access its functionality. Some of these libraries are as follows:
Now, it’s time to talk about what exactly APIs are.
In short, APIs are exported functions in libraries that any application can call or interact with. In addition, APIs can be exported by executable files in the same way as DLLs. This way, an executable file can be run as a program or loaded as a library by other executables or libraries.
Each program’s import table contains the names of all the required libraries and all the APIs that this program uses. And in each library, the export table contains each API’s name, its ordinal number, and its relative virtual address (RVA).
Important Note
Each API has an ordinal number, but not all APIs have a name.
Malware very commonly obscures the names of the libraries and the APIs it uses to hide its functionality from static analysis. This is done using what’s called dynamic API loading.
Dynamic API loading is supported by Windows using two very well-known APIs:
By calling these two APIs, malware can access APIs that are not written in the import table, which means they might be hidden from the eyes of the reverse engineer.
In some advanced malware, the malware author also hides the names of the libraries and the APIs using encryption or other obfuscation techniques, which will be covered in Chapter 4, Unpacking, Decryption, and Deobfuscation.
These APIs are not the only APIs that can allow dynamic API loading; other techniques will be explored in Chapter 8, Handling Exploits and Shellcode.
Armed with this knowledge, let’s learn more about how to put it into practice.
Now that we’ve covered the PE header, dynamic link libraries, and APIs, the question that arises is, How can we use this information in our static analysis? This depends on the questions that you want to answer, so that is what we will cover here.
If an incident occurs, static analysis of the PE header can help you answer multiple questions in your report. Here are the questions and how the PE header can help you answer them:
The PE header can help you figure out if this malware is packed. Packers tend to change section names from their familiar names (.text, .data, and .rsrc) to something else, such as UPX0 or .aspack.
In addition, packers commonly hide most of the APIs otherwise expected to be present in the import table. So, if you see that the import table contains very few APIs, that could be another sign of packing being involved. We will cover unpacking in detail in Chapter 4, Unpacking, Decryption, and Deobfuscation.
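The two signs mentioned above can be combined into a simple triage helper. The following Python sketch is only a heuristic: the section names come from the examples in this section, while the threshold for "very few APIs" is an arbitrary assumption that you would tune for your own sample set.

```python
FAMILIAR_SECTION_NAMES = {".text", ".data", ".rsrc"}   # names mentioned in the text
KNOWN_PACKER_SECTION_NAMES = {"UPX0", ".aspack"}       # examples mentioned in the text
FEW_IMPORTS_THRESHOLD = 10                             # assumption, not a standard value

def packing_indicators(section_names, import_count):
    """Return human-readable reasons why a sample might be packed."""
    indicators = []
    for name in section_names:
        if name in KNOWN_PACKER_SECTION_NAMES:
            indicators.append(f"known packer section name: {name}")
    if not any(name in FAMILIAR_SECTION_NAMES for name in section_names):
        indicators.append("no familiar section names present")
    if import_count < FEW_IMPORTS_THRESHOLD:
        indicators.append(f"only {import_count} imported APIs")
    return indicators

clean = packing_indicators([".text", ".data", ".rsrc"], 120)
suspicious = packing_indicators(["UPX0", "UPX1"], 4)
```

An empty result does not prove that a sample is unpacked, and a non-empty one does not prove the opposite; the output is just a prompt for closer inspection.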
It’s very common to see droppers that have additional PE files stored in their resources. Multiple tools, such as Resource Hacker, can detect these embedded files (or, for example, a ZIP file that contains them), and you will be able to find the dropped modules.
For downloaders, it’s common to see the URLDownloadToFile API from urlmon.dll, which downloads the file, and the ShellExecuteA API, which executes it. Other APIs can be used to achieve the same goal, but these two APIs are the most well-known and among the easiest to use for malware authors.
Many APIs can tell you that the malware uses the network, such as socket, send, and recv. Others reveal its role: connect suggests that it connects to a server as a client, while listen suggests that it listens on a port.
Some APIs can even tell you what protocol they are using, such as HttpSendRequestA or FtpPutFile, which are both from wininet.dll.
Some APIs are related to file searching, such as FindFirstFileA, which could be a hint that this malware may be ransomware or an info stealer.
It could use APIs such as Process32First, Process32Next, and CreateRemoteThread, which could mean a process injection functionality, or use TerminateProcess, which could mean that this malware may try to terminate other applications, such as antivirus programs or malware analysis tools.
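The API-based hints from this section can be collected into a small lookup table. The Python sketch below is a hypothetical helper, not part of any existing tool: the API names come from the examples above, the descriptions are loose summaries, and the matching ignores case and an optional trailing A/W suffix.

```python
API_HINTS = {
    "urldownloadtofile": "downloads a file (possible downloader)",
    "shellexecute": "executes a file",
    "socket": "uses network sockets",
    "send": "sends data over the network",
    "recv": "receives data over the network",
    "connect": "connects to a server as a client",
    "listen": "listens on a port",
    "httpsendrequest": "communicates over HTTP",
    "ftpputfile": "uploads a file over FTP",
    "findfirstfile": "searches for files (possible ransomware or info stealer)",
    "process32first": "enumerates processes",
    "process32next": "enumerates processes",
    "createremotethread": "possible process injection",
    "terminateprocess": "may terminate other applications",
}

def capability_hints(imported_apis):
    """Map each recognized imported API name to a capability hint."""
    hints = {}
    for api in imported_apis:
        key = api.lower()
        candidates = [key]
        if key[-1:] in ("a", "w"):          # tolerate ANSI/Unicode suffixes
            candidates.append(key[:-1])
        for candidate in candidates:
            if candidate in API_HINTS:
                hints[api] = API_HINTS[candidate]
                break
    return hints

hints = capability_hints(["URLDownloadToFileA", "ShellExecuteA", "GetTickCount"])
```

Keep in mind that these are hints rather than verdicts, since plenty of legitimate software imports the same APIs.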
We will cover all of these in greater detail later in this book. This section gave you hints and ideas to think about during your next static malware analysis and helped you find what you would be searching for in a PE header.
Usually, it is a good idea to focus on the main questions that you should answer in your report. Perhaps performing basic static analysis based on the strings and the PE header would be enough to help your case.
So far, we have covered how a PE header could help you answer questions related to incident handling or a normal tactical report. Now, let’s cover the following questions related to threat intelligence and how a PE header can help you answer them:
Sometimes, threat researchers need to know how old the sample is. Is it an old sample or a new variant, and when did the attackers start to plan their attacks in the first place?
The PE header includes a value called TimeDateStamp in the file header. This value includes the exact date and time this sample was compiled, which can help answer this question and help threat researchers build their attack timeline. However, it’s worth mentioning that it can also be forged. Another less-known field that serves a similar purpose is the TimeDateStamp value of the Export Directory (when available).
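Since TimeDateStamp is stored as the number of seconds since January 1, 1970 (UTC), converting it into a readable date takes a single call. The following Python sketch uses an arbitrary example value, not one taken from a real sample.

```python
from datetime import datetime, timezone

def compile_time(time_date_stamp: int) -> datetime:
    """Convert a PE TimeDateStamp (seconds since the Unix epoch) to a UTC datetime."""
    return datetime.fromtimestamp(time_date_stamp, tz=timezone.utc)

example = compile_time(0x5F000000)  # arbitrary demonstration value
```

Converting to UTC explicitly avoids mixing in the analyst’s local time zone, which matters when timestamps from many samples are compared.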
What country do the attackers belong to? That can answer a lot about their motivations.
One of the ways to answer this question is, again, TimeDateStamp, this time by looking at many samples and their compile times. In some cases, they cluster within the 9-to-5 working hours of a particular time zone, which may help deduce the attackers’ country of origin, as shown in the following graph:
Figure 3.11 – Patterns in compilation timestamps
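To illustrate the idea behind such a graph, here is a toy Python sketch that tries every whole-hour UTC offset and picks the one under which the most compilation timestamps fall into a working-hours window. The function name, the 9-to-17 window, and the synthetic timestamps are all assumptions made for the demonstration; real data is far noisier, and timestamps can be forged.

```python
def best_working_hours_offset(timestamps, start_hour=9, end_hour=17):
    """Return the UTC offset (in hours) that puts the most timestamps in working hours."""
    def score(offset):
        return sum(
            1 for ts in timestamps
            if start_hour <= (ts // 3600 + offset) % 24 < end_hour
        )
    # Offsets from UTC-12 to UTC+14; on a tie, the lowest offset wins
    return max(range(-12, 15), key=score)

# Synthetic demo: three days of builds between 06:00 and 13:59 UTC
midnight_utc = 1593561600  # 2020-07-01 00:00:00 UTC
demo_timestamps = [
    midnight_utc + day * 86400 + hour * 3600
    for day in range(3)
    for hour in range(6, 14)
]
offset = best_working_hours_offset(demo_timestamps)
```

For the synthetic data above, the best fit is UTC+3, but an offset only narrows down a band of longitudes and should be treated as one weak signal among many.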
The Rich header may also be used for attribution purposes since the combination of software versions that were used to compile the sample generally doesn’t change very often for a particular setup.
One of the data directory entries is related to the certificate. Some applications are signed by their manufacturer to provide additional trust for the users and the operating system that this application is safe. But these certificates sometimes get stolen and used by different malware actors.
For all the malicious samples that use a specific stolen certificate, it’s quite likely that all of them are produced by the same actor. Even if they have a different purpose or target different victims, they’re likely to be different activities performed by the same attackers.
As we mentioned earlier, a PE header is an information treasure trove if you look into the details hiding inside its fields. Here, we covered some of the most common use cases. There is so much more to get out of it, and it’s up to you to explore it further.
Everything that we have covered so far was related to the PE file present on the hard disk. What we haven’t covered yet is how this PE file changes in memory when it’s loaded, as well as the whole execution process of these files. In this section, we will talk about how Windows loads a PE file, executes it, and turns it into a live program.
To understand PE loading and process creation, we must cover some basic terminology, such as process, thread, Thread Environment Block (TEB), Process Environment Block (PEB), and others before we dive into the flow of loading and executing an executable PE file.
A process is not just a representation of a running program in memory – it is also a container for all the information about the running application. This container stores information about the virtual memory associated with that process, all the loaded DLLs, opened files and sockets, the list of threads running as part of this process (we will cover this later), the process ID, and much more.
A process is a structure in the kernel that holds all this information, working as an entity to represent this running executable file, as shown in the following diagram:
Figure 3.12 – Example of a 32-bit process memory layout
We’ll compare the various aspects of virtual memory and physical memory in the next section.
Virtual memory is like a container for each process. Each process has its own virtual memory space to store its images, related libraries, and all the auxiliary memory ranges dedicated to the stack, heap, and so on. This virtual memory is mapped to the equivalent physical memory. Not all virtual memory addresses are mapped to physical memory, and each one that’s been mapped has a permission (READ|WRITE, READ|EXECUTE, or maybe READ|WRITE|EXECUTE), as shown in the following diagram:
Figure 3.13 – Mappings between physical and virtual memory
Virtual memory allows you to create a security layer between one process and another and allows the operating system to manage different processes and suspend one process to give resources to another.
A thread is not just the entity that represents an execution path inside a process (and each process can have one or more threads running simultaneously). It is also a structure in the kernel that saves the whole state of that execution, including the registers, stack information, and the last error.
Each thread in Windows has a little time frame to run in before it gets stopped to have another thread resumed (as the number of processor cores is much smaller than the number of threads running in the entire system). When Windows changes the execution from one thread to another, it takes a snapshot of the whole execution state (registers, stack, instruction pointer, and so on) and saves it in the thread structure to be able to resume it again from where it stopped.
All threads running in one process share the same resources of that process, including the virtual memory, open files, open sockets, DLLs, mutexes, and others, and they synchronize with each other upon accessing these resources.
Each thread has a stack, instruction pointer, code functions for error handling (SEH, which will be covered in Chapter 6, Bypassing Anti-Reverse Engineering Techniques), a thread ID, and a thread information structure called TEB, as shown in the following diagram:
Figure 3.14 – Example processes with one and multiple threads
Next, we will talk about the crucial data structures that are needed to understand threads and processes. Let’s get started.
The last things that you need to understand related to processes and threads are the TIB, TEB, and PEB data structures. These structures are stored inside the process memory, and their main function is to include all the information about the process and each thread, as well as make them accessible to the code so that it can easily know the process filename, the loaded DLLs, and other related information.
They can all be accessed through a special segment register, either FS (32-bit) or GS (64-bit), like this:
mov eax, DWORD PTR FS:[XX]
These data structures have the following functions:
In the next section, and throughout this entire book, we will cover the different information that is stored in these structures that is used to help the malicious code achieve its goals – for example, to detect debuggers.
Now that we know the basic terminology, we can dive into PE loading and process creation. We will investigate it sequentially, as shown in the following steps:
Now, let’s dig deeper into the PE loading part of this process.
The Windows PE loader follows these steps while loading an executable PE file into memory (including dynamic link libraries):
Figure 3.15 – Mapping sections from disk to memory
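Because sections are placed at different offsets on disk and in memory, analysts frequently need to translate a relative virtual address (RVA) back into a file offset. The following Python sketch shows one common way to do this under simplifying assumptions; the dictionary keys and the sample section values are hypothetical.

```python
def rva_to_file_offset(rva, sections):
    """Translate an RVA to a file offset, or return None if it has no file backing."""
    for section in sections:
        start = section["virtual_address"]
        if start <= rva < start + section["virtual_size"]:
            delta = rva - start
            if delta >= section["raw_size"]:
                return None  # this part exists only in memory (zero-filled on load)
            return section["raw_pointer"] + delta
    return None  # the RVA does not belong to any section

demo_sections = [
    {"name": ".text", "virtual_address": 0x1000, "virtual_size": 0x500,
     "raw_size": 0x600, "raw_pointer": 0x400},
    {"name": ".data", "virtual_address": 0x2000, "virtual_size": 0x1000,
     "raw_size": 0x200, "raw_pointer": 0xA00},
]
offset_in_text = rva_to_file_offset(0x1010, demo_sections)
```

Note how the .data section in the demo is larger in memory than on disk: addresses beyond its raw data have no corresponding bytes in the file.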
One more thing we need to learn about is WOW64.
At this point, you should understand how a 32-bit process gets loaded into an x86 environment and how a 64-bit process gets loaded into an x64 environment. So, how about running 32-bit programs in 64-bit operating systems?
For this special case, Windows has created what’s called the WOW64 subsystem. It is implemented mainly in the following DLLs:
These DLLs create a simulated environment for the 32-bit process, which includes 32-bit versions of libraries that it may need.
These DLLs, rather than connecting directly to the Windows kernel, call an API, X86SwitchTo64BitMode, which then switches to x64 and calls the 64-bit ntdll.dll, which communicates directly with the kernel, as shown in the following diagram:
Figure 3.16 – WOW64 architecture
Also, for WOW64-based processes (x86 processes running in an x64 environment), new APIs were introduced, such as IsWow64Process, which is commonly used by malware to identify if it’s running as a 32-bit process in an x64 environment or an x86 environment.
Now that we’ve explained processes, threads, and the execution of the PE files, it’s time to start debugging a running process and understanding its functionality by tracing over its code at runtime.
There are multiple debugging tools we can use. Here, we will just give three examples that are quite similar to each other in terms of their UIs and functionality:
Figure 3.17 – OllyDbg UI
Figure 3.18 – Immunity Debugger UI
Figure 3.19 – x64dbg UI
We will cover OllyDbg 1.10 (the most common version of OllyDbg) in great detail. The same concepts and hotkeys can be applied to other debuggers mentioned here.
The OllyDbg UI is pretty simple and easy to learn. In this section, we will cover the steps and the different windows that can help you with your analysis:
Figure 3.20 – OllyDbg attaching dialog window
You’ve also got the registers in the top right-hand corner. It is possible to modify them at any given time (once the execution has been paused). At the bottom, you have the stack and the data in hex format, which you can also modify.
You can simply modify any data in memory in the following two views:
Figure 3.21 – OllyDbg default window layout explained
Figure 3.22 – OllyDbg dialog window for executable modules
This window will help you see all the loaded PE files in this process’ virtual memory, including the malware sample and all the libraries or DLLs loaded with it.
Figure 3.23 – OllyDbg memory map dialog window
The other option will be to just step over. Step over executes one line of code. However, if this line of code is a call to another function, it executes this function completely and stops just after the function returns. This makes it different from Step into, which goes inside the function and stops at the beginning of it, as shown in the following screenshot:
Figure 3.24 – OllyDbg debug menu
It includes the option to set hardware breakpoints and view them, which we will cover later in this chapter.
Now, let’s talk about breakpoints.
To be able to dynamically analyze a sample and understand its behavior, you need to be able to control its execution flow. You need to be able to stop the execution when a condition is met, examine its memory, and alter its registers’ values and instructions. There are several types of breakpoints that make this possible.
This breakpoint is very simple and allows the processor to execute only one instruction of the program, before returning to the debugger.
This breakpoint works by setting the trap flag (TF) in a register called EFlags. While not common, this breakpoint could be detected by malware to identify the presence of a debugger, which we will cover when we look at anti-reverse engineering tricks in Chapter 6, Bypassing Anti-Reverse Engineering Techniques.
This is the most common breakpoint, and you can easily set this breakpoint by double-clicking on the hex representation of an assembly line in the CPU window in OllyDbg or pressing F2. After this, you will see a red highlight over the address of this instruction, as shown in the following screenshot:
Figure 3.25 – Disassembly in OllyDbg
Well, this is what you see through the debugger’s UI, but what you don’t see is that the first byte of this instruction (0xB8, in this case) has been modified to 0xCC (the INT3 instruction), which stops the execution once the processor reaches it and returns control to the debugger. This 0xCC byte is not visible in the debugger UI as it keeps showing us the original bytes and the instruction they represent, but it can be seen if we decide to dump this memory on the disk and look at it using the hex editor.
Once the debugger gets control of this INT3 breakpoint, it replaces 0xCC with 0xB8 to execute this instruction normally.
The main problem with this breakpoint is that it modifies memory. If malware tries to read or modify the bytes of this instruction, it will read the first byte as 0xCC instead of 0xB8, which can break some code or detect the presence of the debugger (which we will cover in Chapter 6, Bypassing Anti-Reverse Engineering Techniques). In addition, it may affect memory dumping because this way, the resulting dump will be damaged by these modifications. The solution to this problem is to remove all software breakpoints before dumping memory.
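The bookkeeping behind software breakpoints can be modeled with a small Python class operating on a byte buffer. This is a simulation of the idea only, with hypothetical class and method names; a real debugger writes to the memory of another process and handles the resulting exception.

```python
INT3 = 0xCC

class SoftwareBreakpoints:
    """Toy model of INT3 breakpoints over a mutable memory buffer."""

    def __init__(self, memory: bytearray):
        self.memory = memory
        self.original = {}  # address -> original first byte

    def set(self, address: int) -> None:
        if address not in self.original:
            self.original[address] = self.memory[address]
            self.memory[address] = INT3          # overwrite the first byte

    def remove(self, address: int) -> None:
        self.memory[address] = self.original.pop(address)  # restore the byte

    def clean_copy(self) -> bytes:
        """Return memory with all breakpoints undone, suitable for dumping."""
        clean = bytearray(self.memory)
        for address, byte in self.original.items():
            clean[address] = byte
        return bytes(clean)

code = bytearray(b"\xB8\x01\x00\x00\x00\xC3")  # mov eax, 1 ; ret
breakpoints = SoftwareBreakpoints(code)
breakpoints.set(0)
patched_first_byte = code[0]
dump = breakpoints.clean_copy()
```

The clean_copy method mirrors the advice above: the dump is produced from the original bytes, so the 0xCC patches never reach the disk.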
Memory breakpoints are used not to stop the execution of specific instructions, but to stop when any instruction tries to read or modify a specific part of memory. The way many debuggers set memory breakpoints is by adding the PAGE_GUARD (0x100) protection flag to the page’s original protection and removing PAGE_GUARD once the breakpoint is hit.
These can be set by right-clicking and choosing Breakpoint | Memory, on access or Memory, on write, as shown in the following screenshot:
Figure 3.26 – OllyDbg breakpoint menu
Another important thing to note here is that memory breakpoints are less precise as it is only possible to change memory protection flags for a memory page, not for a single byte.
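The page granularity mentioned above is easy to see with a little arithmetic. The following Python sketch assumes the typical 4 KB page size and computes which pages would have to be guarded to watch a given byte range; the function names are hypothetical.

```python
PAGE_SIZE = 0x1000   # assumption: typical 4 KB pages
PAGE_GUARD = 0x100   # protection flag mentioned in the text

def guarded_pages(address: int, size: int) -> list:
    """Return the base addresses of all pages touched by [address, address + size)."""
    first = address & ~(PAGE_SIZE - 1)
    last = (address + size - 1) & ~(PAGE_SIZE - 1)
    return list(range(first, last + PAGE_SIZE, PAGE_SIZE))

def with_guard(protection: int) -> int:
    """Add PAGE_GUARD to an existing protection value."""
    return protection | PAGE_GUARD

pages = guarded_pages(0x401FFE, 4)
```

Here, watching just 4 bytes that straddle a page boundary means guarding two whole pages, so any access to the other 8,188 bytes on them will also hit the guard and has to be filtered out by the debugger.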
Hardware breakpoints are based on six special-purpose registers: DR0-DR3, DR6, and DR7.
These registers allow you to set a maximum of four breakpoints, each of which is given a specific address and triggers on reading, writing, or executing 1, 2, or 4 bytes starting from that address. They are very useful as they don’t modify the instruction bytes as INT3 breakpoints do, and they are generally harder to detect. However, they could still be detected and removed by the malware, which we will discuss in Chapter 6, Bypassing Anti-Reverse Engineering Techniques.
You can view them from the Debug menu by going to Hardware breakpoints, as shown in the following screenshot:
Figure 3.27 – OllyDbg dialog window for hardware breakpoints
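For readers curious about what the debugger does behind this dialog, the following Python sketch computes the DR7 control value for one breakpoint slot, assuming the standard x86 layout (enable bits in the low byte, and 4 bits of type and length per slot starting at bit 16). The function and constant names are hypothetical, and the address itself would be written separately to the matching DR0-DR3 register.

```python
RW_EXECUTE = 0b00     # break on instruction execution
RW_WRITE = 0b01       # break on data writes
RW_READWRITE = 0b11   # break on data reads or writes
LEN_BITS = {1: 0b00, 2: 0b01, 4: 0b11}

def dr7_with_breakpoint(slot: int, rw: int, length: int, dr7: int = 0) -> int:
    """Return a DR7 value with the given slot (0-3) locally enabled and configured."""
    if slot not in range(4):
        raise ValueError("only four hardware breakpoint slots exist")
    if rw == RW_EXECUTE and length != 1:
        raise ValueError("execution breakpoints use a length of 1")
    dr7 |= 1 << (slot * 2)                        # local enable bit for this slot
    shift = 16 + slot * 4
    dr7 &= ~(0xF << shift)                        # clear the old type/length bits
    dr7 |= (rw | (LEN_BITS[length] << 2)) << shift
    return dr7

execute_bp = dr7_with_breakpoint(0, RW_EXECUTE, 1)
write_bp = dr7_with_breakpoint(1, RW_WRITE, 4, execute_bp)
```

Note that the hardware has no read-only type: a breakpoint on reading also triggers on writes to the same location.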
As you can see, each type of breakpoint serves a particular purpose and has advantages and disadvantages, so it is important to know all of them and use them according to the task at hand.
To be able to bypass anti-debugging tricks, forcing the malware to communicate with the C&C or even testing different branches of the malware execution, you need to be able to alter the execution flow of the malware. Let’s look at the different techniques we can use to alter the execution flow and the behavior of any thread.
You can modify the code execution path by changing the assembly instruction. For example, you can change a conditional jump instruction to the opposite condition, as shown in the following screenshot, and force the execution of a specific branch that wasn’t supposed to be executed:
Figure 3.28 – Working with assembly in OllyDbg
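On x86, inverting a conditional jump usually comes down to flipping a single bit because opposite conditions are encoded as adjacent opcodes (for example, 0x74 for jz and 0x75 for jnz). The following Python sketch applies this to raw instruction bytes; the function name is hypothetical, and only the short and near 32-bit encodings are handled.

```python
def invert_conditional_jump(instruction: bytes) -> bytes:
    """Return the instruction bytes with the jump condition inverted."""
    patched = bytearray(instruction)
    if len(patched) >= 2 and 0x70 <= patched[0] <= 0x7F:
        patched[0] ^= 1            # short form: opcode, rel8
    elif len(patched) >= 6 and patched[0] == 0x0F and 0x80 <= patched[1] <= 0x8F:
        patched[1] ^= 1            # near form: 0x0F, opcode, rel32
    else:
        raise ValueError("not a recognized conditional jump encoding")
    return bytes(patched)

jz_short = bytes([0x74, 0x05])               # jz +5
jnz_short = invert_conditional_jump(jz_short)
```

The jump target is left untouched, so only the decision of whether to take the branch changes.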
Apart from the code, it is also possible to change the content of registers.
Rather than modifying the code of the conditional jump instruction, you can modify the results of the comparison before it by changing the EFlags register.
At the top right, after the registers, you have multiple flags that you can change. Each flag represents a specific result from any comparison (other instructions change these flags as well). For example, ZF represents if the two values are equal or if a register became zero. By changing the ZF flag, you force conditional jumps, such as jnz and jz, to jump to the opposite branch and force the execution path to change.
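The effect of toggling ZF can be modeled in Python by treating EFlags as an integer in which ZF occupies bit 6. The helper names and the sample register value below are made up for the illustration.

```python
ZF = 1 << 6  # zero flag position in EFlags

def toggle_zf(eflags: int) -> int:
    """Flip the zero flag, as done when toggling it in the debugger."""
    return eflags ^ ZF

def jz_taken(eflags: int) -> bool:
    return bool(eflags & ZF)       # jz jumps when ZF is set

def jnz_taken(eflags: int) -> bool:
    return not jz_taken(eflags)    # jnz jumps when ZF is clear

after_equal_compare = 0x246        # example value with ZF set
flipped = toggle_zf(after_equal_compare)
```

Before the change, jz would be taken; after it, the very same instruction falls through, without a single byte of code being modified.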
You can force the execution of a specific branch or instruction by simply modifying the instruction pointer (EIP/RIP). You can do this by right-clicking on the instruction of interest and choosing New origin here.
Just like you can change the code of an instruction, you can change data values. In the bottom-left view (the hexadecimal view), you can change bytes of the data by right-clicking and choosing Binary | Edit. You can also copy/paste hexadecimal values, as shown in the following screenshot:
Figure 3.29 – Data editing in OllyDbg
Now, let’s talk about how to efficiently search for important pieces of information to facilitate the analysis.
When performing reverse engineering, strings and APIs serve as very important sources of information, so it is important to know how to navigate them efficiently.
To get a list of strings in OllyDbg, right-click anywhere in the disassembly section of the CPU window and choose Search for | All referenced text strings. The resulting dialog box will show all candidate C-style strings, both ANSI and Unicode (UTF16-LE), and the instructions that use them.
To get a list of APIs, do the same, but this time, choose Search for | All intermodular calls.
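Outside the debugger, a similar list of candidate strings can be approximated with two regular expressions, one for ANSI and one for UTF-16LE text. The Python sketch below is a simplified stand-in for this kind of search: the minimum length of four characters is an arbitrary assumption, it only covers printable ASCII characters, and it knows nothing about which instructions reference the strings.

```python
import re

MIN_LENGTH = 4  # assumption: shorter runs are mostly noise
ANSI_PATTERN = re.compile(rb"[\x20-\x7e]{%d,}" % MIN_LENGTH)
WIDE_PATTERN = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % MIN_LENGTH)

def find_strings(data: bytes) -> list:
    """Return (offset, encoding, text) tuples for candidate strings in the data."""
    found = []
    for match in ANSI_PATTERN.finditer(data):
        found.append((match.start(), "ascii", match.group().decode("ascii")))
    for match in WIDE_PATTERN.finditer(data):
        found.append((match.start(), "utf-16le", match.group().decode("utf-16-le")))
    return sorted(found)

demo_blob = (b"\x00\x01" + b"kernel32.dll" + b"\x00\xff"
             + "cmd.exe".encode("utf-16-le") + b"\x00\x00")
strings = find_strings(demo_blob)
```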
Cross-references are markers that show the researcher where this code or data is being accessed. This is an extremely important piece of information that allows us to efficiently connect the dots. To find them for a particular instruction, right-click on it and choose the Find references to | Selected command option. For data in the hex dump window, it will be just Find references.
When analyzing any kind of sample, it is important to keep the markup accurate so that you will always have a clear picture of what the meaning of already reviewed code or data is. Giving functions and references proper names is a great way to make sure you won’t have to re-analyze the same code again after some time.
To give the function or some data a name, right-click on its first instruction and choose the Label option (or just press the : hotkey). Now, all the references to them will use this label rather than an address, as shown in the following screenshot:
Figure 3.30 – Using labels and comments in OllyDbg
To follow the address, press Enter while selecting the instruction using it. To return, press the - hotkey. To leave comments, use the ; hotkey.
Now, let’s talk about x64dbg.
As we mentioned previously, these debuggers share multiple similarities. They use the same layout and have pretty much the same interface options and hotkeys – even the default color scheme is quite similar. However, there is a list of differences between them, some of which are worth mentioning:
There are other minor differences here and there, so feel free to try both tools and choose the one that suits you best.
Now, let’s talk about how to debug services.
While loading individual executables and DLLs for debugging is generally a pretty straightforward task, things get a little bit more complicated when we talk about debugging Windows services.
Services are tasks that are generally supposed to execute certain logic in the background, similar to daemons on Linux. So, it comes as no surprise that malware authors commonly use them to achieve reliable persistence.
Services are controlled by the Service Control Manager (SCM), which is implemented in %SystemRoot%\System32\services.exe. All services have the corresponding HKLM\SYSTEM\CurrentControlSet\services\<service_name> registry key. It contains multiple values that describe the service, including the following:
Let’s look at several ways the services can be designed:
A user-mode service with a dedicated executable (or a DLL with its own loader) can be registered using the standard sc command-line tool, like this:
sc create <service_name> type= own binpath= <path_to_executable>
The process is slightly more complicated for svchost DLL-based services:
reg add "HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Svchost" /v "<service_group>" /t REG_MULTI_SZ /d "<service_name>