Earlier, in Chapter 2, Static Analysis – Techniques and Tooling, we covered some of the more basic aspects of statically analyzing binaries and other potentially malicious files, and defined static analysis as the act of obtaining file metadata and intelligence without executing the file.
In this chapter, you'll have the opportunity to test your advanced knowledge of static analysis in order to determine the characteristics of an unknown, custom piece of malware.
In this chapter, we'll examine the following topics:
To follow along with the chapter, you'll need:
In Microsoft Windows, binary files utilize a structured format – the Portable Executable (PE) file format. This format is utilized by the following types of files; though the way the OS interprets and utilizes them is different, they share the same general structure:
While this list is not exhaustive of all files that use the PE file format, these are the most common for the purposes of this discussion, and they are the file types most consistently used by threat actors.
Analysis tip
Adversaries use various forms of the PE file format, as the end result is usually the same: malicious code execution. However, their choice of DLL, SCR, or EXE will affect their TTPs. For instance, a DLL cannot be launched on its own and is typically executed via RunDLL32.exe or RegSvr32.exe (or loaded by another process), whereas an EXE can be executed directly.
Now that we've become familiar with the file types that may utilize the PE format, we can take a deeper dive into understanding the format itself, and understanding how it may be useful to malware analysts such as ourselves.
The first section of a PE file is the DOS header. The DOS header is a leftover element, required for backward compatibility since the inception of the format.
Using CFF Explorer in our VM, we can examine the fields within the DOS header that are relevant to us:
Figure 5.1 – The DOS header for our sample
Only two fields within the DOS header are relevant to us: e_magic and e_lfanew. The first field, e_magic, contains the magic number for the executable. Every portable executable starts with the characters MZ (the bytes 4D 5A), which read as the little-endian value 0x5A4D. Historically, these are the initials of Mark Zbikowski, who designed the original MS-DOS executable format from which this header is inherited. Knowing that every PE file starts with MZ helps us quickly identify a PE file in a hexadecimal editor or by its header.
Analysis tip
Being able to recognize the beginning of a PE file, either by its hexadecimal magic bytes or by the signature MZ ... !This program cannot be run in DOS mode., is very useful for spotting PEs at a glance after they have been loaded into memory, as virtually every PE file begins this way. Unfortunately, PEs do not have a trailer marking where they end, so carving them out of blocks of memory can be challenging.
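To illustrate the tip above, here is a minimal Python sketch that scans a raw memory or disk buffer for candidate PE images. It assumes the standard layout (MZ at the start, e_lfanew stored as a 32-bit value at offset 0x3C, and a PE\0\0 signature at that offset) and treats the DOS stub message only as supporting evidence, since it can be altered. The function name find_pe_candidates is hypothetical.

```python
import struct

DOS_STUB_TEXT = b"This program cannot be run in DOS mode"

def find_pe_candidates(buf: bytes) -> list[tuple[int, bool]]:
    """Return (offset, has_dos_stub) for each plausible PE start in buf.

    A candidate must begin with b"MZ", and the e_lfanew value stored at
    offset 0x3C must point (inside buf) at the PE signature b"PE\\0\\0".
    """
    hits = []
    pos = buf.find(b"MZ")
    while pos != -1:
        lfanew_off = pos + 0x3C
        if lfanew_off + 4 <= len(buf):
            (e_lfanew,) = struct.unpack_from("<I", buf, lfanew_off)
            sig_off = pos + e_lfanew
            if buf[sig_off:sig_off + 4] == b"PE\x00\x00":
                # The DOS stub message is extra, optional evidence
                has_stub = DOS_STUB_TEXT in buf[pos:sig_off]
                hits.append((pos, has_stub))
        pos = buf.find(b"MZ", pos + 1)
    return hits
```

Because PEs have no trailer, this only locates where candidates begin; estimating where each one ends requires parsing further headers, for example SizeOfImage for a mapped image or the section table for an on-disk layout.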
The e_lfanew field holds the offset of the PE header. When Windows loads the executable, it reads this value to locate the PE header relative to the beginning of the image; execution itself begins later, at the entry point defined in the optional header. In this case, our PE header is located at offset 0x80 from the base address of the executable. To clarify, if our executable were loaded at base address 0x00400000, the PE header would be at 0x00400080.
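The two DOS header fields discussed above can be read with nothing more than Python's struct module. This minimal sketch assumes a little-endian IMAGE_DOS_HEADER with e_magic at offset 0x00 and e_lfanew (a signed 32-bit LONG) at offset 0x3C; the helper names are hypothetical. The second function simply reproduces the base-address arithmetic from the example.

```python
import struct

IMAGE_DOS_SIGNATURE = 0x5A4D  # b"MZ" read as a little-endian WORD

def parse_dos_header(data: bytes) -> tuple[int, int]:
    """Return (e_magic, e_lfanew) from a 64-byte IMAGE_DOS_HEADER."""
    if len(data) < 0x40:
        raise ValueError("buffer too small for an IMAGE_DOS_HEADER")
    (e_magic,) = struct.unpack_from("<H", data, 0x00)
    (e_lfanew,) = struct.unpack_from("<i", data, 0x3C)  # LONG at offset 0x3C
    if e_magic != IMAGE_DOS_SIGNATURE:
        raise ValueError(f"bad magic {e_magic:#06x}, not a PE file")
    return e_magic, e_lfanew

def pe_header_address(image_base: int, e_lfanew: int) -> int:
    """Address of the PE signature once the image is mapped at image_base."""
    return image_base + e_lfanew
```

For example, parse_dos_header(open("unknown.exe", "rb").read(64)) would return the magic number and PE header offset for a well-formed sample.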
Figure 5.2 – The DOS stub in ASCII
The DOS stub sits between the DOS header and the PE file header, directly before the offset stored in e_lfanew, and usually contains a message such as This program cannot be run in DOS mode. Again, this is a fragment of backward compatibility, present in virtually every PE.
The next structure to examine is the PE file header, located at the offset stored in the DOS header's e_lfanew field:
Figure 5.3 – The PE file header
Within the PE header, there are three fields of use to us. Let's take a look at each of them and the information they may offer about the binary we are examining:
Figure 5.4 – The Characteristics pane in CFF Explorer
Additionally, flags in this field indicate whether the file is a DLL (IMAGE_FILE_DLL) or a system file such as a .SYS driver (IMAGE_FILE_SYSTEM).
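As a hedged illustration, the following sketch decodes the IMAGE_FILE_HEADER that follows the PE\0\0 signature: the Machine type, the TimeDateStamp compile time (which adversaries can forge), and a subset of the Characteristics flags, including IMAGE_FILE_DLL (0x2000) and IMAGE_FILE_SYSTEM (0x1000). The lookup tables are deliberately partial and the function name is hypothetical.

```python
import struct
from datetime import datetime, timezone

MACHINE_TYPES = {0x014C: "x86", 0x8664: "x64", 0xAA64: "ARM64"}

# A subset of the IMAGE_FILE_* flags in the Characteristics field
FILE_CHARACTERISTICS = {
    0x0002: "EXECUTABLE_IMAGE",
    0x0020: "LARGE_ADDRESS_AWARE",
    0x0100: "32BIT_MACHINE",
    0x1000: "SYSTEM",  # system file such as a driver
    0x2000: "DLL",
}

def parse_file_header(data: bytes, e_lfanew: int) -> dict:
    """Decode the 20-byte IMAGE_FILE_HEADER that follows the PE signature."""
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("PE signature not found at e_lfanew")
    (machine, n_sections, timestamp, _sym_ptr, _n_syms,
     opt_size, characteristics) = struct.unpack_from("<HHIIIHH", data, e_lfanew + 4)
    return {
        "machine": MACHINE_TYPES.get(machine, hex(machine)),
        "number_of_sections": n_sections,
        # TimeDateStamp is seconds since the Unix epoch (and can be forged)
        "timestamp": datetime.fromtimestamp(timestamp, tz=timezone.utc),
        "size_of_optional_header": opt_size,
        "flags": [name for bit, name in FILE_CHARACTERISTICS.items() if characteristics & bit],
    }
```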
The optional header contains most of the interesting file metadata in a portable executable:
Figure 5.5 – The optional header offers a trove of information about the binary
In Figure 5.5, I've highlighted the most important fields in the optional header for static analysis:
Figure 5.6 – DLL characteristics advertised by the PE
This field can reveal critical information about the image's capabilities, including whether it can be relocated in memory and whether it is aware of running in a Terminal Services session or on a Terminal Server.
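The sketch below, assuming the same little-endian layout, reads a few commonly inspected optional header fields (Magic, AddressOfEntryPoint, ImageBase, and Subsystem) and decodes the DllCharacteristics bit field, including DYNAMIC_BASE (relocatable, as used by ASLR) and TERMINAL_SERVER_AWARE. It handles both the PE32 and PE32+ layouts, which differ in where ImageBase sits and how wide it is. The names and the partial subsystem table are illustrative.

```python
import struct

SUBSYSTEMS = {1: "NATIVE", 2: "WINDOWS_GUI", 3: "WINDOWS_CUI"}

DLL_CHARACTERISTICS = {
    0x0020: "HIGH_ENTROPY_VA",
    0x0040: "DYNAMIC_BASE",           # relocatable at load time (ASLR)
    0x0080: "FORCE_INTEGRITY",
    0x0100: "NX_COMPAT",              # compatible with DEP
    0x0200: "NO_ISOLATION",
    0x0400: "NO_SEH",
    0x0800: "NO_BIND",
    0x1000: "APPCONTAINER",
    0x2000: "WDM_DRIVER",
    0x4000: "GUARD_CF",
    0x8000: "TERMINAL_SERVER_AWARE",
}

def parse_optional_header(data: bytes, e_lfanew: int) -> dict:
    opt = e_lfanew + 4 + 20  # skip the PE signature and the IMAGE_FILE_HEADER
    (magic,) = struct.unpack_from("<H", data, opt)
    if magic == 0x10B:    # PE32: 4-byte ImageBase at offset 28
        (image_base,) = struct.unpack_from("<I", data, opt + 28)
    elif magic == 0x20B:  # PE32+: 8-byte ImageBase at offset 24
        (image_base,) = struct.unpack_from("<Q", data, opt + 24)
    else:
        raise ValueError(f"unexpected optional header magic {magic:#x}")
    (entry_rva,) = struct.unpack_from("<I", data, opt + 16)
    # Subsystem and DllCharacteristics sit at offsets 68 and 70 in both layouts
    subsystem, dll_chars = struct.unpack_from("<HH", data, opt + 68)
    return {
        "pe32_plus": magic == 0x20B,
        "entry_point_rva": entry_rva,
        "image_base": image_base,
        "subsystem": SUBSYSTEMS.get(subsystem, subsystem),
        "dll_characteristics": [n for bit, n in DLL_CHARACTERISTICS.items() if dll_chars & bit],
    }
```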
A PE file can contain many sections. We have listed only a few important ones here, which usually follow a naming convention similar to the following:
Some of the sections described can be seen in the following screenshot:
Figure 5.7 – The sections table within the PE
Sections with names outside the standard, well-defined set within a PE may be suspect and warrant further investigation. In this case, we have a non-standard section named r2. Non-standard sections often indicate the use of a packer to obfuscate code. Additionally, if the virtual size and raw size of a section differ significantly, this may also indicate the use of a packer.
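To automate this review, here is a minimal sketch that walks the section table (which begins immediately after the optional header) and flags names outside an illustrative set of standard ones. The STANDARD_SECTION_NAMES set is an assumption, not an official list; some legitimate toolchains (Delphi, for example) emit their own section names, so a hit is a prompt for further investigation rather than proof of packing.

```python
import struct

# Illustrative set of names emitted by common Microsoft toolchains; not exhaustive
STANDARD_SECTION_NAMES = {
    ".text", ".data", ".rdata", ".idata", ".edata", ".pdata",
    ".rsrc", ".reloc", ".bss", ".tls", ".CRT",
}

def parse_section_table(data: bytes, e_lfanew: int) -> list[dict]:
    (n_sections,) = struct.unpack_from("<H", data, e_lfanew + 6)  # NumberOfSections
    (opt_size,) = struct.unpack_from("<H", data, e_lfanew + 20)   # SizeOfOptionalHeader
    table = e_lfanew + 4 + 20 + opt_size  # section headers follow the optional header
    sections = []
    for i in range(n_sections):
        # Each IMAGE_SECTION_HEADER is 40 bytes; we read only the first five fields
        name, vsize, vaddr, raw_size, raw_ptr = struct.unpack_from("<8sIIII", data, table + 40 * i)
        sections.append({
            "name": name.rstrip(b"\x00").decode("ascii", errors="replace"),
            "virtual_size": vsize,
            "virtual_address": vaddr,
            "raw_size": raw_size,
            "raw_pointer": raw_ptr,
        })
    return sections

def nonstandard_sections(sections: list[dict]) -> list[str]:
    return [s["name"] for s in sections if s["name"] not in STANDARD_SECTION_NAMES]
```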
The Import Address Table (IAT) within a binary is incredibly important for understanding the functionality and capabilities that the malware's creator has endowed it with. In CFF Explorer, we can navigate to the Import Directory to view the DLLs imported by this malware:
Figure 5.8 – The imported libraries and the number of functions used from each in the binary
For instance, we can see that this binary imports the following DLLs from Windows:
Functions within DLLs allow both legitimate and malicious software authors to use pre-written functionality, which saves time, as they do not have to code it into their application themselves and can rely on the built-in system functions from these DLLs. Selecting one of the imported libraries allows us to view the functions imported from it:
Figure 5.9 – The location of the functions within the IAT and their names
In the preceding figure, we can see that the malware imports several functions from advapi32.dll, along with their locations in the IAT and their names. Searching for these API references on Microsoft's developer documentation site, https://docs.microsoft.com/en-us/windows/win32/api/, will often reveal incredibly useful information about the functionality of the malware.
In this instance, let's take a look at GetTokenInformation:
Figure 5.10 – Microsoft documentation provides excellent information on API calls
Microsoft has provided us with a succinct description: this function retrieves a specified type of information about an access token and returns a nonzero value if it succeeds, or zero if it fails. The malware may use it to determine the level of privilege it has while running. This process can be repeated for every API call in the sample, or just for the suspicious ones.
There are several suspicious API calls, all of which can be utilized in legitimate ways, but some to look out for are as follows:
This is not an exhaustive list of suspicious API calls, but malware will often use one or several of these to achieve its nefarious purposes on the system, be it process injection, keylogging, exfiltrating information, or downloading and executing secondary stages.
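One simple way to apply this during triage is to compare a sample's imports against a watchlist. The sketch below assumes the imports have already been extracted (for example, copied out of CFF Explorer) into a dictionary mapping DLL names to function names. The watchlist and its labels are illustrative only, and every API on it has legitimate uses. It also strips the A/W (ANSI/Unicode) suffix so that, for example, URLDownloadToFileW matches URLDownloadToFile.

```python
# Illustrative watchlist: every API here also has legitimate uses.
# Keys are base names; an A/W (ANSI/Unicode) suffix is stripped before lookup.
SUSPICIOUS_APIS = {
    "VirtualAllocEx": "process injection",
    "WriteProcessMemory": "process injection",
    "CreateRemoteThread": "process injection",
    "SetWindowsHookEx": "keylogging / hooking",
    "GetAsyncKeyState": "keylogging",
    "InternetOpenUrl": "network communication",
    "URLDownloadToFile": "downloading secondary stages",
    "WinExec": "executing secondary stages",
}

def base_api_name(name: str) -> str:
    """Drop a trailing A/W suffix if the remaining name is on the watchlist."""
    if name[-1:] in ("A", "W") and name[:-1] in SUSPICIOUS_APIS:
        return name[:-1]
    return name

def triage_imports(imports: dict[str, list[str]]) -> list[tuple[str, str, str]]:
    """Return (dll, function, reason) for every watchlisted import."""
    hits = []
    for dll, functions in imports.items():
        for fn in functions:
            reason = SUSPICIOUS_APIS.get(base_api_name(fn))
            if reason:
                hits.append((dll, fn, reason))
    return hits
```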
However, in some instances, it will not be immediately clear which API calls a binary uses, particularly if a packer has been applied. In such cases, a packed binary may import only one or two APIs. Let's take a look at how to identify packers and unpack binaries so that we can examine them further.
Packing is one of the most common techniques adversaries use to obfuscate their executables. Both publicly available packers and custom packers exist, but both serve the same purpose: to reduce the size of the executable and to render the data within the binary unreadable until it is unpacked.
Packers work by compressing and/or encrypting data into one or more packed sections, alongside a decompression or decryption stub that restores the actual executable code in memory before it is executed. As a result, the entry point of the program moves from the original .text section to the address of the unpacking stub.
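Because the entry point usually moves into the stub, checking which section contains AddressOfEntryPoint is a quick packing indicator. This minimal sketch assumes the section table has already been reduced to (name, virtual_address, virtual_size) tuples; the default expectation of .text is an assumption, since some legitimate toolchains use other names for their code section.

```python
from typing import Optional

def entry_point_section(entry_rva: int, sections: list[tuple[str, int, int]]) -> Optional[str]:
    """Return the name of the section whose virtual range contains entry_rva."""
    for name, start, size in sections:
        if start <= entry_rva < start + size:
            return name
    return None  # entry point outside every section is itself unusual

def entry_point_looks_unusual(entry_rva: int, sections: list[tuple[str, int, int]],
                              expected: tuple[str, ...] = (".text",)) -> bool:
    return entry_point_section(entry_rva, sections) not in expected
```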
In the next few sections, we'll see how we can discover packed samples via several methodologies, and also how we may unpack these samples.
Detecting the use of a packer is usually fairly simple, and a few indicators tend to be the most successful at identifying packed binaries. Let's review some of the simplest ways to determine whether a binary has been packed:
Figure 5.11 – Detect It Easy and its graphical representation of Shannon entropy
The Detect It Easy tool includes a useful entropy view that visualizes the randomness of each section. The sample in the figure has been packed with UPX.
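Shannon entropy is the sum, over each byte value x present in a buffer, of p(x) * log2(1/p(x)), where p(x) is that value's relative frequency; for bytes it ranges from 0 to 8 bits per byte. The following sketch computes it with the standard library only and is an illustration of the measure, not of Detect It Easy's implementation.

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte: 0.0 for a constant buffer, 8.0 at most."""
    if not data:
        return 0.0
    n = len(data)
    # Sum p * log2(1/p) over each distinct byte value, with p = count / n
    return sum((c / n) * math.log2(n / c) for c in Counter(data).values())
```

Applied per section (for example, to data[PointerToRawData:PointerToRawData + SizeOfRawData]), values approaching 8 suggest compressed or encrypted content. A cutoff of around 7 is a common rule of thumb rather than a hard rule, and legitimately compressed resources such as images will also score high.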
Figure 5.12 – Section names and sizes differ among packed and non-packed binaries
Additionally, the raw size of a packed section will often be much smaller than the virtual size allocated for it in memory, suggesting that data will be unpacked into that space at runtime, as the unpacking stub must restore the code before the machine is able to execute it.
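The raw versus virtual size comparison can be sketched in the same spirit. The input format, (name, raw_size, virtual_size) tuples, and the ratio of 10 are assumptions chosen for illustration. Note that a legitimate uninitialized data section such as .bss can also have a raw size of zero, so results need context.

```python
def size_mismatches(sections: list[tuple[str, int, int]], ratio: float = 10.0) -> list[str]:
    """Flag sections whose virtual size greatly exceeds their raw (on-disk) size.

    The ratio threshold is an arbitrary starting point, not an established cutoff.
    """
    flagged = []
    for name, raw_size, virtual_size in sections:
        if raw_size == 0 and virtual_size > 0:
            flagged.append(name)  # nothing on disk, memory reserved at load time
        elif raw_size and virtual_size / raw_size >= ratio:
            flagged.append(name)
    return flagged
```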
Figure 5.13 – Packed binaries often have far fewer imported API calls than unpacked binaries
A packed executable will have far fewer imports than an unpacked binary – only what is necessary to unpack the executable. Reviewing the import directory in combination with other evidence can confirm the presence or utilization of a packer.
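A rough version of this import-count heuristic is sketched below. Both the threshold of 10 imported functions and the LOADER_APIS set are assumptions for illustration; the idea is that an unpacking stub typically needs little beyond loading libraries, resolving functions, and changing memory protections. As the text notes, the result should be weighed alongside other evidence.

```python
# APIs an unpacking stub commonly needs (illustrative, not exhaustive)
LOADER_APIS = {"LoadLibraryA", "LoadLibraryW", "GetProcAddress", "VirtualAlloc", "VirtualProtect"}

def sparse_imports(imports: dict[str, list[str]], threshold: int = 10) -> tuple[bool, list[str]]:
    """Return (suspicious, loader_apis_found) for an import directory."""
    functions = [fn for names in imports.values() for fn in names]
    loader_found = sorted(LOADER_APIS.intersection(functions))
    return len(functions) < threshold and bool(loader_found), loader_found
```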
In the case of publicly available packers such as UPX, the binary can often be unpacked simply by running the same tool with the correct command-line switch (for UPX, upx -d) on the sample in question.
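A sketch of that workflow in Python follows, assuming the upx binary is installed and on the PATH inside the analysis VM. The detection step relies on UPX's default section names (UPX0, UPX1, and so on), which adversaries can rename, so a negative result does not rule out UPX. The file names used below are placeholders.

```python
import shutil
import subprocess

def has_upx_section_names(section_names: list[str]) -> bool:
    # UPX names its sections UPX0, UPX1, ... by default; these can be renamed
    return any(name.startswith("UPX") for name in section_names)

def upx_unpack_command(packed_path: str, output_path: str) -> list[str]:
    # -d decompresses; -o writes to a new file so the original sample is left untouched
    return ["upx", "-d", "-o", output_path, packed_path]

def unpack_with_upx(packed_path: str, output_path: str) -> None:
    if shutil.which("upx") is None:
        raise FileNotFoundError("upx was not found on PATH")
    subprocess.run(upx_unpack_command(packed_path, output_path), check=True)
```

For example, unpack_with_upx("sample.exe", "sample_unpacked.exe") would run upx -d -o sample_unpacked.exe sample.exe and raise CalledProcessError if UPX refuses the file, as it may for modified UPX builds.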
There are also several services, such as https://www.unpac.me, that will unpack malware samples; however, these are public services, so your malware sample may become publicly available.
Failing these, we'll cover the manual unpacking of malware samples in greater detail in Chapter 7, Advanced Dynamic Analysis Part 2 – Refusing to Take the Blue Pill.
In the next section, we'll see how NSA's Ghidra reverse-engineering tool can be utilized to perform much of the static analysis work we've done so far with various different tools.
Many of the static analysis techniques we have covered so far can be done within NSA's Ghidra platform as well, for a single-pane-of-glass view. We'll walk through the process of setting up a project in Ghidra, reviewing some of the information we've already looked at, and then diving into some other capabilities within Ghidra.
When we start Ghidra, we'll be on a screen indicating that we have no active project. To begin work, we'll need to define a project, which can be done under the File menu:
Figure 5.14 – Creating a new Ghidra project
Once we've selected this, we'll be asked to name our project. Any name will do, as long as it is meaningful to you:
Figure 5.15 – Naming our project
Once Next is selected, the project is created. Now, to analyze a binary, simply drag and drop it onto Ghidra, which will then import the binary into the project, and ask for a few options. Go with the defaults here:
Figure 5.16 – Importing a PE into Ghidra
Once OK is clicked, double-click your executable to open the code browser for Ghidra. Ghidra will prompt you to analyze the executable. Let's proceed with the analysis:
Figure 5.17 – The Ghidra Analyze prompt
Once the analysis is complete, you will be dropped at the main pane for Ghidra, allowing us to proceed with the analysis of the sample. Immediately, in the left-hand pane, we can see the Symbol Tree.
The Symbol Tree contains all of the imports we previously identified in CFF Explorer. In the following figure, we can see the DLLs imported by the application; clicking the expand button shows the functions imported from each library, as well as the arguments they accept when called:
Figure 5.18 – DLLs and imported functions of the PE within Ghidra
Clicking one of the imported functions will take us to its address within the program listing. Here, we can also see an XREF, or cross-reference, showing where the function is called from another function in the malware. More succinctly, it shows us where the function is used:
Figure 5.19 – Cross-references to an API call within the malware sample
Double-clicking this cross-reference will open the decompiler and will give us pseudo-code of what it appears to be doing with this functionality.
Figure 5.20 – The decompiled view of the API call's cross-referenced function
Here, we can see that a variable is used in place of a hardcoded service name. Following that variable, its value appears to be undefined at this point, suggesting that it may be supplied at runtime by the malware author or via some other mechanism. We can also consult Microsoft's documentation for the SERVICE_STATUS structure whose members appear here, located at https://docs.microsoft.com/en-us/windows/win32/api/winsvc/ns-winsvc-service_status, to get a better understanding of what we are looking at.
We do, however, know that the malware has the capability to alter built-in Windows services. Utilizing and following API calls in this fashion can help build a better map of the functionality and capabilities of different malware samples.
Figure 5.21 – The Ghidra window menu for Defined Strings
Ghidra is also able to give us defined strings within the program. We can utilize this to review any strings in a GUI fashion, separate from the previously discussed string utility:
Figure 5.22 – References to registry value types within defined strings in Ghidra
Here, we can see references to REG_SZ and REG_DWORD, indicating that the malware may be able to set registry values of these types. Following the cross-references, as we did for the API functions, we can see a function within the code that is able to delete, modify, and set the values of registry keys:
Figure 5.23 – A function that indicates the malware has the ability to create, delete, and modify values within the registry
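Outside Ghidra, a similar first pass over strings can be sketched in Python: extract runs of at least four printable ASCII characters, plus UTF-16LE wide strings (common in Windows binaries), and filter them for registry-related markers. The minimum length and the REGISTRY_MARKERS tuple are illustrative assumptions, not the behavior of Ghidra or of any particular strings tool.

```python
import re

ASCII_RUN = re.compile(rb"[\x20-\x7e]{4,}")            # 4+ printable ASCII characters
UTF16LE_RUN = re.compile(rb"(?:[\x20-\x7e]\x00){4,}")  # 4+ printable UTF-16LE characters

# Illustrative markers, compared case-insensitively
REGISTRY_MARKERS = ("REG_SZ", "REG_DWORD", "HKEY_", "SOFTWARE\\", "CURRENTVERSION\\RUN")

def extract_strings(data: bytes) -> list[str]:
    found = [m.group().decode("ascii") for m in ASCII_RUN.finditer(data)]
    found += [m.group().decode("utf-16-le") for m in UTF16LE_RUN.finditer(data)]
    return found

def registry_strings(strings: list[str]) -> list[str]:
    return [s for s in strings if any(marker in s.upper() for marker in REGISTRY_MARKERS)]
```

Each hit is only a lead: as with Ghidra's Defined Strings, following where the string is referenced in the code is what confirms how it is used.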
Similarly, we can follow the sequential flow of the program by beginning at the entry point (navigate to Functions | Entry in the left pane), and then using the function graph from Window | Function Graph:
Figure 5.24 – The function graph within Ghidra
Doing this will display the control flow of the function as a graph of code blocks, along with the calls it makes to other functions. Green arrows show the path taken when a conditional jump's condition is met, and red arrows show the path taken when it is not. Double-clicking any of the called functions will open it in the decompiler for examination.
While reverse-engineering is out of scope for this book, stepping through these functions in this way may give a good idea of the capabilities, functionality, and targeting of non-commodity malware.
Let's move on, and try to test the skills we've learned in this chapter!
Utilizing the unknown.exe sample from the malware sample pack, and without running the application, attempt to answer the following questions utilizing any of the tools we've covered in this chapter – or any tools you're familiar with that provide the same information:
In this chapter, we discussed advanced static analysis techniques. We dove into the PE file format, and all it entails – including sections, magic numbers, DLL imports, and Windows API calls. We also discussed packers, and why adversaries may choose to utilize these to hide the initial intention of their binaries.
While the tools covered in this chapter will get an enterprising analyst most of the static information they need, many other tools will also suffice and may provide better or more complete information.
Now that we have a good grasp of static analysis techniques, in the next chapter we will move on to actually executing our malware, with all the fun that comes with it. This will allow us to validate our findings from static analysis.