Chapter 5: Advanced Static Analysis – Out of the White Noise

In Chapter 2, Static Analysis – Techniques and Tooling, we covered the basics of statically analyzing binaries and files that may be malware, and defined static analysis as the act of obtaining file metadata and intelligence without actually executing the file.

In this chapter, you'll have the opportunity to test your advanced knowledge of static analysis in order to determine the characteristics of an unknown, custom piece of malware.

In this chapter, we'll examine the following topics:

  • Dissecting the PE file format
  • Examining packed files and packers
  • Utilizing NSA's Ghidra for static analysis

Technical requirements

To follow along with the chapter, you'll need:

Dissecting the PE file format

In Microsoft Windows, binary files utilize a structured format – the Portable Executable (PE) file format. This format is utilized by the following types of files; though the way the OS interprets and utilizes them is different, they share the same general structure:

  • Control Panel Items (CPL)
  • Dynamic Link Library (DLL)
  • Driver (DRV) files
  • Windows Executable (EXE) applications
  • Multilingual User Interfaces (MUI)
  • Windows Screensaver (SCR) files
  • System (SYS) files

While this list is not exhaustive of all files that utilize the PE file format, for the purposes of this conversation, they are the most common. That is to say that these file formats are the ones most consistently utilized by malicious threat actors.

Analysis tip

Adversaries utilize various forms of the PE file format, as the end result is usually the same: malicious code execution. However, their choice of DLL, SCR, or EXE will affect their TTPs. For instance, a DLL cannot be launched directly and is typically executed via rundll32.exe or regsvr32.exe (or loaded by another process), whereas an EXE can be executed directly.

Now that we've become familiar with the file types that may utilize the PE format, we can take a deeper dive into understanding the format itself, and understanding how it may be useful to malware analysts such as ourselves.

The DOS header

The first section of a PE file is the DOS header. The DOS header is a leftover element, required for backward compatibility since the inception of the format.

Utilizing CFF Explorer in our VM, we can examine the sections that are relevant to us within the DOS header:

Figure 5.1 – The DOS header for our sample

Only two fields are relevant to us within the DOS header: e_magic and e_lfanew. The first, e_magic, contains the magic number for the executable. In all instances, a portable executable will start with the ASCII characters MZ, stored on disk as the bytes 4D 5A (0x5A4D when read as a little-endian 16-bit value). Historically, this stands for Mark Zbikowski, one of the developers of the MS-DOS executable format that the PE format retains for compatibility. Knowing that every PE file will start with MZ helps us quickly identify a PE file in hexadecimal editors or via its header.

Analysis tip

Being able to identify the beginning of a PE file by its hexadecimal bytes or by the signature MZ ... !This program cannot be run in DOS mode is very useful for spotting PEs at a glance that have been loaded into memory, as PE files begin this way. Unfortunately, PEs do not have a trailer, so carving them out of blocks of memory can be challenging.

The e_lfanew field holds the offset of the PE header. When Windows loads the executable, it follows this offset from the beginning of the image to locate the PE header, which describes how the rest of the file should be mapped and run. In this case, our PE header is located at +00000080 from the base address of the executable within memory. To clarify this, if our executable were loaded at the 0x00000020 base address, the PE header would be at 0x000000A0.
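
The header checks described above can be sketched in a few lines of Python. This is a minimal illustration rather than a full parser: it assumes the standard position of e_lfanew at offset 0x3C in the DOS header, and the helper names (pe_header_offset, find_pe_candidates, build_demo_image) are hypothetical. The demo data is a synthetic header, not our sample.

```python
import struct

MZ_MAGIC = b"MZ"              # e_magic: the bytes 4D 5A on disk
PE_SIGNATURE = b"PE\x00\x00"  # marks the start of the PE header
E_LFANEW_OFFSET = 0x3C        # position of the 4-byte e_lfanew field


def pe_header_offset(data: bytes) -> int:
    """Return e_lfanew if data looks like a PE image, else raise ValueError."""
    if len(data) < 0x40 or data[:2] != MZ_MAGIC:
        raise ValueError("missing MZ magic; not a PE file")
    (e_lfanew,) = struct.unpack_from("<I", data, E_LFANEW_OFFSET)
    if data[e_lfanew:e_lfanew + 4] != PE_SIGNATURE:
        raise ValueError("e_lfanew does not point at a PE signature")
    return e_lfanew


def find_pe_candidates(buffer: bytes) -> list:
    """Return offsets in buffer where a plausible PE image begins."""
    hits = []
    start = buffer.find(MZ_MAGIC)
    while start != -1:
        try:
            pe_header_offset(buffer[start:])
            hits.append(start)
        except ValueError:
            pass  # an "MZ" byte pair that is not followed by a valid header
        start = buffer.find(MZ_MAGIC, start + 1)
    return hits


def build_demo_image(e_lfanew: int = 0x80) -> bytes:
    """Build a tiny synthetic header (illustrative test data only)."""
    image = bytearray(e_lfanew + 4)
    image[0:2] = MZ_MAGIC
    struct.pack_into("<I", image, E_LFANEW_OFFSET, e_lfanew)
    image[e_lfanew:e_lfanew + 4] = PE_SIGNATURE
    return bytes(image)


demo = build_demo_image()
offset = pe_header_offset(demo)
base_address = 0x20  # the illustrative base address used in the text
print(hex(offset), hex(base_address + offset))
```

Because PEs have no trailer, find_pe_candidates only tells us where an image probably begins inside a memory block; deciding where it ends still requires the section table.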

Figure 5.2 – The DOS stub in ASCII

Between the DOS header and the PE file header sits the DOS stub, which usually prints something such as This program cannot be run in DOS mode. It ends directly before the offset of the PE file header. Again, this is a remnant of backward compatibility, and is present in nearly every PE.

PE file header

The next section to examine is the PE file header, at the offset previously mentioned in the DOS header in the e_lfanew field:

Figure 5.3 – The PE file header

Examining the PE header, there are three fields of use to us. Let's take a look at each of them and the information they may offer about the binary we are examining:

  • The Machine field will give us the architecture that the executable is compiled for. For 32-bit executables, the value will be 0x014C, and for 64-bit, 0x8664. While other values are possible, these are the two values we'll focus on, as they are the most common.
  • The NumberOfSections field gives the number of entries in the section table, which we'll cover in a bit. This gives us a good idea of what contents we can expect and perhaps whether the executable is packed or not.
Figure 5.4 – The Characteristics pane in CFF Explorer

  • Clicking Characteristics in CFF Explorer gives us an additional pane with some information regarding the file. Here, we have more information about the architecture: it's a 32-bit executable, and as such cannot address more than 2 GB of user-mode memory unless it is flagged as large-address aware.

Additionally, we can see whether the file is a .DLL or a .SYS file by flags in this section.
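
To make these three fields concrete, the following sketch decodes the 20-byte file header that follows the PE signature. The field order and the flag values come from the published PE format; the function name parse_file_header and the synthetic header bytes are assumptions made for illustration.

```python
import struct

MACHINE_NAMES = {0x014C: "x86 (32-bit)", 0x8664: "x64 (64-bit)"}
CHARACTERISTIC_FLAGS = {
    0x0002: "EXECUTABLE_IMAGE",
    0x0020: "LARGE_ADDRESS_AWARE",
    0x0100: "32BIT_MACHINE",
    0x1000: "SYSTEM",
    0x2000: "DLL",
}


def parse_file_header(data: bytes, pe_offset: int) -> dict:
    """Decode the file header located right after the PE signature."""
    machine, n_sections, _ts, _symtab, _nsyms, opt_size, flags = struct.unpack_from(
        "<HHIIIHH", data, pe_offset + 4  # skip the 4-byte "PE\0\0" signature
    )
    return {
        "machine": MACHINE_NAMES.get(machine, hex(machine)),
        "number_of_sections": n_sections,
        "size_of_optional_header": opt_size,
        "characteristics": [
            name for bit, name in CHARACTERISTIC_FLAGS.items() if flags & bit
        ],
    }


# Synthetic header for illustration only: 32-bit, four sections, EXE flags
header = b"PE\x00\x00" + struct.pack("<HHIIIHH", 0x014C, 4, 0, 0, 0, 0xE0, 0x0102)
print(parse_file_header(header, 0))
```

In practice, pe_offset would be the e_lfanew value read from the DOS header.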

Optional header

The optional header contains most of the interesting file metadata in a portable executable:

Figure 5.5 – The optional header offers a trove of information about the binary

In Figure 5.5, I've highlighted the most important fields in the optional header for static analysis:

  • Magic: This field will contain one of two values: 0x010B for 32-bit executables or 0x020B for 64-bit executables.
  • AddressOfEntryPoint: This field contains the address of the entry point of the executable (where code begins), expressed relative to the image base. In this case, and in most cases, this falls within the .text section of the executable.
  • ImageBase: This corresponds with the preferred base address in memory of the executable (where the image begins). In this case, it is 0x00400000.
  • MajorOperatingSystemVersion: This field contains the minimum version of the Windows OS that is required in order to execute the binary in question. In this case, the value is 0x0004, which corresponds to an OS prior to Windows 2000.
  • Subsystem: This reflects whether this is a Windows GUI-based application or a Windows Console or CLI-based application.
  • DllCharacteristics: While this is not applicable to our sample, this is a useful field that can tell us more information about a DLL, and is worth reviewing in cases where you are analyzing a DLL:
Figure 5.6 – DLL characteristics advertised by the PE

This section can reveal critical information about a DLL's capabilities, including whether it can move within memory and whether it is aware of whether it is running on a Terminal Services session or server.
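
The fields highlighted above sit at fixed offsets inside the optional header, so they can be read directly. The sketch below assumes the standard PE32 and PE32+ layouts; parse_optional_header is a hypothetical helper, only a handful of flag values are decoded, and the header bytes are synthetic values chosen to resemble the text, not read from the sample.

```python
import struct

SUBSYSTEMS = {1: "native", 2: "Windows GUI", 3: "Windows console (CUI)"}
DLL_CHARACTERISTICS = {
    0x0040: "DYNAMIC_BASE (image can be relocated in memory)",
    0x0100: "NX_COMPAT",
    0x8000: "TERMINAL_SERVER_AWARE",
}


def parse_optional_header(opt: bytes) -> dict:
    """Decode selected fields; opt must start at the Magic field."""
    (magic,) = struct.unpack_from("<H", opt, 0)
    if magic == 0x010B:      # PE32: 4-byte ImageBase at offset 28
        (image_base,) = struct.unpack_from("<I", opt, 28)
    elif magic == 0x020B:    # PE32+: 8-byte ImageBase at offset 24
        (image_base,) = struct.unpack_from("<Q", opt, 24)
    else:
        raise ValueError("unexpected optional header magic: " + hex(magic))
    (entry_rva,) = struct.unpack_from("<I", opt, 16)
    (major_os,) = struct.unpack_from("<H", opt, 40)
    subsystem, dll_chars = struct.unpack_from("<HH", opt, 68)
    return {
        "format": "PE32" if magic == 0x010B else "PE32+",
        "image_base": image_base,
        "entry_point_rva": entry_rva,
        "entry_point_va": image_base + entry_rva,
        "major_os_version": major_os,
        "subsystem": SUBSYSTEMS.get(subsystem, str(subsystem)),
        "dll_characteristics": [
            name for bit, name in DLL_CHARACTERISTICS.items() if dll_chars & bit
        ],
    }


# Synthetic PE32 optional header; every value here is an illustrative assumption
opt = bytearray(96)
struct.pack_into("<H", opt, 0, 0x010B)        # Magic
struct.pack_into("<I", opt, 16, 0x1000)       # AddressOfEntryPoint
struct.pack_into("<I", opt, 28, 0x00400000)   # ImageBase
struct.pack_into("<H", opt, 40, 4)            # MajorOperatingSystemVersion
struct.pack_into("<HH", opt, 68, 2, 0x8140)   # Subsystem, DllCharacteristics
print(parse_optional_header(bytes(opt)))
```

Adding the entry point to the image base, as entry_point_va does, gives the address at which execution would begin if the image were loaded at its preferred base.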

Section table

A PE file can contain many sections, but we have listed only a few important ones here. They usually follow a nomenclature similar to the following:

  • .text: Section storing executable code
  • .rdata: Read-only data, such as constants and strings
  • .data: Writable initialized data
  • .rsrc: Resource section – contains icons, images, and so on
  • .edata: Exported functions for DLLs
  • .idata: Imports and the Import Address Table (IAT)

Some of the sections described can be seen in the following screenshot:

Figure 5.7 – The sections table within the PE

Sections outside of the normal defined sections within a PE may be suspect and require further investigation. In this case, we have a non-standard section – r2. Non-standard sections often indicate the usage of a packer to obfuscate code. Additionally, if the virtual size and raw size of a section differ significantly, it may indicate the use of a packer.
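
Both indicators mentioned above, unusual names and a large gap between virtual and raw size, are easy to check once the section table has been read. The following is a rough sketch: the 40-byte entry layout is standard, but the list of "common" names, the 4x ratio threshold, and the helper names are assumptions, and the sizes given to the r2 section are invented for the demo.

```python
import struct

COMMON_SECTION_NAMES = {".text", ".rdata", ".data", ".rsrc", ".edata", ".idata", ".reloc"}
SECTION_HEADER = struct.Struct("<8sIIIIIIHHI")  # one 40-byte section table entry


def parse_sections(table: bytes, count: int) -> list:
    """Return (name, virtual_size, raw_size) for each section table entry."""
    sections = []
    for index in range(count):
        fields = SECTION_HEADER.unpack_from(table, index * SECTION_HEADER.size)
        name = fields[0].rstrip(b"\x00").decode("ascii", errors="replace")
        sections.append((name, fields[1], fields[3]))
    return sections


def flag_sections(sections: list, ratio: float = 4.0) -> list:
    """Describe sections that merit a closer look (a heuristic, not proof)."""
    findings = []
    for name, virtual_size, raw_size in sections:
        if name not in COMMON_SECTION_NAMES:
            findings.append(name + ": non-standard section name")
        if raw_size == 0 and virtual_size > 0:
            findings.append(name + ": no data on disk but memory is reserved")
        elif raw_size and virtual_size / raw_size >= ratio:
            findings.append(name + ": virtual size far exceeds raw size")
    return findings


def make_header(name: bytes, virtual_size: int, raw_size: int) -> bytes:
    """Build a synthetic section table entry for the demo below."""
    return SECTION_HEADER.pack(name, virtual_size, 0, raw_size, 0, 0, 0, 0, 0, 0)


table = make_header(b".text", 0x1000, 0x0E00) + make_header(b"r2", 0x8000, 0x400)
for finding in flag_sections(parse_sections(table, 2)):
    print(finding)
```

Neither finding is conclusive on its own, since legitimate sections holding uninitialized data also reserve more memory than they occupy on disk, so treat the output as a prompt for further investigation.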

The Import Address Table

The IAT within a binary is incredibly important to understand the functionality and capabilities that malware has been endowed with by its creator. In CFF Explorer, we can navigate to the Import Directory section to view the DLLs loaded by this malware:

Figure 5.8 – The imported libraries and the number of functions used from each in the binary

For instance, we can see that this binary imports the following DLLs from Windows:

  • USERENV.dll: 1 function
  • ole32.dll: 6 functions
  • SHELL32.dll: 2 functions
  • USER32.dll: 5 functions
  • ADVAPI32.dll: 23 functions
  • msvcrt.dll: 6 functions

Functions within DLLs allow both legitimate and malicious software authors to utilize pre-coded functionality, which saves time, as they do not have to implement it directly in their application and can instead call the built-in system functions from these DLLs. Selecting one of the imported libraries will allow us to view the functions the binary imports from it:

Figure 5.9 – The location of the functions within the IAT and their names

In the preceding table, we can see that the malware imports several functions from advapi32.dll, along with their locations in the IAT and their names. Searching for these API references on Microsoft's developer documentation site, https://docs.microsoft.com/en-us/windows/win32/api/, will often reveal incredibly useful information about the functionality of the malware.

In this instance, let's take a look at GetTokenInformation:

Figure 5.10 – Microsoft documentation provides excellent information on API calls

Microsoft has provided us with a succinctly worded description – this function will determine information about a security access token, and return a Boolean value based on whether the call succeeds – possibly utilized to determine the level of permission the malware has when it is running. This can be repeated for each API call or suspicious API calls within the sample itself.

There are several suspicious API calls, all of which can be utilized in legitimate ways, but some to look out for are as follows:

This is not an exhaustive list of suspicious API calls, but malware will often utilize one or several of these to achieve their nefarious purposes on the system – be it process injection, key logging, exfiltrating information, or downloading and executing secondary stages.
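
Triage of this kind can be partly automated by comparing a sample's imports against a watchlist. The sketch below is purely illustrative: the watchlist contains real Win32 API names, but their selection and grouping are assumptions made for this example, and the imports are hand-written stand-ins for what an import viewer might show.

```python
# Hypothetical watchlist: real Win32 API names grouped by the behaviours named
# in the text. The selection and grouping are assumptions for illustration.
WATCHLIST = {
    "process injection": {"VirtualAllocEx", "WriteProcessMemory", "CreateRemoteThread"},
    "key logging": {"SetWindowsHookExA", "SetWindowsHookExW", "GetAsyncKeyState"},
    "download and execute": {"URLDownloadToFileA", "URLDownloadToFileW", "WinExec"},
    "privilege and token handling": {
        "OpenProcessToken",
        "GetTokenInformation",
        "AdjustTokenPrivileges",
    },
}


def triage_imports(imports: dict) -> dict:
    """Map each behaviour to the imported functions that match the watchlist."""
    findings = {}
    for dll, functions in imports.items():
        for behaviour, names in WATCHLIST.items():
            for function in sorted(names.intersection(functions)):
                findings.setdefault(behaviour, []).append(dll + "!" + function)
    return findings


# Imports as they might be transcribed from an import viewer (illustrative only)
sample_imports = {
    "ADVAPI32.dll": ["OpenProcessToken", "GetTokenInformation", "RegSetValueExA"],
    "USER32.dll": ["GetAsyncKeyState", "MessageBoxA"],
}
for behaviour, hits in triage_imports(sample_imports).items():
    print(behaviour, "->", ", ".join(hits))
```

A match only narrows down where to look. Each of these functions also has legitimate uses, so the surrounding code still needs to be reviewed.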

However, in some instances, it will not be immediately clear what API calls a binary may utilize, specifically if a packer is utilized. In cases such as this, a packed binary may only call one or two APIs. Let's take a look at how to identify packers and unpack binaries so we may examine them further.

Examining packed files and packers

Packing is one of the most common techniques adversaries utilize to obfuscate their executables. Commercially available and custom packers both exist, and they serve the same purpose: to reduce the size of the executable and render the data within the binary unreadable before unpacking.

Packers work by compressing or encrypting data into one or more packed sections, along with a decompression or decryption stub that restores the actual executable code before the machine attempts to execute it. As a result, the entry point of the program moves from the original .text section to the decompression stub.

In the next few sections, we'll see how we can discover packed samples via several methodologies, and also how we may unpack these samples.

Detecting packers

Detecting the usage of a packer is fairly simple, and there are several indicators that tend to be the most successful in identifying packed binaries. Let's review a few of the simplest ways to identify whether a binary has been packed:

  • Entropy: The entropy of each section may reveal whether or not a sample is packed. Higher entropy reflects a higher level of randomness within the data, which often indicates compression or encryption by an obfuscation tool:
Figure 5.11 – Detect It Easy and its graphical representation of Shannon entropy

The Detect It Easy tool has a good entropy portion that will give a visualization of the randomness of each section. The sample in the figure has been packed with UPX.

  • Section naming and characteristics: Packers will often create non-standard section names, such as UPX0 and UPX1 in the case of UPX, and standard section names will be missing from the section table, such as .text:
Figure 5.12 – Section names and sizes differ among packed and non-packed binaries

Additionally, the raw size of a packed section will often be much smaller than its virtual size, suggesting that code will be unpacked into this section at runtime, as packed code must be restored by the unpacking stub before the machine is able to execute it.

  • Examining the imports: As indicated previously, a packed sample's API calls and imports differ significantly from those of an unpacked sample, generally speaking:
Figure 5.13 – Packed binaries often have far fewer imported API calls than unpacked binaries

A packed executable will have far fewer imports than an unpacked binary – only what is necessary to unpack the executable. Reviewing the import directory in combination with other evidence can confirm the presence or utilization of a packer.
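
The entropy indicator discussed in this list can be reproduced with a short calculation. The sketch below computes Shannon entropy in bits per byte, which is the standard formula. The demo uses zlib compression as a stand-in for a packer, which is an assumption made for illustration and not how any particular packer works.

```python
import math
import zlib
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return bits per byte: 0.0 for one repeated byte, up to 8.0 for evenly spread bytes."""
    if not data:
        return 0.0
    total = len(data)
    probabilities = [count / total for count in Counter(data).values()]
    return sum(-p * math.log2(p) for p in probabilities)


# Synthetic "code-like" data, then the same data compressed to mimic packing
plain = b"".join(b"call sub_%04x; test eax, eax; " % i for i in range(2000))
packed = zlib.compress(plain, 9)

print("plain :", round(shannon_entropy(plain), 2))
print("packed:", round(shannon_entropy(packed), 2))
```

Running the function over each section's raw bytes gives a per-section view similar in spirit to the graph in Detect It Easy. Values approaching 8 suggest compressed or encrypted content, although where exactly to draw the line is a rule of thumb rather than a fixed threshold.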

Unpacking samples

In the case of commercially available packers such as UPX, the binary can often be unpacked with the same tool that packed it, by running the tool with the correct command-line switches on the sample in question (for UPX, the -d switch decompresses a packed file).

There are also several services, such as https://www.unpac.me, that will unpack malware samples. Bear in mind that these are public services, so any sample you submit may become available to others.

Failing these, we'll cover the manual unpacking of malware samples in greater detail in Chapter 7, Advanced Dynamic Analysis Part 2 – Refusing to Take the Blue Pill.

In the next section, we'll see how NSA's Ghidra reverse-engineering tool can be utilized to perform much of the static analysis work we've done so far with various different tools.

Utilizing NSA's Ghidra for static analysis

Many of the static analysis techniques we have covered so far can be done within NSA's Ghidra platform as well, for a single-pane-of-glass view. We'll walk through the process of setting up a project in Ghidra, reviewing some of the information we've already looked at, and then diving into some other capabilities within Ghidra.

Setting up a project in Ghidra

When we start Ghidra, we'll be on a screen indicating that we have no active project. To begin work, we'll need to define a project, which can be done under the File menu:

Figure 5.14 – Creating a new Ghidra project

Once we've selected this, we'll be asked to name our project. Any name will do, as long as it is meaningful to you:

Figure 5.15 – Naming our project

Once Next is selected, the project is created. Now, to analyze a binary, simply drag and drop it onto Ghidra, which will then import the binary into the project, and ask for a few options. Go with the defaults here:

Figure 5.16 – Importing a PE into Ghidra

Once OK is clicked, double-click your executable to open the code browser for Ghidra. Ghidra will prompt you to analyze the executable. Let's proceed with the analysis:

Figure 5.17 – The Ghidra Analyze prompt

Once the analysis is complete, you will be dropped at the main pane for Ghidra, allowing us to proceed with the analysis of the sample. Immediately, in the left-hand pane, we can see the Symbol Tree.

The Symbol Tree contains all of the imports we've previously identified in CFF Explorer. In the following figure, we can see the DLLs that have been loaded by the application, and clicking the expand button allows us to see the functions that have been imported from the library, as well as the arguments they accept when called:

Figure 5.18 – DLLs and imported functions of the PE within Ghidra

Clicking one of the imported functions will take us to the address in memory where the function resides. Here, we can also see an XREF or cross-reference, where the function is called in another function in the malware. More succinctly, it will take us to where the function is utilized:

Figure 5.19 – Cross-references to an API call within the malware sample

Double-clicking this cross-reference will open the decompiler and will give us pseudo-code of what it appears to be doing with this functionality.

Figure 5.20 – The decompiled view of the API call's cross-referenced function

Here, we can see that a variable is substituted for a hardcoded service name, and following the value, the variable appears to be undefined, suggesting it may require input from the malware author, or via some other methodology. We can also cross-reference the MSDN documentation for these variable names, located at https://docs.microsoft.com/en-us/windows/win32/api/winsvc/ns-winsvc-service_status, to get a better understanding of what we are looking at.

We do, however, know that the malware has the capability to alter built-in Windows services. Utilizing and following API calls in this fashion can help build a better map of the functionality and capabilities of different malware samples.

Figure 5.21 – The Ghidra window menu for Defined Strings

Ghidra is also able to give us defined strings within the program. We can utilize this to review any strings in a GUI fashion, separate from the previously discussed strings utility:

Figure 5.22 – References to registry value types within defined strings in Ghidra

Here, we can see references to Reg_SZ and Reg_DWORD, indicating the malware has the ability to set these. Following the cross-references, as we did for the API functions, we can see a function exists within the code that has the ability to delete, modify, and set the values of registry keys:

Figure 5.23 – A function that indicates the malware has the ability to create, delete, and modify values within the registry
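
Outside of Ghidra, a similar first pass over strings can be approximated in plain Python. This sketch is not how Ghidra identifies defined strings. It simply extracts printable ASCII and UTF-16LE runs of four or more characters and keeps those that look registry-related. The marker list and the sample bytes are assumptions chosen to mirror the value types mentioned above.

```python
import re

ASCII_RUN = re.compile(rb"[\x20-\x7e]{4,}")
UTF16_RUN = re.compile(rb"(?:[\x20-\x7e]\x00){4,}")


def extract_strings(data: bytes) -> list:
    """Return printable ASCII and UTF-16LE strings of four or more characters."""
    found = [m.group().decode("ascii") for m in ASCII_RUN.finditer(data)]
    found += [m.group().decode("utf-16-le") for m in UTF16_RUN.finditer(data)]
    return found


def registry_related(strings: list) -> list:
    """Keep strings that mention registry value types or hive names."""
    markers = ("REG_", "HKEY_", "SOFTWARE\\")  # assumed markers, not exhaustive
    return [s for s in strings if any(marker in s.upper() for marker in markers)]


# Synthetic bytes standing in for a slice of a binary (illustrative only)
blob = (
    b"kernel32.dll\x00\x01Reg_SZ\x00\xffReg_DWORD\x00\x02"
    + "HKEY_LOCAL_MACHINE".encode("utf-16-le")
    + b"\x00\x00"
)
print(registry_related(extract_strings(blob)))
```

What this approach cannot provide is the cross-reference from a string to the code that uses it, which is where Ghidra adds most of its value.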

Similarly, we can follow the sequential flow of the program by beginning at the entry point (navigate to Functions | Entry in the left pane), and then using the function graph from Window | Function Graph:

Figure 5.24 – The function graph within Ghidra

Doing this will display a window showing the logical progression of the application, and the functions that it calls. Here, we can see blocks of code connected by arrows: red arrows mark the path taken if a specified condition is not met, and green arrows mark the path taken if it is. Double-clicking any of these functions will open the corresponding function in the decompiler for examination.

While reverse-engineering is out of scope for this book, stepping through these functions in this way may give a good idea of the capabilities, functionality, and targeting of non-commodity malware.

Let's move on, and try to test the skills we've learned in this chapter!

Challenge

Utilizing the unknown.exe sample from the malware sample pack, and without running the application, attempt to answer the following questions utilizing any of the tools we've covered in this chapter – or any tools you're familiar with that provide the same information:

  1. Is the sample packed? What packer does it use?
  2. What kind of PE is this?
  3. If the sample is packed, unpack it. What's the raw size of the .text section after it's been unpacked?
  4. What DLLs does the sample import? Are there any suspicious functions called from these DLLs?
  5. If there are suspicious functions, name one, and describe what arguments it accepts from the function that calls it.
  6. Give a brief overview of the capabilities of this malware as you understand it.

Summary

In this chapter, we discussed advanced static analysis techniques. We dove into the PE file format, and all it entails – including sections, magic numbers, DLL imports, and Windows API calls. We also discussed packers, and why adversaries may choose to utilize these to hide the initial intention of their binaries.

While the tools covered in this chapter will get an enterprising analyst most of the static information they need, there are many tools that will also suffice and may provide better or more complete information.

Now that we have a good grasp of static analysis techniques, in the next chapter we will move on to actually executing our malware, with all the fun that comes with it. This will allow us to validate our findings from static analysis.

Further reading
