Chapter 8: De-Obfuscating Malicious Scripts: Putting the Toothpaste Back in the Tube

Often during malware analysis, a malicious binary is not the initial stage that presents to the end user. Somewhat frequently, an initial "dropper" in the format of a script—be it PowerShell, Visual Basic Scripting (VBS), a malicious Visual Basic for Applications (VBA) macro, JavaScript, or anything else—is responsible for the initial infection and implantation of the binary.

This has been the case in modern times with malware families Emotet, Qakbot, TrickBot, and many others. Historically, scripts have even comprised the entirety of a malware family: for instance, ILOVEYOU, an infamous worm released in 2000, was written in Microsoft's own VBS language.

In this chapter, we'll examine techniques that will assist us with de-obfuscating malicious scripts, a task somewhat akin to attempting to push toothpaste back into a tube after it's already been dispensed.

At the end of the chapter, you'll also have the opportunity to test the skills you've acquired by de-obfuscating malicious scripts provided during the course of the chapter!

We'll cover the following topics:

  • Identifying obfuscation techniques
  • Deobfuscating malicious VBS scripts
  • Deobfuscating malicious PowerShell scripts
  • A word on obfuscation and de-obfuscation tools

Technical requirements

These are the technical requirements for this chapter:

Identifying obfuscation techniques

Several obfuscation techniques are common across scripting languages, and it's important that we understand how each one is used to slow down analysis of a dropper or piece of malware and hinder incident response. In this section, we'll take a brief look at some of the more common techniques that adversaries utilize to prevent analysis.

String encoding

One of the more common techniques utilized both within PowerShell and VBS or VBA malicious scripts is the encoding of strings. Encoding of strings, or function and variable names, makes the code harder to follow and analyze, as it is no longer written in plain English (or any other human-readable language). There are a few choices that are popular, but we'll cover the most popular ones.

Base64 encoding

Base64 is a binary-to-text encoding scheme that takes any input data, including American Standard Code for Information Interchange (ASCII) text, and produces output that is no longer easily human-readable, as illustrated here:

Figure 8.1 – Utilizing the Base64 application to create encoded strings


As you can see, the string appears as though it may be random text, but does in fact easily decode from the VGhpcyBpcyBhIG1hbGljaW91cyBzdHJpbmcu value back to the text that was provided to the Base64 algorithm.

We can recognize Base64 by understanding the alphabet that is utilized. In short, Base64 will always use the A-Z, a-z, 0-9, +, /, and = character set. That is to say, Base64 can utilize all capital and lowercase ASCII letters and the digits 0-9, along with the plus sign and the forward slash, with the equals sign used only for padding.

Analysis tip

The length of a Base64 string must always be divisible by four, so one or two '=' characters are appended as padding to any string that would otherwise fall short of a full 4-character chunk. If you recognize a string that fits these alphabet and length requirements, chances are it's Base64.
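The alphabet and padding rules can be turned into a quick triage check. The following is a minimal Python sketch rather than a definitive detector: the helper names looks_like_base64 and decode_base64_text are hypothetical, and ordinary words made up only of letters can also pass the check, so a match should be treated as a hint and not as proof.

```python
import base64
import re

# Standard Base64 alphabet: A-Z, a-z, 0-9, '+', '/', plus up to two '=' for padding.
BASE64_PATTERN = re.compile(r"[A-Za-z0-9+/]+={0,2}")


def looks_like_base64(candidate: str) -> bool:
    """Return True if the string fits the Base64 alphabet and length rules."""
    if len(candidate) == 0 or len(candidate) % 4 != 0:
        return False
    return BASE64_PATTERN.fullmatch(candidate) is not None


def decode_base64_text(candidate: str) -> str:
    """Decode Base64 and interpret the resulting bytes as ASCII text."""
    return base64.b64decode(candidate).decode("ascii")


sample = "VGhpcyBpcyBhIG1hbGljaW91cyBzdHJpbmcu"
if looks_like_base64(sample):
    print(decode_base64_text(sample))
```

If the decoded bytes are not ASCII, decode_base64_text raises an error, which is itself a useful signal that the payload is binary or has another layer of encoding.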

In order to decode our identified Base64 strings, we can utilize the CyberChef tool from Government Communications Headquarters (GCHQ), located at https://gchq.github.io/CyberChef/. The tool can be seen in the following screenshot:

Figure 8.2 – Utilizing CyberChef to decode Base64 strings


Once we've selected the From Base64 recipe and pasted our string into the Input box, CyberChef will automatically run it through the Base64 decoding algorithm and present us with the corresponding ASCII string.

Recognizing Base64 is key to being able to de-obfuscate scripts and understand what steps threat actors are taking in order to hide their actions from analysts. However, it is not the only encoding scheme that is in use.

Base32 and others

Base64 is not the only encoding alphabet on the block. Also available are Base32, Base58, Base62, and Base85, though the 64 variant is by far the most popular. Key to understanding all of these variants is knowing the alphabet that each encoding algorithm utilizes, and being able to quickly tell them apart.

The following table outlines the key alphabet differences between each of the encoding algorithms:

Table 8.1 – The alphabets of Base-encoding algorithms


With this knowledge, it should be easy to differentiate between the different encoding schemes in their utilization and decode them accordingly, to see what bad behavior whatever threat actor we are examining is undertaking within their dropper code.
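Telling the schemes apart by alphabet can be sketched in a few lines of Python. This is an illustration under stated assumptions and not a reproduction of Table 8.1: the alphabets are the commonly used ones (Base32 per RFC 4648, Base58 in its Bitcoin variant), Base85 is left out because its variants differ, and the function name candidate_encodings is hypothetical.

```python
import string

# Assumed alphabets; real-world samples may use custom or shuffled variants.
ALPHABETS = {
    "Base32": set(string.ascii_uppercase + "234567" + "="),
    "Base58": set("123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"),
    "Base62": set(string.ascii_letters + string.digits),
    "Base64": set(string.ascii_letters + string.digits + "+/="),
}


def candidate_encodings(text: str) -> list:
    """List every scheme whose alphabet contains all characters of the text."""
    chars = set(text)
    return [name for name, alphabet in ALPHABETS.items() if chars <= alphabet]


print(candidate_encodings("ORSXG5A="))
print(candidate_encodings("dGVzdA=="))
```

Because the alphabets overlap, a string often fits more than one scheme; the smallest alphabet that fits is usually the best first guess.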

ASCII ordinal encoding

Another popular encoding method is to utilize the numerical representations of ASCII characters. In ASCII, each character is assigned a numerical representation. The table shown in the following screenshot identifies all of the codes that correspond with the ASCII letter they represent on the keyboard:

Figure 8.3 – The ASCII ordinal table


These ASCII codes may be stored in a variable, decoded into meaningful strings or code with built-in functions such as Chr() in VBS (or the equivalent in PowerShell and other languages), and then passed to another function within the code for execution. Let's take a look at the following example:

Dim Var1
Var1 = "099 109 100 046 101 120 101 032 047 099 032 100 101 108 116 114 101 101 032 099 058 092 032 047 121"

Function func1(varStr)
    On Error Resume Next
    Dim varStr2, code, oShell
    ' Convert each space-separated ordinal back into its character
    For Each code In Split(varStr, " ")
        varStr2 = varStr2 & Chr(CInt(code))
    Next
    Set oShell = WScript.CreateObject("WScript.Shell")
    ' Run the decoded string as a command
    oShell.Run varStr2
End Function

In the preceding example, a group of ASCII ordinals is first converted back to regular characters utilizing VBS's built-in Chr() function and then passed to a WScript.Shell instance, which executes the resulting malicious string as a command on the command line:

Figure 8.4 – Converting ASCII ordinals back to text


CyberChef's From Decimal recipe, with the delimiter set to Space, will convert ASCII ordinals back to characters, and several standalone converters can also be found online by simply googling them. Copying the preceding ordinal string into either of these should reveal the malicious command that is being run.
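The same conversion is easy to script. The following minimal Python sketch assumes the ordinals are decimal and separated by whitespace, as in the example above; the function name decode_ordinals is hypothetical.

```python
def decode_ordinals(ordinals: str) -> str:
    """Turn whitespace-separated decimal ASCII codes back into text."""
    return "".join(chr(int(token)) for token in ordinals.split())


encoded = (
    "099 109 100 046 101 120 101 032 047 099 032 100 101 108 116 "
    "114 101 101 032 099 058 092 032 047 121"
)
print(decode_ordinals(encoded))
```

Only the decoding is performed here; nothing is executed, so this is safe to run outside of a sandbox.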

Hexadecimal encoding

Base algorithms are not the only encoding technique available to malware authors. Besides these and ASCII ordinals, it is also possible to utilize hexadecimal notation to obfuscate a script while still allowing easy conversion back to executable code.

Hexadecimal is fairly easy to recognize, based on its relatively short alphabet and usual notations. The alphabet for hexadecimal is simply A-F0-9—that is to say, all letters A-F, and all numbers 0-9. Case does not matter for hexadecimal notation. If any letter within a string is seen that is beyond F within the alphabet, you can rest assured that it is not, in its current form, hexadecimal notation.

Analysis tip

Various delimiters are utilized for hexadecimal notation, including 0x, \x, %, CRLF, LF, and spaces. However, they all perform the same function of separating one two-character hexadecimal byte from the next.

We can take a look at several examples, and utilize CyberChef as we did with Base encoding to decode our samples. Let's try the following strings:

  • \x54\x68\x69\x73\x20\x69\x73\x20\x45\x78\x61\x6d\x70\x6c\x65\x20\x4f\x6e\x65\x2e
  • 54%68%69%73%20%69%73%20%45%78%61%6d%70%6c%65%20%54%77%6f%21
  • 0x540x680x690x730x200x690x730x200x450x780x610x6d0x700x6c0x650x200x540x680x720x650x650x2e0x200x4e0x690x630x650x200x770x6f0x720x6b0x2e

The following screenshot shows hexadecimal characters being converted to ASCII characters in CyberChef:

Figure 8.5 – Converting hexadecimal to ASCII characters in CyberChef


Utilizing the From Hex recipe within CyberChef, we can select the correct delimiter (or leave it on Auto to have CyberChef decide) that separates each two-character byte of our string and get the correct output returned!
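For bulk work, the delimiter handling can be approximated in Python. This is a rough sketch, assuming the delimiters are limited to 0x, \x, %, commas, colons, semicolons, and whitespace, and that the decoded bytes are ASCII; the name decode_hex is hypothetical.

```python
import re

# Strip the common hexadecimal delimiters, leaving only the hex digits.
DELIMITERS = re.compile(r"0x|\\x|[%,:;\s]", re.IGNORECASE)


def decode_hex(text: str) -> str:
    """Remove delimiters, then decode the remaining hex digits as ASCII."""
    cleaned = DELIMITERS.sub("", text)
    return bytes.fromhex(cleaned).decode("ascii")


print(decode_hex("54%68%69%73%20%69%73%20%45%78%61%6d%70%6c%65%20%54%77%6f%21"))
print(decode_hex("0x540x680x690x73"))
```

bytes.fromhex raises a ValueError if any character beyond F survives the cleanup, which matches the rule of thumb that such a string is not hexadecimal in its current form.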

Obviously, encoding is not the only tool that can be utilized by malware authors to obfuscate their payloads. In the next few sections, we'll take a look at other methodologies, starting with string concatenation.

String concatenation

Encoding strings is not the only way a malicious author can hide their intentions and make instructions within scripting difficult to read. Another common methodology is to concatenate multiple separate strings in order to form a complete command.

In essence, several chunks of code are separately stored in various variables that do not make sense on their own and are then later combined into a single string that makes sense when their execution is required.

To make more sense of this technique, we can take a look at an example here:

$var1 = "scri"
$var2 = "pt.she"
$var3 = "ll"
$var4 = "w"
$var5 = New-Object -ComObject ($var4 + $var1 + $var2 + $var3)

The preceding example is in Windows PowerShell, and concatenates four variables into the string wscript.shell while passing the result to the New-Object cmdlet. It's fairly obvious in this example that the malicious actor is creating a new WScript Shell through which to run further malicious scripts.

While it is not always this obvious what the author intended in their string concatenation, several variables being chained together in arguments should be an immediate cause for concern, and string concatenation should be assumed by the examining analyst.
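When a dropper chains dozens of fragments together, resolving them by hand gets tedious. The following Python sketch illustrates the idea under narrow assumptions: each fragment is assigned as a double-quoted literal on its own line, and the expression joins variables with +. The function name resolve_concatenation and the variable names are illustrative only.

```python
import re

# Matches simple assignments such as: $var1 = "scri"
ASSIGNMENT = re.compile(r'^\s*(\$\w+)\s*=\s*"([^"]*)"\s*$')


def resolve_concatenation(script_lines, expression):
    """Substitute known variable values into a '+'-joined expression."""
    values = {}
    for line in script_lines:
        match = ASSIGNMENT.match(line)
        if match:
            # PowerShell variable names are case-insensitive.
            values[match.group(1).lower()] = match.group(2)
    parts = [part.strip() for part in expression.split("+")]
    return "".join(values.get(part.lower(), part) for part in parts)


lines = ['$var1 = "scri"', '$var2 ="pt.she"', '$var3 = "ll"', '$var4 = "w"']
print(resolve_concatenation(lines, "$var4 + $var1 + $var2 + $var3"))
```

Unknown variables are left in place, so partially resolved output still shows where further tracing is needed.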

String replacement

A close cousin of string concatenation, string replacement creates strings with meaningless data within the middle of executable code. Let's take a look at an example of string replacement here, in order to understand the impact of this:

$var1 = "cmAQGlXFeGhOd.exe /c AQGlXFeGhO%appAQGlXFeGhOdaAQGlXFeGhOta%malwAQGlXFeGhOare.exAQGlXFeGhOeAQGlXFeGhO"
Invoke-Expression ($var1 -replace "AQGlXFeGhO", "")

As shown in the preceding example, a randomly generated string has been inserted into the otherwise valid command, obfuscating it and making it quite difficult to read at a glance without either superhuman powers or considerable effort. However, it still easily executes at runtime, because PowerShell's -replace operator strips the junk characters before the command is handed to Invoke-Expression, as illustrated here:

Figure 8.6 – String replacement in a CARBON SPIDER dropper


Often, string replacement can be utilized in combination with concatenation to create code that is very difficult to read and time-consuming to reverse for an analyst.
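Once the junk token has been identified, usually by reading the arguments of the replace call, undoing it is a one-liner. The following Python sketch assumes the tokens are known in advance; strip_junk is a hypothetical helper name.

```python
def strip_junk(text: str, junk_tokens) -> str:
    """Remove every occurrence of each known junk token from the text."""
    for token in junk_tokens:
        text = text.replace(token, "")
    return text


obfuscated = (
    "cmAQGlXFeGhOd.exe /c AQGlXFeGhO%appAQGlXFeGhOdaAQGlXFeGhOta%"
    "malwAQGlXFeGhOare.exAQGlXFeGhOeAQGlXFeGhO"
)
print(strip_junk(obfuscated, ["AQGlXFeGhO"]))
```

Accepting a list of tokens helps with droppers that layer several replacements on top of one another.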

Other methodologies

Playing with strings in various ways is not the only way that malware authors can obfuscate the true objective of their code. There are various other methods employed, often in combination with encoding, substitution, and concatenation methodologies.

Variable and function naming

In normal coding, it's generally important to give functions and variables meaningful names in order to assist future programmers who may work on your project in understanding execution flow and the purposes for the decisions you have made during the course of your creation of the script or program.

This is not the case in malware. In malicious scripts, it's often the case that variables, functions, and arguments passed to these functions are given random, meaningless, or outright misleading names in order to purposefully hinder analysis of the dropper in question, as can be seen in the following example:

Figure 8.7 – Useless, random variable names in a Qakbot dropper


Uncalled or pointless functions

Another methodology utilized is to insert code that does nothing—the primary purpose of the code may be able to be accomplished in 5-10 lines of code, but the dropper may include hundreds or thousands of lines, including functions that are never called, or return null values to the main function, and never affect the execution flow of the dropper. An example of this can be seen here:

Figure 8.8 – A function that does nothing and returns no values in a Qakbot dropper


The impact of this is that it makes it far more difficult for an analyst or heuristic code analyzer to locate the true beginning of execution of the malicious script.
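A rough static pass can help flag dead code before any manual reading. The following Python sketch is a heuristic only: it assumes VBS-style Function/Sub blocks closed by End Function/End Sub, ignores calls built dynamically (for example, through Execute or Eval), and uses the hypothetical name find_uncalled.

```python
import re

# Matches a whole Function/Sub block, from its declaration to the matching End line.
BLOCK = re.compile(
    r"^[ \t]*(?:(?:Public|Private)[ \t]+)?(Function|Sub)[ \t]+(\w+).*?^[ \t]*End[ \t]+\1\b",
    re.IGNORECASE | re.MULTILINE | re.DOTALL,
)


def find_uncalled(source: str) -> list:
    """Return names of routines never referenced outside their own body."""
    uncalled = []
    for match in BLOCK.finditer(source):
        name = match.group(2)
        # Drop the routine's own body; VBS assigns return values via the routine name.
        rest = source[: match.start()] + source[match.end():]
        if not re.search(r"\b" + re.escape(name) + r"\b", rest, re.IGNORECASE):
            uncalled.append(name)
    return uncalled


example = (
    "Function used(x)\n"
    "    used = x + 1\n"
    "End Function\n"
    "Function junk()\n"
    "    Dim a\n"
    "    a = 5\n"
    "End Function\n"
    "WScript.Echo used(2)\n"
)
print(find_uncalled(example))
```

Routines reported here are candidates for deletion during cleanup, not guaranteed dead code, since a name mentioned only inside a string or comment would also count as a reference.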

Now that we have a good understanding of some of the methodologies that may be employed by threat actors, we can examine how we may begin de-obfuscating malicious scripts and droppers employed by these actors.

Deobfuscating malicious VBS scripts

In this section, we'll take a look at some of the methodologies we've learned about and learn a few shortcuts to de-obfuscating malicious VBS and VBA scripts within our Windows virtual machine (VM) to understand what the malicious author may be attempting to accomplish.

Malicious VB scripts are one of the more common droppers in use throughout the history of malware, as the language is easy to code in, easy to learn, ubiquitous, and powerful within the environment that comprises most malware targets: Windows.

Utilizing VbsEdit

A free tool, VbsEdit, is one of the best methods to approach de-obfuscation of VB-based scripts. The tool can be obtained from the link within the Technical requirements section at the beginning of this chapter.

Once the tool is downloaded, proceed through the installation, accepting default options—they'll work perfectly.

Of note, the tool does have an optional license but it is not required, and the evaluation period does not expire.

Once open, click Evaluate within the prompt, and proceed to the main window.

Here, we'll open a malicious VBS example from the CARBON SPIDER threat actor to examine what information we can pull out of the script via debugging and evaluation, utilizing the VbsEdit tool. The tool can be seen in the following screenshot:

Figure 8.9 – The Open button in VbsEdit


First, we'll utilize the Open button and then load our selected script from the filesystem. Once we've done this, we can simply click Start Debugging with CScript and allow the script to run, as illustrated in the following screenshot:

Analysis tip

Debugging the script is dynamic! The malicious script will be executed on your system as a result of running this. Ensure that you are properly sandboxed, as outlined in previous chapters, before running this!

Figure 8.10 – The obfuscated CARBON SPIDER dropper


Once the script has finished running, a new tab will appear entitled eval code:

Figure 8.11 – The evaluated code tab within VbsEdit


Upon clicking this, you'll see that the obfuscated actions within the code have been transformed into fairly readable code! Unfortunately, it's all on a single line—but with some quick formatting changes, we'll have the full, de-obfuscated script.

Thankfully, VBScript has a standard statement delimiter: the colon separates each new command. Utilizing Notepad++'s Find and Replace feature with Extended search mode allows us to replace each instance of a colon with \r\n, the newline sequence in Windows. This is illustrated in the following screenshot:

Figure 8.12 – Finding and replacing the delimiter within Notepad++


Once we utilize this delimiter to replace the colons, Notepad++ will basically format the entirety of the dropper for us, as illustrated in the following screenshot:

Figure 8.13 – Perfectly formatted, totally de-obfuscated CARBON SPIDER dropper


Being sure to skip valid uses of a colon within strings in the script (Uniform Resource Locators (URLs), Windows Management Instrumentation (WMI) commands, and so on), we can replace each one with a new line and obtain a full copy of the malicious script!
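Skipping the colons that sit inside strings is the error-prone part of a blind find-and-replace. A small Python sketch of the same step follows; it assumes double-quoted VBS strings with quotes escaped by doubling, does not handle comments, and uses the hypothetical name split_vbs_statements.

```python
def split_vbs_statements(line: str) -> list:
    """Split a one-line VBS script on colons that are outside string literals."""
    statements, current, in_string = [], [], False
    for ch in line:
        if ch == '"':
            # A doubled quote ("") toggles twice, so the state stays correct.
            in_string = not in_string
            current.append(ch)
        elif ch == ":" and not in_string:
            statements.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    statements.append("".join(current).strip())
    return [statement for statement in statements if statement]


one_liner = 'Set o = CreateObject("WScript.Shell"):u = "http://example.com/a":o.Run u'
print("\r\n".join(split_vbs_statements(one_liner)))
```

The URL in the sample input is a placeholder; the colon after http survives because it falls inside a quoted string.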

While VbsEdit is certainly the best way to deobfuscate malicious VBS scripts, it's not the first way, and certainly isn't the only one. We can also utilize built-in utilities such as Echo in WScript.

Using WScript.Echo

In some instances, it may be useful to obtain the value of a single variable within a script as opposed to dynamically executing and obtaining a full copy of a de-obfuscated script. In these instances, WScript.Echo can be utilized within the script in order to obtain the value.

Simply locate the point where you believe the variable has been set to the value you'd like to see, and add a line that echoes the variable with WScript.Echo(Variable). While this method does have its benefits, it's much more beneficial to utilize the previously discussed VbsEdit debugger to obtain a full copy of the script if you already have a detonation environment set up in the proper manner.

While malicious VBS droppers are certainly still in vogue due to the ability to run them on any version of Windows in use today, other malicious scripts and droppers written in PowerShell also exist.

Deobfuscating malicious PowerShell scripts

Perhaps one of the most common scripting languages in use for both malicious and legitimate administration purposes is the built-in Windows scripting engine based on .NET—PowerShell.

PowerShell has been embraced readily by threat actors, red teamers, and systems administrators alike to accomplish their ends due to its power.

As a result of this power, it's also incredibly easy to obfuscate PowerShell scripts in many different ways. We'll take a look at a few examples exclusive to PowerShell, and a real-world example utilized by Emotet!

First, we'll take a look at a few examples that are utilized by PowerShell that are generally unique to PowerShell malware samples.

Compression

The first method (which is one of the most commonly utilized obfuscation methods) is compression, as shown in the following code snippet:

.($pshOme[21]+$PsHomE[30]+'X') (NEw-obJECt  iO.STREAmREAdER ( ( NEw-obJECt  SyStEm.iO.cOMpREssIOn.DeflAtEstreaM([SYstEM.Io.MemoRYsTREaM] [sYSTEm.CONvERt]::FROMBAsE64sTRinG ('TcmxDkAwFAXQX5FOJLzuVmJkMHSxFDdReW1FX1L+3uqspxyRm2k9sUkxv0ngaYSQwdqxQ5CK+pgDR7sPjlGqQ+RKrdZ4rL8YtEWvveVsbxAeqLpQXbsYF/aY0/Kf6gM='),[SYSteM.iO.CoMPresSIOn.cOMPReSSIoNmoDE]::DECompReSS)), [sysTeM.TeXT.EncODinG]::asCIi) ).reAdtOENd()

As you can see, several obfuscation methods are utilized here. First, Base64 encoding is utilized to obfuscate what appears to be a string that is being passed to the System.IO.Compression.DeflateStream class. Let's grab the Base64 string and paste it into CyberChef to try to decode what it holds, as follows:

Figure 8.14 – Binary data from a Base64-encoded string in CyberChef


Unfortunately, decoding the data appears to have returned binary as opposed to ASCII commands in this instance. No matter—CyberChef has another recipe that will be of use! As we can see the DeflateStream directive, we know that we should utilize the Raw Inflate recipe within CyberChef to reverse the action taken during obfuscation, as illustrated in the following screenshot:

Figure 8.15 – Inflating the binary data from within CyberChef to return the ASCII command


With Raw Inflate interpreting the binary data, we can now see what the obfuscated command is attempting to do!
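The Base64-then-inflate chain can also be reproduced outside of CyberChef. The following Python sketch assumes the payload is a raw DEFLATE stream (which is what DeflateStream produces) that decompresses to ASCII text; the function names are hypothetical, and the round trip at the end uses a harmless string of our own rather than the sample above.

```python
import base64
import zlib


def inflate_base64(encoded: str) -> str:
    """Base64-decode, then raw-inflate, then read the result as ASCII."""
    raw = base64.b64decode(encoded)
    # Negative wbits tells zlib to expect raw DEFLATE data with no zlib header.
    return zlib.decompress(raw, -zlib.MAX_WBITS).decode("ascii")


def deflate_to_base64(text: str) -> str:
    """Inverse of inflate_base64, used here only to build test input."""
    compressor = zlib.compressobj(-1, zlib.DEFLATED, -zlib.MAX_WBITS)
    raw = compressor.compress(text.encode("ascii")) + compressor.flush()
    return base64.b64encode(raw).decode("ascii")


print(inflate_base64(deflate_to_base64("Write-Host 'hello'")))
```

Passing the Base64 string from the obfuscated command to inflate_base64 should mirror the From Base64 and Raw Inflate recipes.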

Other methods within PowerShell

PowerShell offers several methods for obfuscation that are unique to the language itself but fall within the categories previously covered. However, it's important to mention them in the context of PowerShell, since they can differ somewhat.

Backticks

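Command tokens (cmdlets) can be broken up and obfuscated by utilizing backticks (grave accents) within the command token. For example, New-Object becomes Ne`w-`Obje`ct. The backtick is PowerShell's escape character, so it is silently dropped before most letters; obfuscators avoid placing it before characters such as n, t, b, or 0, where it would form a real escape sequence. This is particularly powerful when combined with other methods.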

Concatenation of cmdlets

Concatenation is not limited to variables within PowerShell—it can also be applied to command tokens and cmdlets—for example, New-Object could become & ('Ne'+'w-Ob'+'ject').

Addition of whitespace

PowerShell, generally speaking, ignores extra whitespace between tokens. When combined with backticks and string concatenation, it's possible to make even normal cmdlets very confusing. For example, New-Object may become & ( 'Ne'  +'w-Ob' +   'ject' ) or similar.
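Backticks, padded whitespace, and literal concatenation can all be normalized with simple text substitutions. The following Python sketch makes some loud assumptions: backticks are stripped blindly (which would damage genuine escape sequences such as a backtick followed by n inside double-quoted strings), only single-quoted literals are merged, and both function names are hypothetical.

```python
import re

# Two adjacent single-quoted literals joined by '+', with any amount of whitespace.
LITERAL_PAIR = re.compile(r"'([^']*)'\s*\+\s*'([^']*)'")


def strip_backticks(text: str) -> str:
    """Remove backtick escape characters from a command token."""
    return text.replace("`", "")


def merge_literals(text: str) -> str:
    """Repeatedly merge 'a' + 'b' into 'ab' until nothing changes."""
    previous = None
    while previous != text:
        previous = text
        text = LITERAL_PAIR.sub(lambda m: "'" + m.group(1) + m.group(2) + "'", text)
    return text


print(strip_backticks("Ne`w-`Obje`ct"))
print(merge_literals("& ( 'Ne'  +'w-Ob' +   'ject' )"))
```

The loop is needed because each pass only merges non-overlapping pairs, so a chain of three or more fragments takes more than one pass to collapse.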

Reordering via the format operator

Perhaps the most complex method: the malicious author may choose to pass substrings of a command as arguments to PowerShell's format operator (-f), and then assemble them in the proper order by referencing each substring by its index. For example, see the following code snippet:

.("{1}{0}{2}"-f'e','N','w-Object')

In this example, New-Object is split into a list of arguments with the following index values:

  • Value 1 = N
  • Value 0 = e
  • Value 2 = w-Object

As such, each value is called in the order that makes sense—1, 0, 2—and then executed!
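The index lookup can be mimicked in a few lines of Python. This sketch handles only the simple shape shown above, a double-quoted template of {n} placeholders followed by -f and single-quoted arguments; the name resolve_format_operator is hypothetical.

```python
import re

# "{1}{0}{2}" -f 'e','N','w-Object'  (whitespace around -f and commas is optional)
FORMAT_EXPR = re.compile(
    r'''"((?:\{\d+\})+)"\s*-f\s*((?:'[^']*'\s*,?\s*)+)''',
    re.IGNORECASE,
)


def resolve_format_operator(expression: str):
    """Rebuild the string produced by a simple PowerShell -f expression."""
    match = FORMAT_EXPR.search(expression)
    if match is None:
        return None
    template, raw_args = match.group(1), match.group(2)
    args = re.findall(r"'([^']*)'", raw_args)
    # Replace each {n} placeholder with the argument at index n.
    return re.sub(r"\{(\d+)\}", lambda m: args[int(m.group(1))], template)


print(resolve_format_operator(""".("{1}{0}{2}"-f'e','N','w-Object')"""))
```

Templates that mix literal text with placeholders, or that nest further -f expressions, would need a fuller parser than this.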

With knowledge of these obfuscation techniques, let's now take a look at an example.

Emotet obfuscation

Let's take a look at an obfuscated Emotet PowerShell command in order to see if we can manage to de-obfuscate and extract the dropper domains from the script to find which domains we should be blocking requests to at our firewall. Let's look at the command, which can be found in the malware samples downloaded for this chapter in EMOTET.txt:

First, we can utilize the From Base64 recipe within CyberChef, which will decode and give us the output of the Base64-encoded string, as illustrated in the following screenshot:

Figure 8.16 – First step: decoding of the Emotet dropper


We can see that there are several null bytes also within this command (these are represented by the '.' character within CyberChef), which is typical of text stored as UTF-16. We'll go ahead and remove these with the Remove null bytes recipe, as illustrated in the following screenshot:

Figure 8.17 – Second step of decoding, with null bytes removed from the dropper


We're definitely making some progress! However, we can see some fairly dense concatenation, utilizing what looks like the characters + and (), and whitespace. Utilizing Find / Replace recipes within CyberChef, we can substantially cut down on the noise the concatenation characters are causing, and smash all the characters back together, as illustrated in the following screenshot:

Figure 8.18 – Third step in decoding, with erroneous whitespace and concatenation characters removed


We're definitely almost there! Now, it just looks like we have a few more steps. As we can see, where HTTP(s) would normally be, it appears to be replaced with ah. We can create a simple find-and-replace REGEX rule to replace ah with http, as illustrated in the following screenshot:

Figure 8.19 – Extracting the URLs from the Emotet dropper


Once done, we can simply utilize the Extract URLs recipe to pull all of the command-and-control (C2) URLs out of the script!
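The recipe steps above can be chained in Python for repeat use. This is a sketch under assumptions, not the exact CyberChef recipe: it assumes the null bytes come from UTF-16LE text, that the noise consists of quote-plus-quote joins, and that any sample-specific substitutions are supplied by the analyst. The function names and the example.com and example.org URLs are placeholders.

```python
import base64
import re


def decode_encoded_command(encoded: str) -> str:
    """Base64-decode and read as UTF-16LE, which also removes the null bytes."""
    return base64.b64decode(encoded).decode("utf-16-le")


def remove_concatenation(script: str) -> str:
    """Drop quote-plus-quote joins such as 'ht'+'tp' so fragments fuse together."""
    return re.sub(r"['\"]\s*\+\s*['\"]", "", script)


def extract_urls(script: str, replacements=()) -> list:
    """Apply sample-specific replacements, then pull out http(s) URLs."""
    for old, new in replacements:
        script = script.replace(old, new)
    return re.findall(r"https?://[^\s'\"@,;()]+", script)


# Synthetic stand-in for a dropper command; not taken from a real sample.
demo = "$u='ht'+'tp://exa'+'mple.com/a@ht'+'tp://example.org/b'.Split('@')"
demo_encoded = base64.b64encode(demo.encode("utf-16-le")).decode("ascii")
print(extract_urls(remove_concatenation(decode_encoded_command(demo_encoded))))
```

The replacements parameter is where a per-sample substitution belongs; each pair is applied as a plain text replacement before the URLs are extracted.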

Now that we have covered several different ways to de-obfuscate code semi-manually, let's take a look at some of the automated tools utilized by attackers, and some of their counterparts in incident response.

A word on obfuscation and de-obfuscation tools

There are several tools that are useful for both obfuscating and de-obfuscating malicious scripts. We'll touch on several of these, and also their de-obfuscation counterparts.

Invoke-Obfuscation and PSDecode

Invoke-Obfuscation is a powerful tool written by an ex-Mandiant red-team employee. It can take existing PowerShell scripts that have not been obfuscated in any way, and fully obfuscate them to evade endpoint detection and response (EDR) detection and make analysis more difficult for analysts. If you'd like to practice creating obfuscated scripts, the tool can be downloaded from https://github.com/danielbohannon/Invoke-Obfuscation. You can see the tool in action in the following screenshot:

Figure 8.20 – The splash screen and options for Invoke-Obfuscation


The blue-team counterpart to Invoke-Obfuscation is PSDecode, which attempts to go through a script line by line to de-obfuscate it and reverse the compression or exclusive OR (XOR) methodologies used to hide, or otherwise make difficult, the analysis of malicious PowerShell scripts. PSDecode is shown in action in the following screenshot:

Figure 8.21 – Example output for PSDecode


This tool should be considered essential to any malware analyst's toolbox, and may be downloaded from https://github.com/R3MRUM/PSDecode.

JavaScript obfuscation and JSDetox

There are many JavaScript obfuscation frameworks available—too many to name. However, the Metasploit JavaScript obfuscator is probably the most commonly used. An example of the output produced by the Metasploit JavaScript obfuscator is provided in the following screenshot:

Figure 8.22 – Example of obfuscated JavaScript by the Metasploit obfuscator


Obviously, this does not make for particularly readable code. Thankfully, the JSDetox tool, which can be downloaded from http://www.relentless-coding.com/projects/jsdetox/, can make short work of most JavaScript obfuscation. This is shown in the following screenshot:

Figure 8.23 – The same JavaScript, run through JSDetox


The preceding screenshot shows the same code after it has been run through JSDetox. This makes for much more obvious code! We can now see that the payload is creating a backdoor with CLSID persistence, and the payload is hosted on localhost on port 8080!

Other languages

A plethora of tools exist for other languages, but JavaScript, VBS, and PowerShell comprise the vast majority of malicious scripts you'll encounter, so the tools covered here will serve you well as an analyst, in combination with CyberChef and the ability to recognize encodings when you see them!

Challenges

Utilizing CyberChef, any automated tools covered, and the Qakbot.txt and EMOTET_2.txt samples within the Technical requirements section, attempt to answer the following questions:

  1. Which site is the Qakbot malware downloading its executable from?
  2. Which methodology is Qakbot using to download the file? (Which built-in function is it using?)
  3. Which C2s is the Emotet sample using for distribution?
  4. What was the exact recipe utilized in CyberChef to obtain this information?

Summary

In this chapter, we covered basic methods of obfuscation utilized by threat actors in order to hide the malicious intent of their script(s), and how to reverse them. With this knowledge, it's now possible for us to recognize attempts to hide data and action on objectives from us.

We can utilize this knowledge to leverage the tools we learned about (PSDecode, VbsEdit, and CyberChef) to collect indicators of compromise (IOCs) and better understand what a malicious script may be trying to do or stage on a system. As a result, we are better prepared to face the first stage of adversarial software head-on.

In the next chapter, we'll review how we can take the IOCs we collect as a result of this and weaponize them against the adversary to prevent breaches in the first place!
