CHAPTER 15
Investigating Hacker Tools
During investigations of computer crime, particularly computer intrusions, you will encounter rogue files with an unknown purpose. You know that the rogue file is doing something that the attacker wants, but all you have is a binary file and perhaps a few theories about what that file does.
Tool analysis would be much simpler if attackers left their source code behind. But most attackers have something in common with Microsoft: They protect their source code. Without it, you are left to muddle through object code and trace the functionality of the program.
In this chapter, we outline a sound scientific approach to performing tool analysis. You will learn how to take an executable file with an unknown function and perform operations on it to gain insight into the file’s intended purpose.
WHAT ARE THE GOALS OF TOOL ANALYSIS?
If you are lucky, the hacker tools have filenames that give enormous clues about their function. A file called sniffer or esniff is likely to be a sniffer tool. However, it is more likely that the attackers have renamed their code to some innocuous system filename such as xterm or d.1. These names offer few clues about the function of a rogue program. Therefore, you will need to analyze these tools to achieve the following goals:
Prevent similar attacks in the future
Assess an attacker’s skill or threat level
Determine the extent of a compromise
Determine if any damage was done
Determine the number and type of intruders
Prepare yourself for a successful subject interview if you catch the attacker
Determine the attacker’s objectives and goals (specific targeting versus target of opportunity)
HOW FILES ARE COMPILED
A compiler, such as the GNU C compiler, reads an entire program written in a high-level language, such as C or Pascal, and converts it to object code, which is often called machine code, binary code, or executable code. Think of compilers as programs that translate human-readable source code into the machine language that a system understands. Machine language can be directly executed by the system’s processor.
There are many ways for attackers to compile their source code, and some methods of compilation make tool analysis easier than others. Generally, the larger the binary file, the more information investigators can obtain when analyzing it. In the next few sections, we explain the different ways a program can be compiled and how each affects the amount of information available to the investigator during tool analysis.
Statically Linked Programs
A statically linked executable file contains all the code necessary to successfully run the application. It has no dependencies on external libraries, which means the program will run even if the host system lacks the specific library versions it was built against. Some commercial applications that you download from the Internet may be statically compiled so that they do not depend on any libraries on your system. For example, Sun Microsystems’ StarOffice is distributed as a statically linked package; Sun uses this format to overcome the differences among the various distributions of the Linux operating system.
Here is an example of a command to statically compile a program within the Linux operating system using the GNU compiler:
In this command line, the source code zap.c was compiled to create a statically linked object file called zapstatic.
NOTE
As you learned in Chapter 13, zap is a log-wiping tool that erases a specific user’s entries from the utmp, wtmp, and lastlog files.
Dynamically Linked Programs
Nearly all modern operating systems support the use of shared libraries, which contain commonly used functions and routines. By compiling a program to use the shared libraries, a programmer can reference them somewhere in memory when the program needs to use those functions and routines, rather than incorporating all that code in the application itself. This reduces the size of the executable file, conserves system memory, and permits updates to the shared libraries without the need to change any of the original programs. Programs that use shared libraries are dynamically compiled. Each dynamically compiled program references the single copy of the shared library located in memory. Figure 15-1 illustrates how dynamically compiled and statically compiled programs use system memory.
Figure 15-1. How static and dynamically linked processes use system memory
Dynamically linked programs are the standard type. Using the GNU compiler, the following command line yields a dynamically compiled executable file:
The default behavior of the GNU compiler creates a dynamically linked executable.
Programs Compiled with Debug Options
On rare occasions, you will be lucky enough to encounter hacker tools that have been compiled in debug mode. Debug compilations are normally used by software developers during the early stages of the program’s development to help them troubleshoot problems and optimize their code. When debug options are enabled, the compiler will include a lot of information about the program and its source code.
The following command line shows how you would use the GNU compiler to compile the source code file zap.c with the debug options enabled. Notice that this is accomplished by adding the -g option to the command line.
There are three debug levels (-g1 through -g3) that include increasing amounts of information; the default is level 2. Depending on the debug level, GCC may produce information to facilitate backtraces, descriptions of functions and external variables, local variables, and macro definitions.
The following is a listing of a directory that contains the log-wiping tool zap compiled dynamically, statically, and with debug options.
Notice the size of each version. The dynamically compiled zap is 13,217 bytes, and the static zap is 1,587,273 bytes in size. The static zap binary file is more than 120 times larger than the dynamic zap binary file. The debug version contains additional data, making it nearly twice the size of the dynamically compiled zap.
Stripped Programs
strip is a utility that discards all symbols from object code, making a file much smaller and perhaps better optimized for execution. Because stripped, dynamically compiled programs yield the smallest executables, these files are usually the most difficult for an investigator to analyze using string and symbol extraction techniques. For example, if a file has not been stripped and contains symbols, the nm command will display them; the strip command will remove that information.
The following command line demonstrates using the GNU version of strip and shows how much smaller the dynamically compiled, stripped version of zap is compared to the files created with other types of compilation.
NOTE
Most utilities generate a new file, but strip modifies the actual content of the object file specified on the command line.
Notice that stripping the dynamically linked zap program (zapdynamic) shrinks the file size from its original size of 13,217 bytes (as shown in the previous section) to 4,400 bytes.
Programs Packed with UPX
UPX, or the Ultimate Packer for eXecutables, is becoming increasingly popular as an effective compression tool for executable files. Perhaps another reason for its popularity is that attackers can use it to obscure their illicit programs from signature-based IDS. UPX will pack and unpack Linux and Win32 applications, as well as DOS 16-bit executable and .com files, DOS 32-bit COFF files, DOS 32-bit executables, and Atari TOS/MiNT executables.
A review of the ASCII-formatted strings within the rogue code will show whether UPX was used to compress the executable, as shown in the example in Figure 15-2. If you find an executable packed with UPX, you should decompress it using UPX in order to be able to review the strings contained within the normal executable file. You can review the strings in a file using the strings command, as described in the “Reviewing the ASCII and Unicode Strings” section later in this chapter.
Figure 15-2. A strings command showing a tool that has been packed with UPX
Symbol Extraction
If a file has not been stripped (with the strip command), an investigator may be able to analyze it using string and symbol extraction techniques. To extract symbols from a Unix object file, use the nm command (-a means list all):
In the nm command output, the first column is the symbol value in hexadecimal, followed by the symbol type, and then the symbol name. For symbol types, if the character is lowercase, it represents a local variable. Uppercase characters represent global (external) variables.
Here are some examples of symbol types:
A indicates an absolute value (it won’t be changed by further linking).
B indicates an uninitialized data (BSS) section.
C indicates a common symbol (uninitialized data).
D indicates an initialized data section.
N indicates a debugging symbol.
R indicates a symbol in a read-only data section.
T indicates a symbol in the text (code) section.
U indicates an undefined symbol.
When debugging information is included, nm’s list-line-numbers command-line option, -l, may provide valuable information:
Compare this to the previous non-debug output, and you will notice that the kill_utmp function started at line 17 of the file zap.c, which was in the directory /home/johndoe. The kill_wtmp function started at line 33 of the source code, and kill_lastlog started at line 59. Even without the source code, the line numbers provide valuable information, such as the procedure names, the number of lines of code in each procedure, and path information. In this particular case, the procedure names also shed light on the capabilities of the utility.
TIP
Any version of UPX can handle all supported formats. For example, the Win32 version can unpack UPX-compressed Linux executable linked format (ELF) binary files.
GO GET IT ON THE WEB
Compilation Techniques and File Analysis
Now that you’ve been exposed to several compilation techniques, let’s examine a suspect file called Z, found recently on a Linux system.
The file command output (discussed in the “Determining the Type of File” section later in this chapter) clearly indicates that UPX was used to pack this file. The next step is to use UPX to unpack (decompress) the suspect binary.
The following command decompresses (unpacks) the suspect file and stores the output in the file named foo.
Since the previous file command was executed on the compressed file, we run the file command again. As you can see, the uncompressed object file was not stripped.
While a previous strings command showed little of value (since the file was compressed), executing strings -a on the unpacked output file immediately reveals material of interest:
From this strings output, you can see that the program looks for the /var/run/utmp, /var/log/wtmp, and /var/log/lastlog files; has functions named kill_utmp, kill_wtmp, and kill_lastlog; and contains the word “Zap.” Additional debug information is present, and we can see that GCC version 3.2 for Red Hat Linux 8.0 was used to compile the tool.
STATIC ANALYSIS OF A HACKER TOOL
Static analysis is tool analysis performed without actually executing the rogue code. Because you do not intend to execute the rogue code during static analysis, you can perform static analysis on any operating system, regardless of the type of object code. For example, you can use the Solaris operating system to perform static analysis of a Win32 application.
The general approach to static analysis involves the following steps:
1. Determine the type of file you are examining.
2. Review the ASCII and Unicode strings contained within the binary file.
3. Perform online research to determine if the tool is publicly available on computer security or hacker sites. Compare any online tools identified with the tool you are analyzing.
4. Perform source code review if you either have the source code or believe you have identified the source code via online research.
Determining the Type of File
Once you have identified the executable files that require tool analysis, your next step is to determine how the executable files were compiled, as well as their native operating system and architecture. There are many different types of executable files you may encounter, including the following common types:
Windows 95/98/NT/2000/XP executable or dynamically linked library (DLL)
Linux a.out/elf/script
Solaris a.out/elf/script
DOS 32-bit COFF
DOS 16-bit .com file
DOS 16-bit executable
Atari ST/TT
Fortunately, both Unix and Windows provide a command that retrieves the needed information.
Using the Unix File Command
The standard command for determining a file type on Unix systems is file. The following example shows the results of using the file command on several different types of executable programs:
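The original multi-file listing is not reproduced here. A small self-contained sketch shows the flavor of the output (filenames here are illustrative, not from the case):

```shell
# Build two sample files of different types
cat > hello.c <<'EOF'
int main(void) { return 0; }
EOF
gcc hello.c -o hello_elf
printf '#!/bin/sh\necho hi\n' > hello.sh
chmod +x hello.sh

# file identifies the architecture, linkage, and whether symbols remain
file hello_elf hello.sh
```

Typical output labels hello_elf as an ELF executable (reporting statically or dynamically linked, stripped or not stripped) and hello.sh as a shell script.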
You can see that the file command can accurately determine how files were compiled and can also identify the operating system and architecture on which the file will execute. (ELF executables are the most common type of executable file for Linux and other Unix flavors.) The /usr/share/magic file defines approximately 5,000 different file types that Linux can recognize with the file command.
Using the Windows Exetype Command
The Windows equivalent of the file command is the NT Resource Kit tool exetype. This tool recognizes fewer file types than the file command, but it is still extremely useful. Figure 15-3 demonstrates how the exetype command is used.
Figure 15-3. Using exetype
Reviewing the ASCII and Unicode Strings
Basic static analysis of object code involves examining the ASCII-formatted strings of the binary file. By identifying keywords, command-line arguments, and variables, you will acquire some insight into the purpose of a program.
The command used to extract ASCII strings is strings. The strings command is standard on most Unix variants and is available for Windows from the Sysinternals web site. The strings command has the following syntax:
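The syntax line did not survive in this copy; a sketch of the usual form, demonstrated on a small binary blob built for the purpose:

```shell
# Build a binary blob: "abc" is under the 4-character default minimum,
# so only "hello world" should be reported
printf 'abc\0hello world\0' > blob.bin

strings -a blob.bin          # scan the entire file, minimum length 4
strings -a -n 8 blob.bin     # -n raises the minimum string length
```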
This command line will display all ASCII strings contained in the object code that are four characters or longer. Notice the -a option: if it is omitted, some Unix variants will scan only portions of the binary file.
On Windows-based executables, it is important to perform Unicode string searching as well. Windows 2000 is built on Unicode, and many Windows-based applications use Unicode. The strings utility available for Windows defaults to performing a Unicode search when used with only the filename as the command-line argument.
NOTE
Unicode is a standard character set that uses 2-byte values to represent a character. Because Unicode uses 16 bits to represent a single character, there are more than 65,000 characters available, which makes Unicode capable of encoding characters from many different languages. Currently, Unicode values are defined for Arabic, Chinese, Cyrillic, Greek, Hebrew, Japanese Kana, Korean Hangul, English, Armenian, and several other languages.
Hex editors are to the computer investigator what a hammer and nails are to a carpenter. When all other analysis fails, the hex editor is our friend. For static tool analysis, however, a hex editor offers only a slight advantage over the strings command: it allows you to see Unicode and ASCII strings within a file at the same time.
Anything that the program does not dynamically create or take in from another source, such as command-line interaction, may be found in the object code. When you review the strings in the object code, look for the following items:
The name of the source code files before the application was compiled
The exact compiler used to create the file
The “help” strings in the tool
The error messages that the program displays
The value of static variables
GO GET IT ON THE WEB
What Can Happen
You obtain a rogue executable file from a compromised Linux system. You decide to examine the strings to unearth some clues about the file’s function. You can guess it is the infamous log-wiping tool zap, since the file is called zap.
Where to Look for Evidence
You decide to analyze the tool on a Windows system to avoid accidentally running the program. You execute the exetype command to confirm that it will not execute properly on your Windows forensic workstation, as shown in Figure 15-4.
Figure 15-4. Using exetype on non-Windows executables
Examining the strings output confirms your suspicion that the tool is most likely the zap utility. In the strings output, shown in Figure 15-5, you see some relevant lines: there appear to be variables or functions named kill_utmp, kill_wtmp, and kill_lastlog.
Figure 15-5. Using strings to review function and variable names in an executable file
The strings command yields the filename of the source code used before compilation and the compiler version used to create the rogue file. Figure 15-6 shows the exact compiler used to create the rogue file. This information is useful if you are able to locate source code that you believe is similar to the binary in question.
Figure 15-6. Using strings to determine the compiler used
Performing Online Research
There was a time when it seemed everyone’s tool analysis was nothing more than scouring the Web for a tool with the same name as the rogue file. That is certainly not a comprehensive way to do tool analysis, but knowing whether other attacks have incorporated the same tools you discovered is very helpful. You can run the strings command on a rogue executable file to determine the compiler used to create it. If you find an online tool that appears to have a similar function, you can compile the publicly available source code with the identical compiler the attacker used and compare the resulting file sizes. A very narrow margin in size may suggest the tools are similar; if the files are exactly the same size (and, better still, have matching MD5 sums), you have most likely found the source code to the hacker tool.
Publish Advisories
Once malicious code is identified, the details of the attack (MD5 sums, location of code, and so on) can be published in advisories (such as NIPC bulletins) so that other organizations can check for the existence of this code.
Performing Source Code Review
With the source code available to you for review, you will be capable of determining exactly what a rogue program does. Therefore, obtaining the source code is probably the best measure for performing comprehensive static analysis of a program. Two occasions when you will be lucky enough to perform source code review include when the attacker leaves the source code on a system and when you identify the identical program from another source (perhaps online) with the proper source code.
Eye Witness Report
While performing incident response for a global client, we discovered that the attacker had installed a toolkit that contained 15 tools. Unfortunately, one of the main tools used by the attacker was deleted from the system, and we could not recover it using standard undelete tools. We conducted an online search and found that there were other victims with the same tools installed on their systems. One of the victim sites even posted the tools the investigators believed were used on their compromised systems. This toolkit had the file we needed to fully reconstruct the attack. An MD5 sum of the tools obtained online matched those of the tools we recovered from our client’s system. We gained additional insight from the other victim’s analysis of the attack, and we could provide law enforcement with a list of victims to prove how widespread the new attack was becoming.
Performing source code review requires working knowledge of the programming language used to create the tool. Most popular exploits and tools are written in ANSI C or Microsoft Visual Basic script, so you should become familiar with these languages.
DYNAMIC ANALYSIS OF A HACKER TOOL
Dynamic analysis of a tool takes place when you execute rogue code and interpret its interaction with the host operating system. This can be dangerous because whatever ill effects the rogue code intends may take place on your forensic workstation. However, this is often the most enlightening form of tool analysis. Our methodology includes the following tasks:
Monitor the time/date stamps to determine what files a tool affects.
Run the program to intercept its system calls.
Perform network monitoring to determine if any network traffic is generated.
Monitor how Windows-based executables interact with the Registry.
Creating the Sandbox Environment
When conducting dynamic tool analysis, you are actually executing the rogue file in order to document the effects it has on a system. Therefore, you need to invest the time to set up the proper test environment.
First, make sure that you have the operating system and architecture necessary to execute the object code properly. Also, install VMware on your test system. VMware allows you to run the tools in a controlled environment that will not damage the forensic workstation on which you are executing the rogue code. A feature of VMware, called nonpersistent writes, allows the investigator to execute rogue code in an environment where the ill effects of the rogue code will not be saved to the disk. To enable this feature, open the VMware Configuration Editor and choose the Nonpersistent radio button for the Mode option, as shown in Figure 15-7. This mode allows you to execute the rogue code in a “fresh” installation of an operating system.
Figure 15-7. Setting VMware to Nonpersistent mode
Make sure that the test system is not connected to the Internet. You do not want to execute or install rogue code when connected to the Internet (or any network). Some illicit applications send “beacon packets,” or phone home. You may be alerting the attackers that you have both acquired and executed their attack tools.
If you suspect the rogue code may create or respond to network traffic, it is a good idea to execute it on a closed network. Monitor the closed segment with a sniffer running on a separate system on the closed network. Closed means that no systems you care about are on this network.
GO GET IT ON THE WEB
Eye Witness Report
We got quite a scare when we were performing tool analysis on a program found at a military site. The file was placed on the system by an international attacker. We did not want to alert this attacker that we were both sniffing his connections and retrieving and analyzing his tools. As it turned out, his tools were mostly homegrown, and their functions were rather complex. We obtained one tool that held our attention until the early hours on a Saturday morning. We were performing dynamic tool analysis, and decided to run the tool for the first time. As soon as we ran the tool, we noticed a packet was generated on the network that appeared on our network monitor. I raced to the T-1 line on the wall to pull the cable and terminate our Internet connection. Luckily, we had already done that! The software produced a beacon packet that could have alerted the attacker that we had run his tool. He would have at least obtained our IP address, and that would have been bad!
Dynamic Analysis on a Unix System
Most applications execute in a memory area defined as user space. User space applications are typically prohibited from accessing computer hardware and resources directly. These resources are controlled by the kernel to enforce security, maintain nonconcurrent use, and provide stability of the operating system. User applications access these resources by requesting the kernel to perform the operations on its behalf. The user application makes these requests to the kernel via system calls.
Using Strace
Unix has a tool that traces the use of system calls by an executed process. This tool, called strace (system trace), is essentially a wiretap between a program and the operating system. The strace command displays information about file access, network access, memory access, and many other system calls that a file makes when it is executed.
CAUTION
Remember that when you use strace, you execute the rogue code. Therefore, it is important to use a stand-alone workstation (with no outside network connectivity) that you do not mind altering (or even crashing).
Here is an example of executing the strace command:
This command line will store the interaction between the zap program and the operating system in a file called strace.out. Remember that the zap program will execute, performing its nefarious operations.
The following is a review of the strace.out file. For the sake of expediency, you can ignore every line before line 19, the getpid call. All lines that precede the getpid system call are standard for setting up the proper environment for the process to execute. The line numbers were added by the authors for easy review.
Oversimplifying a bit, line 23 is our biggest clue to what took place when we ran the command ./zapdynamic. An error message of seven characters, “Error.\n” (the \n signifies a new line), was printed to file descriptor 1. File descriptor 1 is used as standard output, which is usually the terminal or the console a user is viewing. Thus, the word Error was printed to the screen. A valid conclusion would be that we did not supply the proper command-line arguments to make zap run properly.
NOTE
File descriptors are nonnegative integers that the operating system (kernel) uses to reference the files being accessed by a process. File descriptors 0, 1, and 2 are the predefined file descriptors for standard input, standard output, and standard error, respectively. When the kernel opens, reads, writes, or creates a file or network socket, it returns a file descriptor (integer) that is used to reference the file or network socket.
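The predefined descriptors are easy to see from the shell, since stdout (fd 1) and stderr (fd 2) can be redirected independently (filenames here are illustrative):

```shell
# Force an error: the message travels over fd 2, while fd 1 stays silent
ls /no/such/path > out.txt 2> err.txt || true

cat err.txt    # the error text arrived via file descriptor 2
wc -c out.txt  # file descriptor 1 received nothing
```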
Examining Strace Output
Since zap erases a specific user’s entries from the utmp, wtmp, and lastlog files, a logical conclusion would be that the command line contains that specific user’s username. Therefore, we can execute the strace program again with a proper command line. Let’s examine the output and see how it can be used to analyze the zap program.
Lines 1 through 18 are the system calls done by the operating system to set up the environment needed for the process to execute. These calls work as follows:
The execve call in line 1 shows the command-line arguments.
The brk system calls allocate memory for the process.
The mmap calls map a portion of a file into memory. This is typically done to load runtime libraries when a process is initially executed.
The fstat call obtains information about the file referenced by the file descriptor. fstat can return the time/date stamps for a file, the owner of a file, the size of a file, the number of hard links, and practically any other information a program needs to access the file.
The close system calls release a file descriptor when the process no longer needs the file or socket it references. For example, in line 16, file descriptor 4 is closed. This releases file descriptor 4, allowing it to be reassigned during the next system call that requires a file handle (such as open or mmap).
Everything above line 19, the getpid system call, is basically standard for all dynamically linked ELF executables.
The operations specific to the zap program begin after the getpid system call in line 19. Each running process gets a unique process ID from the getpid call; notice that this process received a process ID of 618. In line 23, a Unix socket is opened for transferring information between processes. Do not mistake this for a network socket! Unix sockets are opened when a process wants to exchange information with another running process on the same system.
The process is looking for authentication or host lookup information in lines 27 through 30. In line 27, the /etc/nsswitch.conf file is successfully opened. Typically, reading the nsswitch.conf file suggests the program will read the /etc/passwd file as well.
In line 46, the zapdynamic program opens the /etc/passwd file as file descriptor 4. Notice that the /etc/passwd file was opened read-only, as indicated by the O_RDONLY argument.
In line 51, the zapdynamic program reads the entry for user root from file descriptor 4, which is the /etc/passwd file. Then it closes file descriptor 4 in line 52.
In line 54, the zapdynamic program opens the file /var/log/lastlog as file descriptor 4. Notice that it opens /var/log/lastlog for read and write access, as indicated by the O_RDWR argument.
In line 56, the zapdynamic program writes