So far, we have discussed tools that perform a cursory analysis of files based on minimal knowledge of those files’ internal structure. We have also seen tools capable of extracting specific pieces of data from files based on very detailed knowledge of a file’s structure. In this section we discuss tools designed to extract specific types of information independently of the type of file being analyzed.
It is occasionally useful to ask more generic questions regarding file content, questions that don’t necessarily require any specific knowledge of a file’s structure. One such question is “Does this file contain any embedded strings?” Of course, we must first answer the question “What exactly constitutes a string?” Let’s loosely define a string as a consecutive sequence of printable characters. This definition is often augmented to specify a minimum length and a specific character set. Thus, we could specify a search for all sequences of at least four consecutive ASCII printable characters and print the results to the console. Searches for such strings are generally not limited in any way by the structure of a file. You can search for strings in an ELF binary just as easily as you can search for strings in a Microsoft Word document.
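This definition of a string is easy to turn into code. Below is a minimal Python sketch using a hypothetical helper named find_strings. It scans a byte buffer for runs of at least four printable ASCII characters. The character set used here (0x20 through 0x7E, plus tab) is an assumption chosen to roughly mirror common tool defaults. Note that the scan never looks at the file's structure.

```python
# Assumed "printable" set for this sketch: ASCII 0x20-0x7E plus horizontal tab.
PRINTABLE = set(range(0x20, 0x7F)) | {0x09}


def find_strings(data: bytes, min_len: int = 4) -> list:
    """Return every run of at least min_len consecutive printable ASCII bytes."""
    results = []
    run = bytearray()
    for b in data:
        if b in PRINTABLE:
            run.append(b)          # extend the current candidate string
        else:
            if len(run) >= min_len:
                results.append(run.decode("ascii"))
            run.clear()            # any non-printable byte ends the run
    # Flush a run that extends to the very end of the buffer.
    if len(run) >= min_len:
        results.append(run.decode("ascii"))
    return results


blob = b"\x00\x01GLIBC_2.0\x00ab\x00PTRh\xff"
print(find_strings(blob))  # ['GLIBC_2.0', 'PTRh']
```

The same function works unchanged on an ELF binary or a Word document read with open(path, "rb").read(), because it treats the input purely as a stream of bytes.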
The strings utility is designed specifically to extract string content from files, often without regard for the format of those files. Using strings with its default settings (7-bit ASCII sequences of at least four characters) might yield something like the following:
idabook# strings ch2_example
/lib/ld-linux.so.2
__gmon_start__
libc.so.6
_IO_stdin_used
exit
srand
puts
time
printf
stderr
fwrite
scanf
__libc_start_main
GLIBC_2.0
PTRh
[^_]
usage: ch2_example [max]
A simple guessing game!
Please guess a number between 1 and %d.
Invalid input, quitting!
Congratulations, you got it in %d attempt(s)!
Sorry too low, please try again
Sorry too high, please try again
Unfortunately, while we see some strings that look like they might be output by the program, other strings appear to be function names and library names. We should be careful not to jump to any conclusions regarding the behavior of the program. Analysts often fall into the trap of attempting to deduce the behavior of a program based on the output of strings. Remember, the presence of a string within a binary in no way indicates that the string is ever used in any manner by that binary.
Some final notes on the use of strings:
When using strings on executable files, remember that older versions of GNU strings scan only the loadable, initialized sections of the file by default (more recent versions scan the entire file by default). Use the -a command-line argument to force strings to scan the entire input file.
strings gives no indication of where, within a file, a string is located. Use the -t command-line argument, followed by a radix (d, o, or x), to have strings print the file offset of each string it finds.
Many files use alternate character sets. Use the -e command-line argument to make strings search for wide characters; for example, -e l selects 16-bit little-endian characters such as 16-bit Unicode text.
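The -t and -e options can be illustrated with a short Python sketch. It uses a hypothetical function named strings_with_offsets and reports the file offset of every match, in the spirit of strings -t d. With wide=True, it looks for 16-bit little-endian characters, roughly what strings -e l does. This is an assumption-laden simplification: the wide mode here only recognizes ASCII-range characters whose high byte is zero.

```python
import re
from typing import Iterator, Tuple


def strings_with_offsets(data: bytes, min_len: int = 4,
                         wide: bool = False) -> Iterator[Tuple[int, str]]:
    """Yield (offset, text) pairs for printable runs of at least min_len characters.

    wide=False: runs of 7-bit printable ASCII bytes (plus tab).
    wide=True:  runs of 16-bit little-endian units whose low byte is printable
                ASCII and whose high byte is 0x00 (ASCII-range UTF-16LE only).
    """
    if wide:
        pattern = re.compile(rb"(?:[\x20-\x7e\t]\x00){%d,}" % min_len)
    else:
        pattern = re.compile(rb"[\x20-\x7e\t]{%d,}" % min_len)
    # finditer tries every byte position, so wide strings at odd offsets are found too.
    for m in pattern.finditer(data):
        raw = m.group()
        text = raw.decode("utf-16-le") if wide else raw.decode("ascii")
        yield m.start(), text


blob = b"\x7fELF" + b"\x00" * 4 + "Hello".encode("utf-16-le") + b"\xff\xff" + b"usage: x"
for off, s in strings_with_offsets(blob):
    print(off, s)              # 20 usage: x
for off, s in strings_with_offsets(blob, wide=True):
    print(off, s)              # 8 Hello
```

The two searches find disjoint results. The wide string "Hello" is invisible to the ASCII search, because a NUL byte follows every character. This is why the encoding option matters when examining files that store text as 16-bit characters.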
As mentioned earlier, a number of tools are available to generate dead listing–style disassemblies of binary object files. PE, ELF, and Mach-O binaries can be disassembled using dumpbin, objdump, and otool, respectively. These tools are built around their respective file formats, however, and are not well suited to arbitrary blocks of binary data. You will occasionally be confronted with a binary file that does not conform to a widely used file format. In that case, you will need tools that can begin the disassembly process at user-specified offsets.
Two examples of such stream disassemblers for the x86 instruction set are ndisasm and diStorm.[15] ndisasm is a utility included with the Netwide Assembler (NASM).[16] The following example illustrates the use of ndisasm to disassemble a piece of shellcode generated using the Metasploit framework.[17] The -u option tells ndisasm to decode the input as 32-bit code.
idabook# ./msfpayload linux/x86/shell_findport CPORT=4444 R > fs
idabook# ls -l fs
-rw-r--r-- 1 ida ida 62 Dec 11 15:49 fs
idabook# ndisasm -u fs
00000000  31D2              xor edx,edx
00000002  52                push edx
00000003  89E5              mov ebp,esp
00000005  6A07              push byte +0x7
00000007  5B                pop ebx
00000008  6A10              push byte +0x10
0000000A  54                push esp
0000000B  55                push ebp
0000000C  52                push edx
0000000D  89E1              mov ecx,esp
0000000F  FF01              inc dword [ecx]
00000011  6A66              push byte +0x66
00000013  58                pop eax
00000014  CD80              int 0x80
00000016  66817D02115C      cmp word [ebp+0x2],0x5c11
0000001C  75F1              jnz 0xf
0000001E  5B                pop ebx
0000001F  6A02              push byte +0x2
00000021  59                pop ecx
00000022  B03F              mov al,0x3f
00000024  CD80              int 0x80
00000026  49                dec ecx
00000027  79F9              jns 0x22
00000029  52                push edx
0000002A  682F2F7368        push dword 0x68732f2f
0000002F  682F62696E        push dword 0x6e69622f
00000034  89E3              mov ebx,esp
00000036  52                push edx
00000037  53                push ebx
00000038  89E1              mov ecx,esp
0000003A  B00B              mov al,0xb
0000003C  CD80              int 0x80
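Some of the immediate operands in this listing are easier to read once byte order is taken into account. This short Python sketch uses the standard struct module to decode the two push dword constants and the word compared at [ebp+0x2]. Reading that word as a port number is an interpretation, checked here only against the CPORT=4444 value on the msfpayload command line.

```python
import struct

# Immediate operands copied from the ndisasm listing above.
PUSH_FIRST = 0x68732F2F   # push dword 0x68732f2f (executed first)
PUSH_SECOND = 0x6E69622F  # push dword 0x6e69622f (executed second)
PORT_WORD = 0x5C11        # cmp word [ebp+0x2],0x5c11

# x86 stores each dword lowest byte first. The stack grows downward, so the
# second push sits at the lower address and is read first.
stack_bytes = struct.pack("<I", PUSH_SECOND) + struct.pack("<I", PUSH_FIRST)
print(stack_bytes.decode("ascii"))  # /bin//sh

# In memory, the compared word occupies the bytes 11 5C. Reading those bytes
# in network (big-endian) order yields the port number.
port = struct.unpack(">H", struct.pack("<H", PORT_WORD))[0]
print(port)  # 4444
```

Decoding constants like these by hand is a common step when reading stream-disassembled shellcode. No file format metadata is available to label them.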
The flexibility of stream disassembly is useful in many situations. One scenario involves the analysis of computer network attacks in which network packets may contain shellcode. Stream disassemblers can be used to disassemble the portions of the packet that contain shellcode in order to analyze the behavior of the malicious payload. Another situation involves the analysis of ROM images for which no layout reference can be located. Portions of the ROM will contain data, while other portions will contain code. Stream disassemblers can be used to disassemble just those portions of the image thought to be code.
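Real stream disassemblers such as ndisasm and diStorm handle the full x86 instruction set. The toy Python sketch below is only meant to show the underlying linear-sweep idea: start at a caller-chosen offset, decode one instruction, advance by its length, and repeat. The function name sweep and its output format are hypothetical. It understands just a handful of 32-bit opcodes that appear in the listing above (register push/pop/dec, push imm8/imm32, mov r8,imm8, int imm8, and register-to-register xor/mov). Anything else is emitted as a db byte, and the sweep moves on by one byte.

```python
from typing import List, Optional, Tuple

REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
REG8 = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]


def sweep(data: bytes, start: int = 0,
          end: Optional[int] = None) -> List[Tuple[int, str, str]]:
    """Linear-sweep decode data[start:end] into (offset, hex bytes, text) tuples."""
    end = len(data) if end is None else min(end, len(data))
    out = []
    i = start
    while i < end:
        op = data[i]
        size, text = 1, f"db 0x{op:02x}"  # default: byte we cannot decode
        if 0x50 <= op <= 0x57:
            text = f"push {REG32[op - 0x50]}"
        elif 0x58 <= op <= 0x5F:
            text = f"pop {REG32[op - 0x58]}"
        elif 0x48 <= op <= 0x4F:
            text = f"dec {REG32[op - 0x48]}"
        elif op in (0x6A, 0xCD) and i + 1 < end:
            imm = data[i + 1]
            size = 2
            if op == 0xCD:
                text = f"int 0x{imm:x}"
            else:
                v = imm - 0x100 if imm >= 0x80 else imm  # imm8 is sign-extended
                text = f"push byte {'+' if v >= 0 else '-'}0x{abs(v):x}"
        elif 0xB0 <= op <= 0xB7 and i + 1 < end:
            size, text = 2, f"mov {REG8[op - 0xB0]},0x{data[i + 1]:x}"
        elif op == 0x68 and i + 4 < end:
            imm = int.from_bytes(data[i + 1:i + 5], "little")
            size, text = 5, f"push dword 0x{imm:x}"
        elif op in (0x31, 0x89) and i + 1 < end and data[i + 1] >> 6 == 0b11:
            # ModRM with mod=11: both operands are registers (r/m first, then reg).
            modrm = data[i + 1]
            reg, rm = (modrm >> 3) & 7, modrm & 7
            mnem = "xor" if op == 0x31 else "mov"
            size, text = 2, f"{mnem} {REG32[rm]},{REG32[reg]}"
        out.append((i, data[i:i + size].hex().upper(), text))
        i += size
    return out


# The first 15 bytes of the shellcode, preceded by two junk bytes we skip.
SAMPLE = bytes.fromhex("31D2 52 89E5 6A07 5B 6A10 54 55 52 89E1")
for off, hexbytes, text in sweep(b"\x00\x00" + SAMPLE, start=2):
    print(f"{off:08X}  {hexbytes:<16}  {text}")
```

Starting the sweep at offset 0 instead would decode the two junk bytes as db 0x00 first. That is the practical reason stream disassemblers let you choose where decoding begins in a ROM image or packet capture.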