Document-based malware is becoming increasingly common. Malicious PDF files are one example of document files designed to exploit vulnerabilities in document-viewing software. Analyzing malicious PDF files (or any document files for that matter) requires that you understand the structure of the file you are analyzing. In dissecting the structure of such a file, your goal is often to discover any embedded code that may get executed if the document succeeds in compromising the computer used to view it. The few PDF analysis tools that exist are primarily targeted at the command-line user, with the goal of facilitating the extraction of information that might ultimately be loaded into IDA for further analysis.
Name | IdaPdf |
Author | Chris Eagle |
Distribution | C++ source |
Price | Free |
Description | PDF loader and plug-in for dissecting and navigating PDF files |
Information |
IdaPdf consists of an IDA loader module and an IDA plug-in module, each designed to facilitate the analysis of PDF files. The loader component of IdaPdf recognizes PDF files and loads them into a new IDA database. The loader takes care of breaking the PDF into its individual components. During the loading process, the loader makes every attempt to extract and filter all PDF stream objects. Since loader modules get unloaded once the load process is complete, a second component, the IdaPdf plug-in, is required in order to provide PDF analysis capabilities beyond the initial loading. The plug-in module, upon recognizing that a PDF file has been loaded, proceeds to enumerate all of the PDF objects contained within the file and opens a new tabbed window containing a list of every object within the PDF. The following listing is representative of the type of information contained in the PDF Objects window.
Num | Location | Type | Data Offs | Data size | Filters | Filtered stream | Filtered size | Ascii |
17 | 000e20fe | Stream | 000e2107 | 313 | /FlateDecode | 000f4080 | 210 | No |
35 | 00000010 | Dictionary | 00000019 | 66 | | | | Yes |
36 | 000002a3 | Dictionary | 000002ac | 122 | | | | Yes |
37 | 0000032e | Stream | 00000337 | 470 | [/FlateDecode] | 000f4170 | 1367 | Yes |
The listing shows object numbers along with the location of the object, the object’s data, any filters that must be applied to stream objects, and a pointer to the extracted, unfiltered data. Context-sensitive menu options allow for easy navigation to either the object data or any extracted filtered data. Context-sensitive menu options also let you extract object data, either raw or filtered. The Ascii column indicates the plug-in’s best-effort opinion as to whether the object contains only ASCII data in its raw or filtered versions.
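To make the columns in the PDF Objects window more concrete, the following is a minimal standalone Python sketch of the same idea: locate each indirect object, note whether it holds a stream, inflate /FlateDecode streams, and guess whether the result is ASCII. It is not IdaPdf's code. The function name list_objects, the dictionary keys, and the regular expressions are assumptions made for illustration, and the parsing is deliberately naive (it ignores cross-reference tables, object streams, filter chains, and stream data that happens to contain the text endobj).

```python
import re
import zlib

# Naive patterns for "N G obj ... endobj" and "stream ... endstream".
# Assumption: the file is well formed enough for these to match.
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj(.*?)endobj", re.DOTALL)
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\n?endstream", re.DOTALL)


def list_objects(pdf_bytes):
    """Return one summary dict per indirect object found in pdf_bytes."""
    rows = []
    for m in OBJ_RE.finditer(pdf_bytes):
        body = m.group(3)
        is_dict = body.strip().startswith(b"<<")
        row = {
            "num": int(m.group(1)),
            "location": m.start(),          # file offset of the object header
            "type": "Dictionary" if is_dict else "Other",
            "data_size": len(body),
            "filtered": None,               # inflated stream data, if any
        }
        s = STREAM_RE.search(body)
        if s:
            raw = s.group(1)
            row["type"] = "Stream"
            row["data_size"] = len(raw)
            # Only the FlateDecode filter is handled in this sketch.
            if b"/FlateDecode" in body[:s.start()]:
                try:
                    row["filtered"] = zlib.decompressobj().decompress(raw)
                except zlib.error:
                    pass                    # leave the stream unfiltered
        # Best-effort ASCII guess on the filtered data when available.
        data = row["filtered"] if row["filtered"] is not None else body
        row["ascii"] = all(b < 0x80 for b in data)
        rows.append(row)
    return rows
```

Calling list_objects on the raw bytes of a PDF returns one row per object, which can then be printed in a layout similar to the listing above. A real loader would also need to handle the other stream filters and filter arrays such as [/FlateDecode].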
The last features implemented by IdaPdf are exposed through two new menu options added under Edit ▸ Other when IdaPdf is launched. These menu options allow you to highlight a block of data in the database and then ask the plug-in to Base64 decode the data or unescape[215] the data, with the results being copied into a newly created segment within the IDA database. Such decoded data will often turn out to be the malicious payload contained within the PDF. Since the plug-in extracts this data to a new IDA segment, it is fairly straightforward to navigate to the extracted data and ask IDA to disassemble some or all of it.
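The two decoding operations can be approximated outside of IDA with a short Python sketch. This is an illustration only, not the plug-in's implementation. It assumes that "unescape" refers to JavaScript-style %XX and %uXXXX escapes, and that each %uXXXX unit should be written as two bytes in little-endian order, which is the usual convention when such strings carry x86 shellcode. The function names are hypothetical.

```python
import base64
import re

# %uXXXX (16-bit unit) or %XX (single byte), as produced by JavaScript escape().
ESCAPE_RE = re.compile(r"%u([0-9a-fA-F]{4})|%([0-9a-fA-F]{2})")


def unescape_to_bytes(text):
    """Convert %uXXXX / %XX escapes in text to raw bytes.

    Assumption: %uXXXX units are emitted little-endian; characters that
    are not escaped are copied through as single Latin-1 bytes.
    """
    out = bytearray()
    pos = 0
    for m in ESCAPE_RE.finditer(text):
        out += text[pos:m.start()].encode("latin-1", "replace")
        if m.group(1):
            out += int(m.group(1), 16).to_bytes(2, "little")
        else:
            out.append(int(m.group(2), 16))
        pos = m.end()
    out += text[pos:].encode("latin-1", "replace")
    return bytes(out)


def base64_to_bytes(text):
    """Decode a Base64 string to raw bytes."""
    return base64.b64decode(text)
```

For example, unescape_to_bytes("%uc933") yields the bytes 33 c9, which disassemble as xor ecx, ecx on x86. Writing the decoded bytes to a file and loading that file into IDA as a binary is a rough manual equivalent of the plug-in copying the results into a new segment.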