Chapter 19. IDA Processor Modules

The last type of IDA module that can be built with the SDK is the processor module, which is by far the most complex of IDA’s module types. Processor modules are responsible for all of the disassembly operations that take place within IDA. Beyond the obvious conversion of machine language opcodes into their assembly language equivalents, processor modules are also responsible for tasks such as creating functions, generating cross-references, and tracking the behavior of the stack pointer. As it has done with plug-ins and loaders, Hex-Rays has made it possible (beginning with IDA 5.7) to author processor modules using one of IDA’s scripting languages.

The obvious case that would require development of a processor module is reverse engineering a binary for which no processor module exists. Among other things, such a binary might be a firmware image for an embedded microcontroller or an executable image pulled from a handheld device. A less-obvious use for a processor module might be to disassemble the instructions of a custom virtual machine embedded within an obfuscated executable. In such cases, an existing IDA processor module such as the pc module for x86 would help you understand only the virtual machine itself; it would offer no help at all in disassembling the virtual machine’s underlying byte code. Rolf Rolles demonstrated just such an application of a processor module in a paper posted to OpenRCE.org.[135] In Appendix B of his paper, Rolf also shares his thoughts on creating IDA processor modules; this is one of the few documents available on the subject.

In the world of IDA modules, there is a virtually unlimited number of conceivable uses for plug-ins, and after scripts, plug-ins are by far the most commonly available third-party add-ons for IDA. The need for custom loader modules is far smaller than the need for plug-ins. This is not unexpected, as the number of binary file formats (and hence the need for loaders) tends to be much smaller than the number of conceivable uses for plug-ins. A natural consequence is that, outside of modules donated to and distributed with IDA, relatively few third-party loader modules are published. Smaller still is the need for processor modules, as the number of instruction sets requiring decoding is smaller than the number of file formats that make use of those instruction sets. Here again, the result is an almost complete lack of third-party processor modules other than the few distributed with IDA and its SDK. Judging by the subjects of posts to the Hex-Rays forums, people are clearly working on processor modules; these modules are simply not being released to the public.

In this chapter, we hope to shed additional light on the topic of creating IDA processor modules and help to demystify (at least somewhat) the last of IDA’s modular components. As a running example, we will develop a processor module to disassemble Python byte code. Since the components of a processor module can be lengthy, it will not be possible to include complete listings of every piece of the module. The complete source code for the Python processor module is available on the book’s companion website. It is important to understand that without the benefit of a Python loader module, it will not be possible to perform fully automated disassembly of compiled .pyc files. Lacking such a loader, you will need to load .pyc files in binary mode, select the Python processor module, identify a likely starting point for a function, and then convert the displayed bytes to Python instructions using Edit ▸ Code.

Python Byte Code

Python[136] is an object-oriented, interpreted programming language. Python is often used for scripting tasks in a manner similar to Perl. Python source files are commonly saved with a .py extension. Whenever a Python script is executed, the Python interpreter compiles the source code to an internal representation known as Python byte code.[137] This byte code is ultimately interpreted by a virtual machine. This entire process is somewhat analogous to the manner in which Java source is compiled to Java byte code, which is ultimately executed by a Java virtual machine. The primary difference is that Java users must explicitly compile their Java source into Java byte code, while Python source code is implicitly converted to byte code every time a user elects to execute a Python script.
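The implicit compilation step can be observed from within Python itself. The following is a minimal sketch, assuming a CPython 3 interpreter; the function add and the file name <example> are made up for illustration, and the exact opcodes produced vary from one Python version to the next.

```python
import dis

# Hypothetical source text; any valid Python would do.
source = "def add(a, b):\n    return a + b\n"

# compile() performs the same source-to-byte-code translation that the
# interpreter performs implicitly when a script is run.
module_code = compile(source, "<example>", "exec")

# The body of add() is stored as a nested code object among the
# module's constants.
func_code = next(c for c in module_code.co_consts if hasattr(c, "co_code"))

raw_bytes = func_code.co_code  # the raw byte code for add()
mnemonics = [ins.opname for ins in dis.get_instructions(func_code)]

print(raw_bytes.hex())   # what a disassembler sees
print(mnemonics)         # what a disassembler should display
```

The hex string is the kind of input a processor module receives, and the list of mnemonics is the kind of output it is expected to produce.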

In order to avoid repeated translations from Python source to Python byte code, the Python interpreter may save the byte code representation of a Python source file in a .pyc file that may be loaded directly on subsequent execution, eliminating the time spent in translating the Python source. Users typically do not explicitly create .pyc files. Instead, the Python interpreter automatically creates .pyc files for any Python source module that is imported by another Python source module. The theory is that modules tend to get reused frequently, and you can save time if the byte code form of the module is readily available. Python byte code (.pyc) files are the rough equivalent of Java .class files.
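Although users rarely do so, a .pyc file can also be created explicitly with the standard py_compile module. The sketch below assumes CPython 3.7 or later, where the file begins with a 16-byte header followed by a marshalled code object; other versions use different header sizes. The module name hello.py is hypothetical.

```python
import importlib.util
import marshal
import os
import py_compile
import tempfile

# ASSUMPTION: CPython 3.7 or later, where a .pyc file begins with a
# 16-byte header; earlier versions use shorter headers.
HEADER_SIZE = 16

with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "hello.py")   # hypothetical module
    pyc_path = os.path.join(tmp, "hello.pyc")
    with open(src_path, "w") as f:
        f.write("x = 1 + 2\n")

    # Explicitly do what the interpreter does automatically on import.
    py_compile.compile(src_path, cfile=pyc_path, doraise=True)

    with open(pyc_path, "rb") as f:
        data = f.read()

# The first four bytes identify the byte code version.
magic_ok = data[:4] == importlib.util.MAGIC_NUMBER

# Everything after the header is a marshalled code object.
code = marshal.loads(data[HEADER_SIZE:])

# The code object runs even though the source file is already gone.
namespace = {}
exec(code, namespace)
print(magic_ok, namespace["x"])
```

Because the magic number changes between interpreter releases, it is a useful first check when deciding which byte code variant a given .pyc file contains.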

Given that the Python interpreter does not require source code when a corresponding byte code file is available, it may be possible to distribute some portions of a Python project as byte code rather than as source. In such cases, it might be useful to reverse engineer the byte code files in order to understand what they do, just as we might do with any other binary software distribution. This is the intended purpose of our example Python processor module—to provide a tool that can assist in reverse engineering Python byte code.
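At its core, such a tool maps raw bytes to mnemonics. The following naive linear sweep is only an illustration written with Python's standard dis module, not the IDA processor module developed in this chapter. It assumes CPython 3.6 or later, where every instruction occupies two bytes; older releases used variable-length instructions of one or three bytes. On newer interpreters, some two-byte slots are inline cache entries rather than instructions, and this sketch does not treat them specially. The function add is hypothetical.

```python
import dis

def linear_sweep(code_bytes):
    """Decode fixed-width (opcode, argument) pairs into mnemonics.

    ASSUMPTION: CPython 3.6+ "wordcode", two bytes per instruction.
    """
    listing = []
    for offset in range(0, len(code_bytes) - 1, 2):
        opcode = code_bytes[offset]
        arg = code_bytes[offset + 1]
        # dis.opname maps an opcode number to its mnemonic.
        listing.append((offset, dis.opname[opcode], arg))
    return listing

def add(a, b):   # hypothetical function to disassemble
    return a + b

listing = linear_sweep(add.__code__.co_code)
for offset, name, arg in listing:
    print(f"{offset:4d}  {name:<24} {arg}")
```

A real processor module must do considerably more than this, including the function creation, cross-reference generation, and stack pointer tracking mentioned at the start of the chapter.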



[135] See “Defeating HyperUnpackMe2 With an IDA Processor Module” at http://www.openrce.org/articles/full_view/28

[137] See http://docs.python.org/library/dis.html#bytecodes for a complete list of Python byte code instructions and their meanings. Also see opcode.h in the Python source distribution for a mapping of byte code mnemonics to their equivalent opcodes.
