Output

Decompiling the class file using this new CUP spec returns the original program (see Listing 6-66). Note that, due to the implementation of the fieldStack, the order of the fields in the decompiled program is reversed. This does not affect program execution.

Summarizing Decompiler Implementation

It’s now time to step back and take stock of exactly where we are.

What We Have

We now have a method for decompiling Java classes. Our lexical analyzer cuts ClassToXML’s output into usable tokens for our parser to digest, and our parser returns something close to the original source of the program. How robust and how complete is our decompiler, though?

What Remains

Unfortunately, our decompiler is not very robust at present. There are opcodes we do not parse and facets of the class file structure that are not dealt with, such as interfaces and the exception tables. Although these are important in a full-scale decompiler, their inclusion here would occupy many, many more pages. We will review the remaining opcodes briefly and I will present hints and tips for decompiling them.

Remaining Opcodes

The first group of remaining opcodes are low-level JVM operations. The first opcode, NOP, is the most useless (literally). Unless you want to do estimates of the JVM’s clock speed, you can just dump this command, which causes no operation to be performed.

For our purposes, WIDE can also be discarded. This opcode specifies that the index of the next local variable referenced is 16 bits wide rather than 8 bits. Obviously, this doesn’t affect our particular JVM.

The next pair are also very simple. POP and POP2 merely pop and discard the top word and double word on the oStack, respectively. Implementation is trivial.

SHL and SHR are mirror-image operations. The shift distance is popped from the top of the oStack, then the next word is popped and shifted left or right, respectively, by the number of bits given by the five lower bits of the distance, and the result is pushed back onto the oStack.
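The five-bit masking of the shift distance can be observed from Java source, because the language's int shift operators compile directly to these opcodes. The following is a minimal sketch; the class name ShiftDemo and its methods are illustrative and are not part of XMLToSource.

```java
// Illustrative only: Java's int shifts use just the low five bits of the distance.
final class ShiftDemo {
    // javac compiles this to the int shift-left opcode (ishl).
    static int shl(int value, int distance) {
        return value << distance;
    }

    // javac compiles this to the int arithmetic shift-right opcode (ishr),
    // which preserves the sign bit.
    static int shr(int value, int distance) {
        return value >> distance;
    }
}
```

Because only the low five bits are used, a distance of 33 behaves like a distance of 1, so a decompiler can emit the shift operator with the distance expression unchanged.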

Next we have the three logical operators: AND, OR, and XOR. These perform the respective bitwise operations between the top two values on the oStack. Implementation for all three is simple.
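One possible way to resolve these three opcodes is to treat the oStack as a stack of source expressions and fold the top two entries into a single infix expression. The sketch below assumes that representation; the class BitwiseOps, the method apply, and the use of the JVM mnemonics iand, ior, and ixor as token names are assumptions made for illustration, not the book's implementation.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical helper: folds the top two stack expressions into one infix expression.
final class BitwiseOps {
    static void apply(Deque<String> oStack, String opcode) {
        String operator;
        switch (opcode) {
            case "iand": operator = "&"; break;
            case "ior":  operator = "|"; break;
            case "ixor": operator = "^"; break;
            default:
                throw new IllegalArgumentException("not a bitwise opcode: " + opcode);
        }
        String right = oStack.pop(); // the top of the stack is the right-hand operand
        String left = oStack.pop();
        oStack.push("(" + left + " " + operator + " " + right + ")");
    }
}
```

The parentheses are always emitted so that operator precedence never changes the meaning of a nested expression; a later pass could remove the redundant ones.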

Next, we have subroutine commands. JSR jumps to a local subroutine defined within a method; practically speaking, it implements finally. It first pushes the contents of the program counter plus three onto the oStack, then it branches to the program counter value plus an offset provided by the argument of the opcode. This is a trickier command; implementation is similar to that of GOTO.

RET returns from a subroutine. It just loads an address from a specified local variable and stores it in the program counter. Implementation is, again, tricky.

The next group consists of opcodes similar to those we have already implemented. The DUPX instruction is very similar to DUP, but it inserts the copy of the top word beneath the next item on the oStack. SWAP is another stack operation, which merely swaps the top two words on the oStack. Implementation is similar to DUP but will require a temporary storage area for stack items.
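The temporary storage mentioned above can be as small as two local variables. The sketch below shows SWAP and the single-word form of DUPX (dup_x1 in the JVM specification) on a generic stack; the class StackShuffle and its method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical helpers for the two stack-shuffling opcodes.
final class StackShuffle {
    // SWAP: ..., next, top  becomes  ..., top, next
    static <T> void swap(Deque<T> oStack) {
        T top = oStack.pop();
        T next = oStack.pop();
        oStack.push(top);
        oStack.push(next);
    }

    // dup_x1: ..., next, top  becomes  ..., top, next, top
    static <T> void dupX1(Deque<T> oStack) {
        T top = oStack.pop();
        T next = oStack.pop();
        oStack.push(top);  // the copy goes beneath the next item
        oStack.push(next);
        oStack.push(top);
    }
}
```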

MULTIANEWARRAY (surprisingly enough) initializes a multidimensional array. The opcode itself takes two arguments: the array type and the number of dimensions being created. It then pops that many values from the oStack, one for each dimension; these are the dimensional sizes, which specify the number of array elements in each dimension, with the size of the last dimension on top. This multidimensional array is a complicated construction, but it should be tractable if you’ve gotten this far.
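As a rough sketch of how this could be resolved, the helper below follows the JVM specification's layout, in which the dimension count is an operand of the opcode and one size per dimension sits on the stack with the last dimension on top. It assumes that the element type name has already been resolved from the constant pool and that every dimension is given a size; the class MultiArray and the method decompile are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical helper: rebuilds a "new T[a][b]..." expression from the stack.
final class MultiArray {
    static void decompile(Deque<String> oStack, String elementType, int dimensions) {
        String[] sizes = new String[dimensions];
        // The size of the last dimension is on top, so fill the array backward.
        for (int i = dimensions - 1; i >= 0; i--) {
            sizes[i] = oStack.pop();
        }
        StringBuilder expr = new StringBuilder("new ").append(elementType);
        for (String size : sizes) {
            expr.append('[').append(size).append(']');
        }
        oStack.push(expr.toString()); // the new array reference replaces the sizes
    }
}
```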

ARRAYLENGTH is a much simpler array command that returns the length of a given array in memory. Resolution is not too difficult.

TABLESWITCH and LOOKUPSWITCH are similar operations that use offset tables, allowing for computed jumps. The former uses the int on top of the oStack as an index into a table of branch offsets covering a contiguous range of values; the latter compares that int against a sorted list of match values to choose the branch location. Compilers generally emit TABLESWITCH when the case values of a switch statement are densely packed and LOOKUPSWITCH when they are sparse.
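The difference between the two lookups can be sketched as follows. This models only how each opcode selects a branch offset, as described in the JVM specification; the class SwitchTargets, its method names, and the use of a TreeMap for the match/offset pairs are assumptions made for illustration.

```java
import java.util.TreeMap;

// Hypothetical model of branch-offset selection for the two switch opcodes.
final class SwitchTargets {
    // TABLESWITCH: offsets[0] belongs to the value low, offsets[1] to low + 1, and so on.
    static int tableSwitch(int key, int low, int[] offsets, int defaultOffset) {
        long index = (long) key - low; // long arithmetic avoids int overflow
        if (index < 0 || index >= offsets.length) {
            return defaultOffset;
        }
        return offsets[(int) index];
    }

    // LOOKUPSWITCH: the key is compared against explicit match values.
    static int lookupSwitch(int key, TreeMap<Integer, Integer> pairs, int defaultOffset) {
        Integer offset = pairs.get(key);
        return offset != null ? offset : defaultOffset;
    }
}
```

In both cases a decompiler can turn each match value into a case label and the default offset into the default branch; the harder part is deciding where each case body ends.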

ATHROW, as you might expect, throws an exception (see the discussion of exception handling tables shortly) and checks for a handler. It’s a very involved instruction and one commonly found in programs that use try/catch/finally statements.

Next, we have some simple procedural operations. CHECKCAST checks whether the top oStack element can be cast to a given type and throws a ClassCastException if it cannot. INSTANCEOF checks whether an object or array is an instance of a particular type and pushes the result onto the oStack. Both are fairly simple, and there’s no other way to do what they do. Decompiling them, however, may be involved.
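In the simplest case, both opcodes can be resolved by rewriting the expression on top of the stack. The sketch below assumes an oStack of source expressions and a type name that has already been resolved from the constant pool; the class TypeOps and its method names are hypothetical, and the harder cases, such as casts inserted by the compiler, are not handled.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical helpers: wrap the top stack expression in a cast or a type test.
final class TypeOps {
    static void checkcast(Deque<String> oStack, String typeName) {
        String expr = oStack.pop();
        oStack.push("((" + typeName + ") " + expr + ")");
    }

    static void instanceOf(Deque<String> oStack, String typeName) {
        String expr = oStack.pop();
        oStack.push("(" + expr + " instanceof " + typeName + ")");
    }
}
```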

Finally, we have two very specialized commands: MONITORENTER and MONITOREXIT. These are used in multithreaded programs to acquire and release a thread’s lock on an object, and they are generated for synchronized blocks. Multithreaded operations are beyond the scope of this book.
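For orientation only, the source construct that gives rise to this pair of opcodes is the synchronized block. The class below is an illustrative example, not taken from the book's test programs.

```java
// Illustrative only: a synchronized block compiles to MONITORENTER and MONITOREXIT.
final class Counter {
    private final Object lock = new Object();
    private int count;

    int increment() {
        synchronized (lock) { // MONITORENTER on lock is emitted here
            count++;
            return count;
        }                     // MONITOREXIT is emitted on every path out of the block
    }
}
```

A decompiler that supported these opcodes would need to pair each MONITORENTER with its matching MONITOREXIT instructions to recover the extent of the block.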

Exception Handling Tables

ClassToXML ignores the exception-handling tables that belong to each method. These come into use when we use try/catch/finally statements. They are essentially another form of conditional statement, and resolution is very similar. One major catch, however, is that the tables are stored after the method code in the class file. To process them using our parser, we would have to use a much more complicated program or change the order of information within the class file when outputting it to XML.
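To make the table's role concrete, the sketch below models one entry using the four fields the JVM specification defines (start_pc, end_pc, handler_pc, and catch_type) and finds the handlers that cover a given instruction. The classes ExceptionEntry and ExceptionTable are hypothetical, and a null catchType stands in here for the specification's "catch anything" entry that finally blocks produce.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of one exception table entry.
final class ExceptionEntry {
    final int startPc;      // first covered instruction (inclusive)
    final int endPc;        // end of the covered range (exclusive)
    final int handlerPc;    // where the handler code begins
    final String catchType; // null means any throwable, as used for finally

    ExceptionEntry(int startPc, int endPc, int handlerPc, String catchType) {
        this.startPc = startPc;
        this.endPc = endPc;
        this.handlerPc = handlerPc;
        this.catchType = catchType;
    }

    boolean covers(int pc) {
        return pc >= startPc && pc < endPc;
    }
}

final class ExceptionTable {
    // Returns the covering entries in table order, which is the order the JVM searches.
    static List<ExceptionEntry> handlersFor(List<ExceptionEntry> table, int pc) {
        List<ExceptionEntry> result = new ArrayList<>();
        for (ExceptionEntry entry : table) {
            if (entry.covers(pc)) {
                result.add(entry);
            }
        }
        return result;
    }
}
```

With entries in this form, the covered range marks the extent of a try block and each handler address marks the start of a catch or finally block.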

Other Problems of Decompilation

A full-featured decompiler would have to deal with the interfaces directly. It would have to apply a robust, unshakeable, and powerful conditional resolution to defeat control-flow obfuscators—this is the weakest point of decompilers from .NET’s Anakrino to Java’s Mocha. It’s much easier to know where a program needs to go and to make it difficult to get there than it is to reverse the changes in the flow of the program.

Many of the remaining things in the class file that we ignore don’t hurt us. The line number table, which can be stripped from final builds, is pretty much useless for our purposes. So too are most of the other possible attributes, such as SourceFile, Synthetic, LocalVariableTable, and Deprecated; InnerClasses is the possible exception.

Conclusion

The complete code for our decompiler, XMLToSource, is available on the Apress website (http://www.apress.com). I plan to add new keywords and constructs over time as the Java language evolves past JDK 1.5. I will also add new constructs occasionally to make the decompiler much more robust and complete. I welcome any reader contributions to help in this effort.
