Output

Adding these into the CUP specification produces nice-looking results (see Listing 6-39).

DoWhile.java and IfTest.java

The next extension of the decompiler allows it to handle conditional statements. Two kinds of conditional statement by their nature do not require the goto opcode, the do-while loop (Listing 6-40) and the if statement (Listing 6-41), so it is logical to start with them. Even this small step requires several new opcodes, the most complicated ones covered so far.

Each main function contains only a single control-flow statement, and each is organized exactly as you would expect. In the do-while loop, all statements are executed until the conditional test is reached. If the test is true, the program branches back to the start of the loop body; if not, it continues. In the if statement, the conditional is checked and the program branches or continues accordingly.

Be careful here: the decompiler must check whether the line number to which the conditional branches falls before or after the current line. If it falls before the current line, the conditional is a do-while; if it falls after, it is an if statement or a while or for loop. An if opcode is often preceded by a cmp opcode, which compares two values and leaves the result for the if to test; if_icmp is a more efficient combination of the two operations that is used only for integer comparisons. Both forms can be decompiled the same way, however.

Input

The main function of DoWhile.java is shown in Listing 6-42.

Note  From now on, the <Line> and </Line> tags are omitted for brevity.

The main function of IfTest.java is shown in Listing 6-43. As the bytecode of the main function shows (lines 8–17), the double conditional becomes two nested comparison statements.

Grammar

The grammar is as follows:

expr_part -> error | return | store | load | invoke | object | const
                         | bipush | conv | arith | iinc | if | cmp | if_icmp;
if_icmp -> number IF_ICMP number;
if ->  number IF number;
cmp  -> number type CMP;

The if_icmp Non-Terminal

Each of the new productions looks simple, and if_icmp is no exception: it consists only of the line number, the IF_ICMP terminal, and the branch location.

The first test is whether the branch location is less than or greater than the current line number. If it is less, the loop is a do-while and its end has been reached. The decompiler must now work back through the finalStack, popping lines and pushing them onto the tempStack, until it reaches the line at the branch location.

At this point, it injects “do {” into the finalStack, restores the lines held on the tempStack, and outputs the closing while statement. The condition is built from the top two items on the oStack. The first item, which is the rightmost operand in the while condition, is popped first and stored in a temporary variable; the second is then popped, and the while statement is pushed onto the finalStack. For an if_icmpge branch, the push looks like this:

// temp holds the rightmost operand, which was popped from the oStack first
finalStack.push(space + "} while (" + oStack.pop().toString() + ">="
                                                  + temp.toString() + ");");

If the branch location is greater than the current line number, however, the decompiler will assume an if statement and push it to the oStack. The goto production, which will be covered later on, determines if that assumption is correct.
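The backward-branch case can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the book's Listing 6-44: the SourceLine type, which tags each finalStack entry with the bytecode offset it came from, is hypothetical, the stacks are ArrayDeques, and only the if_icmpge operator is handled.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical finalStack entry: decompiled text plus the bytecode offset it came from. */
final class SourceLine {
    final int number;
    final String text;

    SourceLine(int number, String text) {
        this.number = number;
        this.text = text;
    }
}

final class DoWhileSketch {
    /** Resolves a backward if_icmpge at offset current that branches to offset target. */
    static void resolveDoWhile(Deque<SourceLine> finalStack, Deque<String> oStack,
                               int current, int target, String space) {
        Deque<SourceLine> tempStack = new ArrayDeque<>();
        // Set aside every line that lies inside the loop body (offset >= target).
        while (!finalStack.isEmpty() && finalStack.peek().number >= target) {
            tempStack.push(finalStack.pop());
        }
        // Open the loop just before the body, then put the body back in order.
        finalStack.push(new SourceLine(target, space + "do {"));
        while (!tempStack.isEmpty()) {
            finalStack.push(tempStack.pop());
        }
        // The first value popped is the rightmost operand of the condition.
        String temp = oStack.pop();
        finalStack.push(new SourceLine(current,
                space + "} while (" + oStack.pop() + " >= " + temp + ");"));
    }

    public static void main(String[] args) {
        Deque<SourceLine> finalStack = new ArrayDeque<>();
        finalStack.push(new SourceLine(0, "int x = 5;"));
        finalStack.push(new SourceLine(4, "  x--;"));
        Deque<String> oStack = new ArrayDeque<>();
        oStack.push("x");   // value1
        oStack.push("0");   // value2, on top
        resolveDoWhile(finalStack, oStack, 9, 2, "");
        // Print from the bottom of the stack up:
        // int x = 5; / do { /   x--; / } while (x >= 0);
        finalStack.descendingIterator().forEachRemaining(line -> System.out.println(line.text));
    }
}
```

Popping while the offset is at or after the target keeps single-instruction statements such as an iinc at the branch target inside the loop.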

The if Non-Terminal

The if production consists of the line number, the IF terminal, and the branch location.

Resolution of the if opcode is similar, with one important difference: if_icmp compares the top two oStack elements, whereas if compares only the top element against zero. To compare two long, float, or double values, the compiler first emits a cmp opcode, which pushes -1, 0, or 1; the if opcode that follows then tests that result against zero. Because cmp itself does not encode the relational operator (its g or l suffix only decides the result when an operand is NaN), it is not sufficient to set up the if statement conditionals. As a result, the decompiler ignores the comparison type (g or l) of the cmp statement and merely preserves the names of the compared values.

As you see in Listing 6-44, the conditional type is resolved in opposite ways in a do-while loop and in an if statement (or forward-branching loop). For example, if_icmpge becomes >= in a do-while loop but < in an if statement. The reason lies in how the two statements are compiled. In a do-while loop, the JVM branches back to the beginning when the condition is true. In an if statement, it branches past the body when the source condition is false and falls through into the body when it is true. The condition tested by the bytecode is therefore the negation of the source condition, so the comparison sign is inverted.
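A minimal sketch of that inversion, assuming opcodes arrive as their mnemonic strings (the class and method names are hypothetical). It covers the if&lt;cond&gt;, if_icmp&lt;cond&gt;, and if_acmp&lt;cond&gt; families, whose last two letters name the relation tested, but not ifnull or ifnonnull.

```java
import java.util.Map;

final class BranchOperators {
    // Relation a conditional branch tests; the JVM takes the branch when it holds.
    private static final Map<String, String> TESTED = Map.of(
            "eq", "==", "ne", "!=", "lt", "<", "ge", ">=", "gt", ">", "le", "<=");

    // Logical negation of each relation (negating >= gives <, not <=).
    private static final Map<String, String> NEGATED = Map.of(
            "==", "!=", "!=", "==", "<", ">=", ">=", "<", ">", "<=", "<=", ">");

    /** Relation tested by an opcode such as if_icmpge or ifle. */
    static String tested(String opcode) {
        String op = TESTED.get(opcode.substring(opcode.length() - 2));
        if (op == null) {
            throw new IllegalArgumentException("unsupported opcode: " + opcode);
        }
        return op;
    }

    /** Backward branch (do-while): the loop repeats while the tested relation holds. */
    static String forDoWhile(String opcode) {
        return tested(opcode);
    }

    /** Forward branch (if): the branch skips the body, so the source condition is negated. */
    static String forIf(String opcode) {
        return NEGATED.get(tested(opcode));
    }

    public static void main(String[] args) {
        System.out.println(forDoWhile("if_icmpge")); // >=
        System.out.println(forIf("if_icmpge"));      // <
        System.out.println(forIf("ifeq"));           // !=
    }
}
```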

The cmp Non-Terminal

The cmp production consists of the line number, the type of the values being compared, and the CMP terminal (which can be either cmpg or cmpl; for long values the opcode is plain lcmp).

The non-terminal for cmp is self-explanatory: the decompiler takes the two values being compared and pushes the comparison onto the oStack, leaving the actual relational operator to the if opcode that follows.
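Seen from the oStack, the division of labor between cmp and if looks roughly like this. The Comparison marker type and the method names are hypothetical, and the operator passed to ifCondition is assumed to have been chosen already from the if opcode and the branch direction.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class CmpSketch {
    /** Hypothetical oStack marker left by a cmp opcode: just the two operand names. */
    static final class Comparison {
        final String left;
        final String right;

        Comparison(String left, String right) {
            this.left = left;
            this.right = right;
        }
    }

    /** lcmp, fcmpl, fcmpg, dcmpl, dcmpg: the g/l suffix is deliberately ignored. */
    static void cmp(Deque<Object> oStack) {
        String right = (String) oStack.pop();   // pushed last, so popped first
        String left = (String) oStack.pop();
        oStack.push(new Comparison(left, right));
    }

    /** An if opcode whose relational operator op has already been chosen. */
    static String ifCondition(Deque<Object> oStack, String op) {
        Object top = oStack.pop();
        if (top instanceof Comparison) {
            Comparison c = (Comparison) top;
            return c.left + " " + op + " " + c.right;   // if after cmp: compare the two values
        }
        return top + " " + op + " 0";                    // bare if: compare against zero
    }

    public static void main(String[] args) {
        Deque<Object> oStack = new ArrayDeque<>();
        oStack.push("total");
        oStack.push("limit");
        cmp(oStack);
        System.out.println(ifCondition(oStack, "<"));   // total < limit
        oStack.push("n");
        System.out.println(ifCondition(oStack, "!="));  // n != 0
    }
}
```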

Code

The complete code to handle IfTest.java is shown in Listing 6-44.

Output

By recompiling our specifications and running DoWhile and IfTest through them, we obtain exactly what we’d like, as shown in Listings 6-45 and 6-46.

Recurses.java

Now that if statements are implemented, the decompiler can process a simple example of recursion, Recurses.java. Although this program (see Listing 6-47) does not introduce any new non-terminals, it demonstrates the difference between invocation of static methods and virtual methods and how if-else statements work.

Input

Static invocations, by their nature, can’t be chained with any other functions and don’t require any references to external classes. If you check the chain of conditionals in the invoke non-terminal, you will see that the invocation on line 5 (Listing 6-48) fails every one and is pushed onto the oStack with its argument by the final else.

The major point of interest in the recursive function is the conditional statement on line 1 (see Listing 6-49). Because the comparison is against zero, no cmp is required. And because the whole body of the function consists of just the two cases (if and else), the else is optional, so the decompiler omits it.

Grammar

The grammar is unchanged from the IfTest.java test.

Code

No new code is required.

Output

As we’d hope, the decompiler produces correct results as shown in Listing 6-50.

WhileLoop.java

The next step is to build on the conditional statement resolution to resolve normal while and for loops. This means introducing the goto opcode, which makes the more complex conditionals resolvable but is itself difficult to resolve correctly. Listing 6-51 shows a “simple” loop: a standard while loop.

Input

The main function of WhileLoop.java is shown in Listing 6-52.

Grammar

The grammar is as follows:

expr_part -> error | return | store | load  | invoke | object | const | 
                        bipush | conv | arith | iinc | if | cmp | if_icmp | goto;

goto -> number GOTO number;

The goto Non-Terminal

The goto production consists only of the line number, the GOTO terminal, and the branch location.

The goto non-terminal is somewhat formidable. It serves two major purposes: it is used in conditional loops, and it is used in if-else if-else chains. To tell which context it is in, the decompiler compares the branch address with the current line number. If the branch address is greater, the goto belongs to an if-else if-else chain, and the branch address is pushed onto the gotoStack (much as the branch location was handled in the forward-branching cases of the if and if_icmp statements). If it is less, the goto closes a loop of some sort. That loop case is the one that matters for the time being.

The decompiler then pops finalStack items and pushes them onto a tempStack, as it did for do-while statements, until it finds a line whose number is less than or equal to the branch address. It then checks whether that line is the if statement that matches the goto. If it is not, it moves items from the tempStack back onto the finalStack until the matching if is on top. It then pops the if statement, trims off everything except the conditional expression, and stores the condition in a String variable.

Once the loop is done, the decompiler decrements the conditional depth counter, shortens the spacing, pushes the final lines of code onto the finalStack, and pops the now-resolved branch address from the ifStack.
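The backward-goto case can be sketched as below. This is an illustration, not Listing 6-53: the Stmt type (with a branch field recording each if line's forward target) is hypothetical, the if line's text is assumed to already hold the source-level condition, and the depth counter and indentation handling are left out.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical finalStack entry; branch is the forward target of an if line, or -1. */
final class Stmt {
    final int number;
    final String text;
    final int branch;

    Stmt(int number, String text, int branch) {
        this.number = number;
        this.text = text;
        this.branch = branch;
    }
}

final class WhileSketch {
    /** Resolves a backward goto at offset current that jumps to offset target. */
    static void resolveWhile(Deque<Stmt> finalStack, Deque<Integer> ifStack,
                             int current, int target) {
        Deque<Stmt> tempStack = new ArrayDeque<>();
        // Set aside everything after the branch address.
        while (!finalStack.isEmpty() && finalStack.peek().number > target) {
            tempStack.push(finalStack.pop());
        }
        // Move lines back until the if whose branch address is on the ifStack is on top.
        int pending = ifStack.peek();
        while (finalStack.isEmpty() || finalStack.peek().branch != pending) {
            if (tempStack.isEmpty()) {
                throw new IllegalStateException("no if matches goto at " + current);
            }
            finalStack.push(tempStack.pop());
        }
        // Keep only the conditional expression of "if (cond) {".
        Stmt ifLine = finalStack.pop();
        String cond = ifLine.text.substring(ifLine.text.indexOf('(') + 1,
                                            ifLine.text.lastIndexOf(')'));
        finalStack.push(new Stmt(ifLine.number, "while (" + cond + ") {", -1));
        while (!tempStack.isEmpty()) {
            finalStack.push(tempStack.pop());
        }
        finalStack.push(new Stmt(current, "}", -1));
        ifStack.pop();   // the branch address is now resolved
    }

    public static void main(String[] args) {
        Deque<Stmt> finalStack = new ArrayDeque<>();
        finalStack.push(new Stmt(1, "int i = 0;", -1));
        finalStack.push(new Stmt(5, "if (i < 10) {", 20));
        finalStack.push(new Stmt(12, "System.out.println(i);", -1));
        finalStack.push(new Stmt(14, "i++;", -1));
        Deque<Integer> ifStack = new ArrayDeque<>();
        ifStack.push(20);
        resolveWhile(finalStack, ifStack, 17, 2);
        // int i = 0; / while (i < 10) { / System.out.println(i); / i++; / }
        finalStack.descendingIterator().forEachRemaining(s -> System.out.println(s.text));
    }
}
```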

Code

The complete code to handle WhileLoop.java is shown in Listing 6-53.

Output

The decompiler now produces the expected results, as shown in Listing 6-54.

ForLoop.java

In the beginning there was the while loop, and then a language designer said, “Hey, let’s incorporate this initialize-and-increment stuff into a new conditional! Hell, let’s call it a for loop!” And he looked and he saw it was good.

The next problem is extending the implementation of the while loop in WhileLoop.java to cover for loops. This is done very easily, as you can see in Listing 6-55.

This may look the same as WhileLoop.java, and, in fact, it is. However, it meets the criteria for a for loop:

  • The line before the while statement assigns a variable that is used inside the while loop.
  • The last line inside the while loop reassigns or modifies the value of a variable.

Given these criteria, the program can be decompiled to the following:

for (String output = "outoutoutput"; output.indexOf("out") != -1;
                           output = output.substring(3))
           System.out.println(output.substring(3));

Input

Because its bytecode is identical to that of WhileLoop.java, the main function of ForLoop.java is the one shown earlier in Listing 6-52.

Grammar

The grammar remains unchanged from WhileLoop.java.

The goto Non-Terminal, Redux

The goto non-terminal is the only one that needs modification. At the start of the backward-branching (loop) case, the decompiler must test whether the top finalStack item is an assignment, an increment, or a decrement. If it is, the decompiler assumes a for loop, pops that item, and stores it in a temporary variable.
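One way to implement that test is with regular expressions over the decompiled text. The patterns and the UpdateCheck name below are assumptions of this sketch (the book does not show how its test is written), and they only recognize simple statements.

```java
import java.util.regex.Pattern;

final class UpdateCheck {
    // name = ..., name += ..., name <<= ... (the lookahead rejects ==).
    private static final Pattern ASSIGNMENT = Pattern.compile(
            "^\\s*[\\w.\\[\\]]+\\s*(?:[-+*/%&|^]|<<|>>>?)?=(?!=).*;\\s*$");
    // ++name, --name, name++, name--
    private static final Pattern INC_DEC = Pattern.compile(
            "^\\s*(?:(?:\\+\\+|--)[\\w.\\[\\]]+|[\\w.\\[\\]]+(?:\\+\\+|--))\\s*;\\s*$");

    /** True if a finalStack line could serve as the update clause of a for loop. */
    static boolean isUpdate(String line) {
        return ASSIGNMENT.matcher(line).matches() || INC_DEC.matcher(line).matches();
    }

    public static void main(String[] args) {
        System.out.println(isUpdate("output = output.substring(3);")); // true
        System.out.println(isUpdate("i++;"));                           // true
        System.out.println(isUpdate("System.out.println(i);"));         // false
    }
}
```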

The next necessary change is near the end. Where the decompiler once could get away with pushing a while statement and restoring the finalStack, it must now test the forOrWhile boolean and proceed accordingly, then test to ensure the element before the conditional statement is an assignment. If it is not an assignment, it cancels the for loop treatment, pushes a while loop as before, and restores the temp variable to the finalStack. If it is, it must be trimmed and used as the assignment portion of the for loop. Then the decompiler can push the for statement and the body of the loop to the finalStack.
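The end-of-loop decision can be sketched as a function that either assembles the for header or falls back to a while loop. The string-based interface, the INITIALIZER pattern, and the ForLoopSketch name are hypothetical; in the fallback the update statement popped earlier goes back at the end of the body.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class ForLoopSketch {
    // Optional type, then name = value; e.g. "String output = ...;" or "i = 0;".
    private static final Pattern INITIALIZER = Pattern.compile(
            "^\\s*(?:[\\w.<>\\[\\]]+\\s+)?\\w+\\s*=(?!=).*;\\s*$");

    /** Drops indentation and the trailing semicolon. */
    private static String clause(String stmt) {
        String t = stmt.trim();
        return t.endsWith(";") ? t.substring(0, t.length() - 1) : t;
    }

    /** Rebuilds a loop from the line before the condition, the condition, the body, and the update. */
    static List<String> rebuild(String before, String cond, List<String> body, String update) {
        List<String> out = new ArrayList<>();
        if (before != null && INITIALIZER.matcher(before).matches()) {
            out.add("for (" + clause(before) + "; " + cond + "; " + clause(update) + ") {");
            out.addAll(body);
        } else {
            // Not a for loop after all: keep the preceding line and restore the update.
            if (before != null) {
                out.add(before);
            }
            out.add("while (" + cond + ") {");
            out.addAll(body);
            out.add(update);
        }
        out.add("}");
        return out;
    }

    public static void main(String[] args) {
        rebuild("int i = 0;", "i < 3", List.of("System.out.println(i);"), "i++;")
                .forEach(System.out::println);
        // for (int i = 0; i < 3; i++) {
        // System.out.println(i);
        // }
    }
}
```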

Code

The complete code to handle ForLoop.java is shown in Listing 6-56.

Output

Decompiling ForLoop now gives the expected for loop version (see Listing 6-57).

ArrayTest.java

And now for something completely different. After the long, hard slog of conditionals, arrays are easy. The first sample program (see Listing 6-58) uses a for loop to load a three-element string array with the cubes of the array indexes. This requires three new array operations: array loading, array storing, and array initialization.

Input

The ClassToXML input for ArrayTest.java is shown in Listing 6-59.

Grammar

The grammar is as follows:

expr_part -> error | return | store | load  | invoke | object | const | bipush 
               | conv | arith | iinc | if | cmp | if_icmp | goto | arrayops;

arrayops  -> aload | astore | newarray;
aload -> number ALOAD;
astore ->  number ASTORE;
newarray  -> number NEWARRAY number;

The aload Non-Terminal

The aload production consists only of the line number and the ALOAD terminal. Because it takes no argument, you might guess that everything is done using the oStack, and you'd be right: the JVM loads the element at the index given by the top oStack element from the array given by the next-to-top oStack element. If this seems confusing, see the code in Listing 6-60.

The astore Non-Terminal

The astore production likewise consists only of the line number and the ASTORE terminal, and again no argument is required. The top oStack element is the value to store, the element beneath it is the index, and the element beneath that is the array.
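Both array operations can be sketched directly on a string oStack. The opcode families named in the comments are the JVM's; the class, the method names, and the array[index]=value; rendering are assumptions of this sketch.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class ArrayOpsSketch {
    /** Array load (iaload, aaload, ...): ..., array, index -> ..., array[index]. */
    static void aload(Deque<String> oStack) {
        String index = oStack.pop();   // top element
        String array = oStack.pop();   // next-to-top element
        oStack.push(array + "[" + index + "]");
    }

    /** Array store (iastore, aastore, ...): ..., array, index, value -> ...; returns the statement. */
    static String astore(Deque<String> oStack) {
        String value = oStack.pop();
        String index = oStack.pop();
        String array = oStack.pop();
        return array + "[" + index + "]=" + value + ";";
    }

    public static void main(String[] args) {
        Deque<String> oStack = new ArrayDeque<>();
        oStack.push("arr");
        oStack.push("i");
        oStack.push("i*i*i");
        System.out.println(astore(oStack));   // arr[i]=i*i*i;
        oStack.push("arr");
        oStack.push("2");
        aload(oStack);
        System.out.println(oStack.peek());    // arr[2]
    }
}
```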

The newarray Non-Terminal

The newarray production consists of the line number, the NEWARRAY terminal, and the type of the new array; the array length is taken from the top of the oStack. There are two forms of this opcode: newarray and anewarray. The newarray opcode creates arrays of primitive type; its argument, which ranges from 4 to 11, specifies the primitive type. The anewarray opcode creates arrays of reference type; its argument is, as you'd expect, a constant pool reference to the class, array, or interface type of the array's components. Multidimensional arrays can also be created with a separate opcode, multianewarray, which this decompiler does not implement.
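The atype codes are fixed by the JVM specification (4 boolean, 5 char, 6 float, 7 double, 8 byte, 9 short, 10 int, 11 long), so resolving newarray amounts to a table lookup. In this sketch the class and method names are hypothetical, and anewarray's component type is assumed to have been resolved from the constant pool already.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class NewArraySketch {
    // newarray atype codes from the JVM specification; indexes 0-3 are unused.
    private static final String[] PRIMITIVES = {
        null, null, null, null,
        "boolean", "char", "float", "double", "byte", "short", "int", "long"
    };

    static String primitiveType(int atype) {
        if (atype < 4 || atype > 11) {
            throw new IllegalArgumentException("invalid atype: " + atype);
        }
        return PRIMITIVES[atype];
    }

    /** newarray: pops the element count and pushes "new T[count]". */
    static void newarray(Deque<String> oStack, int atype) {
        String count = oStack.pop();
        oStack.push("new " + primitiveType(atype) + "[" + count + "]");
    }

    /** anewarray: componentType is assumed to be already resolved from the constant pool. */
    static void anewarray(Deque<String> oStack, String componentType) {
        String count = oStack.pop();
        oStack.push("new " + componentType + "[" + count + "]");
    }

    public static void main(String[] args) {
        Deque<String> oStack = new ArrayDeque<>();
        oStack.push("10");
        newarray(oStack, 10);
        System.out.println(oStack.pop());   // new int[10]
        oStack.push("3");
        anewarray(oStack, "String");
        System.out.println(oStack.pop());   // new String[3]
    }
}
```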

Code

The complete code to handle ArrayTest.java is shown in Listing 6-60.

Output

Recompiling the CUP spec and running the XML through the decompiler recovers the original program correctly, as shown in Listing 6-61.

ArrayInit.java

Finally, the decompiler will be extended to handle initialized arrays. This requires the remaining three opcodes of the object non-terminal: putstatic, getfield, and putfield.

This example (see Listing 6-62) is also the first time the field-parsing non-terminal is used. Variables stored in static (class) fields are initialized in a special initialization method called <clinit>. Other field variables are initialized in the <init> method, which until now has been ignored.

Input

The introduction of initialized fields has two effects on the code section of the XML file: first, it causes the <init> method, which was previously ignored, to become useful, since it is used to initialize a; second, it creates a new <clinit> method, which is used to fill the arr[] field and the mork variable (see Listing 6-63).

Arrays are initialized in a specific pattern, which is shown in lines 5–61 of Listing 6-64. Decompiled with only the opcodes covered so far, this pattern produces a finalStack that looks like the following:

new int[10][0]=1;
new int[10][1]=8;
new int[10][2]=27;
new int[10][3]=64;
new int[10][4]=125;
new int[10][5]=216;
new int[10][6]=343;
new int[10][7]=512;
new int[10][8]=729;
new int[10][9]=1000;

Although the syntax is incorrect, this gives you some idea of how the structure looks in memory.
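Why every line begins with new int[10] becomes clear if you replay the bytecode javac typically emits for an array initializer: newarray, then for each element dup, the index, the value, and iastore. The simulation below is a hypothetical illustration using string operands. Because dup copies the text new int[n] rather than a variable name, each store names the array by its creation expression.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class NaiveArrayInit {
    /** Replays newarray/dup/iastore on a string oStack and returns the statements emitted. */
    static List<String> replay(int[] values) {
        Deque<String> oStack = new ArrayDeque<>();
        List<String> finalStack = new ArrayList<>();
        oStack.push(String.valueOf(values.length));     // push the length (bipush or iconst)
        oStack.push("new int[" + oStack.pop() + "]");   // newarray int
        for (int i = 0; i < values.length; i++) {
            oStack.push(oStack.peek());                  // dup: copy the array reference
            oStack.push(String.valueOf(i));              // push the index
            oStack.push(String.valueOf(values[i]));      // push the value
            String value = oStack.pop();                 // iastore pops the value,
            String index = oStack.pop();                 // then the index,
            String array = oStack.pop();                 // then the duplicated array
            finalStack.add(array + "[" + index + "]=" + value + ";");
        }
        // The original "new int[n]" is still on oStack, where a putstatic would consume it.
        return finalStack;
    }

    public static void main(String[] args) {
        replay(new int[] {1, 8, 27}).forEach(System.out::println);
        // new int[3][0]=1;
        // new int[3][1]=8;
        // new int[3][2]=27;
    }
}
```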

The main method poses few surprises and does not expose any new concepts, so we’ll get straight to the new stuff.

Grammar

The grammar is as follows:

object -> number GETSTATIC number | number GETFIELD number
                | number PUTSTATIC number | number PUTFIELD number;

The getfield Non-Terminal

The getfield production consists of the line number, the GETFIELD terminal, and the constant pool index of the field to access.

The getfield opcode can be resolved in exactly the same manner as getstatic.

The putfield Non-Terminal

The putfield production consists of the line number, the PUTFIELD terminal, and the constant pool index of the field to access.

To resolve putfield, the decompiler first checks whether the top item on the oStack is a new array: if the sum of the indices of [ and ] in the top item is greater than six (which eliminates everything that is not a new array), it is treated as an array assignment. The decompiler then reads the array length, pops that many items from the oStack, trims off everything but the assigned values, and produces a curly-bracketed, comma-delimited result.
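A sketch of the collapsing step, under two assumptions that are not from the book: the element assignments sit beneath the array on the oStack in the new T[n][i]=v; form shown earlier, and a simple startsWith test stands in for the bracket-index check. The class and method names are hypothetical. Each value is placed by its parsed index, so the order in which items are popped does not matter.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class ArrayInitCollapse {
    /** Hypothetical stand-in for the bracket-index test: "new T[n]" with a single [. */
    static boolean isNewArray(String item) {
        return item.startsWith("new ") && item.endsWith("]")
                && item.indexOf('[') == item.lastIndexOf('[');
    }

    /** Pops "new T[n]" and the n assignments beneath it; returns the "{...}" initializer. */
    static String collapse(Deque<String> oStack) {
        String array = oStack.peek();
        if (array == null || !isNewArray(array)) {
            throw new IllegalArgumentException("top of oStack is not a new array");
        }
        oStack.pop();
        int length = Integer.parseInt(
                array.substring(array.indexOf('[') + 1, array.indexOf(']')));
        String[] values = new String[length];
        for (int n = 0; n < length; n++) {
            String item = oStack.pop();                  // e.g. "new int[3][1]=8;"
            int eq = item.indexOf("]=") + 1;             // position of '='
            int open = item.lastIndexOf('[', eq);        // start of the element index
            int index = Integer.parseInt(item.substring(open + 1, eq - 1));
            values[index] = item.substring(eq + 1, item.length() - 1);  // drop trailing ';'
        }
        return "{" + String.join(", ", values) + "}";
    }

    public static void main(String[] args) {
        Deque<String> oStack = new ArrayDeque<>();
        oStack.push("new int[3][0]=1;");
        oStack.push("new int[3][1]=8;");
        oStack.push("new int[3][2]=27;");
        oStack.push("new int[3]");
        System.out.println(collapse(oStack));   // {1, 8, 27}
    }
}
```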

If the current method name is <init>, the result must be stored in the fieldStack. The decompiler pops items from the fieldStack and pushes them to a temporary stack until it finds the proper field name. It then inserts the assignment operation and restores the fieldStack. If the current method name is not <init>, the result can be stored in the finalStack.
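The fieldStack update can be sketched as follows, assuming (hypothetically) that the fieldStack holds declarations as strings ending in a semicolon, such as static int[] arr;. The same routine would serve putstatic inside <clinit>; the FieldInit name and its methods are illustrative only.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.regex.Pattern;

final class FieldInit {
    /** True if a declaration such as "static int[] arr;" declares the named field. */
    static boolean declares(String decl, String field) {
        return Pattern.compile("\\b" + Pattern.quote(field) + "\\s*;\\s*$").matcher(decl).find();
    }

    /** Adds "= value" to the field's declaration, leaving the fieldStack order unchanged. */
    static void assign(Deque<String> fieldStack, String field, String value) {
        Deque<String> tempStack = new ArrayDeque<>();
        while (!fieldStack.isEmpty() && !declares(fieldStack.peek(), field)) {
            tempStack.push(fieldStack.pop());
        }
        if (fieldStack.isEmpty()) {
            // Restore the stack before reporting the missing declaration.
            while (!tempStack.isEmpty()) {
                fieldStack.push(tempStack.pop());
            }
            throw new IllegalStateException("no declaration for field " + field);
        }
        String decl = fieldStack.pop().replaceFirst("\\s*;\\s*$", "");
        fieldStack.push(decl + " = " + value + ";");
        while (!tempStack.isEmpty()) {
            fieldStack.push(tempStack.pop());
        }
    }

    public static void main(String[] args) {
        Deque<String> fieldStack = new ArrayDeque<>();
        fieldStack.push("int a;");
        fieldStack.push("static int[] arr;");
        fieldStack.push("static int mork;");
        assign(fieldStack, "arr", "{1, 8, 27}");
        fieldStack.forEach(System.out::println);
        // static int mork;
        // static int[] arr = {1, 8, 27};
        // int a;
    }
}
```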

The putstatic Non-Terminal

The putstatic production consists of the line number, the PUTSTATIC terminal, and the constant pool index of the field to access.

Resolution of putstatic is identical to that of putfield, but the current method must be <clinit> for the result to be stored in the fieldStack.

Code

The complete code to handle ArrayInit.java is shown in Listing 6-65.
