Chapter 17. Understanding and Tweaking Bytecode

	“Every solution breeds new problems.”
	--Murphy's Fifth Corollary

IN THIS CHAPTER

Bytecode Fundamentals
Viewing Class Files Using the jClassLib Bytecode Viewer
The JVM Instruction Set
Class File Format
Instrumenting and Generating Bytecode
Bytecode Tweaking Compared with AOP and Dynamic Proxies
Quick Quiz
In Brief

Bytecode Fundamentals

Chapter 2, “Decompiling Classes,” presented a brief overview of bytecode and the purpose it serves in Java. As you undoubtedly know, bytecode is the intermediate step between the source code and the machine code, and it is what enables cross-platform execution of Java programs. Bytecode is defined by the Java Virtual Machine Specification (http://java.sun.com/docs/books/vmspec/2nd-edition/html/VMSpecTOC.doc.html), which also describes the language concepts, the class file format, the Java Virtual Machine (JVM) requirements, and other important aspects of the Java programming language. Strict adherence to the specification ensures the portability and ubiquitous execution of applications compiled into bytecode. The JVM, running on top of the operating system, is responsible for providing the execution environment and converting the Java bytecode instructions into native machine instructions.

Most of the hacking techniques presented earlier in this book required obtaining and manipulating the source code to alter an application's behavior. In this chapter we will work at the bytecode level rather than the source code level. We will discover how to view the class file data structures, instrument (enhance) the existing bytecode, and programmatically generate new classes. Here are some of the benefits of making changes at the bytecode level:

You don't need to obtain the source code or decompile the bytecode and then recompile the source later.
Bytecode can be generated or instrumented by a class loader on-the-fly as the classes are loaded into a JVM.
It is easier and faster to automate bytecode generation than source code generation because fewer steps are involved and the compiler doesn't need to be executed. For example, Hibernate generates the persistence code for Java classes at runtime.
Tools can rely on bytecode instrumentation to introduce additional logic that does not need to be present in the source files. Some implementations of Aspect-Oriented Programming (AOP), for instance, insert custom attributes into the bytecode and instrument the methods to support AOP.

The next two sections present a brief introduction to the aspects of the JVM specification that are related to bytecode. Although it is useful to familiarize yourself with how the JVM operates and the format of the class file, it is not strictly necessary for implementing the techniques presented in this chapter. If you are not known for your patience and find reading specification-like material about as appealing as writing end-user documentation for your code, feel free to skip the next two sections and go directly to the section titled “Instrumenting and Generating Bytecode.”

Viewing Class Files Using the jClassLib Bytecode Viewer

The Bytecode Viewer shipped with the free jClassLib library is an excellent GUI utility that enables browsing the content of the class file. It shows a hierarchical view of the file structure in the left pane and the content of the selected element in the right pane. Figure 17.1 shows jClassLib displaying the content of SimpleClass from the covertjava.bytecode package.

Figure 17.1. The jClassLib Bytecode Viewer.

The jClassLib Bytecode Viewer does not allow modifications of the class file, but it is great for visualizing the structures that are presented in the next sections. A useful way of learning about the bytecode is by comparing the instructions in the bytecode with the statements and operators in the source code. The viewer can also be used to debug the generation and instrumentation of the bytecode that we will perform at the end of this chapter.

The JVM Instruction Set

Java source files are compiled into binary class files, which follow a specific format. The logic of each Java method is represented with a set of primitive JVM instructions defined in the JVM specification. JVM instructions are basic commands similar to machine code. Each JVM instruction consists of an operation code (opcode) followed by zero or more operands representing the parameters of the operation. In the class file, the instructions are stored as a binary stream in the Code attribute of a method. The opcode is stored as 1 byte, which can be followed by the bytes representing the operand data. For example, the source code shown in Listing 17.1 is represented by the set of instructions shown in Listing 17.2.
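To make the encoding concrete, the following sketch decodes a raw Code array by hand. It assumes the 13 bytes a compiler would emit for the instructions of Listing 17.2, using the constant pool indexes #21 and #27 shown there (real indexes depend on the class). The opcode values are the ones assigned in the JVM specification; only the seven opcodes used in this example are handled, and the class name TinyDisassembler is made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Decodes the handful of opcodes used in Listing 17.2 (a sketch, not a full disassembler). */
public class TinyDisassembler {

    /** A mnemonic plus the number of operand bytes that follow its 1-byte opcode. */
    static final class Op {
        final String name;
        final int operandBytes;

        Op(String name, int operandBytes) {
            this.name = name;
            this.operandBytes = operandBytes;
        }
    }

    private static final Map<Integer, Op> OPCODES = new HashMap<>();

    static {
        OPCODES.put(0x03, new Op("iconst_0", 0));
        OPCODES.put(0x3C, new Op("istore_1", 0));
        OPCODES.put(0x84, new Op("iinc", 2));          // u1 local index, s1 increment
        OPCODES.put(0xB2, new Op("getstatic", 2));     // u2 constant pool index
        OPCODES.put(0x1B, new Op("iload_1", 0));
        OPCODES.put(0xB6, new Op("invokevirtual", 2)); // u2 constant pool index
        OPCODES.put(0xB1, new Op("return", 0));
    }

    /** Returns one line per instruction in the form "offset mnemonic [operands]". */
    public static List<String> disassemble(byte[] code) {
        List<String> lines = new ArrayList<>();
        int pc = 0;
        while (pc < code.length) {
            int opcode = code[pc] & 0xFF;
            Op op = OPCODES.get(opcode);
            if (op == null) {
                throw new IllegalArgumentException("Unsupported opcode 0x" + Integer.toHexString(opcode));
            }
            String line = pc + " " + op.name;
            if (opcode == 0x84) {
                // iinc: unsigned local variable index, then a signed byte increment
                line += " " + (code[pc + 1] & 0xFF) + " by " + code[pc + 2];
            } else if (op.operandBytes == 2) {
                // 2-byte operands here are big-endian constant pool indexes
                line += " #" + (((code[pc + 1] & 0xFF) << 8) | (code[pc + 2] & 0xFF));
            }
            lines.add(line);
            pc += 1 + op.operandBytes; // skip the opcode and its operands
        }
        return lines;
    }

    public static void main(String[] args) {
        byte[] code = {
            0x03,                      // iconst_0
            0x3C,                      // istore_1
            (byte) 0x84, 0x01, 0x01,   // iinc 1 by 1
            (byte) 0xB2, 0x00, 0x15,   // getstatic #21
            0x1B,                      // iload_1
            (byte) 0xB6, 0x00, 0x1B,   // invokevirtual #27
            (byte) 0xB1                // return
        };
        disassemble(code).forEach(System.out::println);
    }
}
```

The decoded offsets (0, 1, 2, 5, 8, 9, 12) match Listing 17.2 because each offset is simply the previous one plus 1 opcode byte plus that opcode's operand bytes.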

STORIES FROM THE TRENCHES

Hibernate is a free, high-performance object/relational persistence and query service for Java. One of Hibernate's biggest selling points is its capability to transparently persist Java objects. Instead of coding tedious JDBC calls, developers write an XML file that maps objects to a database schema, and Hibernate provides all the plumbing. The persistence service draws on reflection and runtime bytecode generation so that it does not interfere with IDE debugging and incremental compilation. Hibernate touts that using Apache's Byte Code Engineering Library (and later the CGLIB bytecode generation library) to manipulate the bytecode allows it to avoid the overhead of the Java Reflection API.

Example 17.1. Sample Java Source Code

int i = 0;
i = i + 1;
System.out.println(i);

Example 17.2. Bytecode Representation of Sample Source Code

 0 iconst_0
 1 istore_1
 2 iinc 1 by 1
 5 getstatic #21 <java/lang/System.out>
 8 iload_1
 9 invokevirtual #27 <java/io/PrintStream.println>
12 return

Most of the instructions are very simple, and tracing the instructions back to the source code they represent is easy. For instance, iconst_0 pushes the integer constant 0 onto the operand stack, and istore_1 pops the value from the top of the stack (0 in our case) and stores it into the local variable with index 1 (i in our case; in a static method such as main(), local variable 0 holds the args parameter). The compiler turns i = i + 1 into a single iinc instruction, which increments local variable 1 in place without using the stack. A more interesting scenario is a method call. As you can see from the listings, the value of the static field System.out (a PrintStream reference) and the value of the argument (i) are first pushed onto the operand stack before the method println is invoked. Detailed descriptions of the instructions can be found in the JVM specification; covering them is beyond the scope of this book. It is useful to familiarize yourself with the instructions and their operands, even though we are going to use a framework that provides a layer of abstraction for the bytecode. The instrumentation and generation of bytecode require constructing instruction sequences programmatically, so at least a basic understanding of the instruction set and how it maps to Java is essential.
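The stack-based execution model described above can be simulated in a few lines. The sketch below is a toy interpreter (not how a real JVM is implemented, and ToyStackMachine is a made-up name) that executes the integer instructions of Listing 17.2 against an operand stack and a local variable array. It omits getstatic and invokevirtual, so at the end only the value of i sits on the stack, whereas a real JVM would also have the System.out reference beneath it.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** A toy interpreter for the integer instructions of Listing 17.2; not a real JVM. */
public class ToyStackMachine {

    private final Deque<Integer> operandStack = new ArrayDeque<>();
    private final int[] locals;

    public ToyStackMachine(int maxLocals) {
        this.locals = new int[maxLocals];
    }

    /** Executes one instruction written like the jClassLib listing, e.g. "iinc 1 by 1". */
    public void execute(String instruction) {
        String[] parts = instruction.trim().split("\\s+");
        switch (parts[0]) {
            case "iconst_0":   // push the int constant 0
                operandStack.push(0);
                break;
            case "istore_1":   // pop the top of the stack into local variable 1
                locals[1] = operandStack.pop();
                break;
            case "iload_1":    // push a copy of local variable 1
                operandStack.push(locals[1]);
                break;
            case "iinc":       // add a constant to a local in place; the stack is not used
                locals[Integer.parseInt(parts[1])] += Integer.parseInt(parts[parts.length - 1]);
                break;
            default:
                throw new IllegalArgumentException("Unsupported instruction: " + instruction);
        }
    }

    public int local(int index) {
        return locals[index];
    }

    public int peek() {
        return operandStack.peek();
    }

    public int stackDepth() {
        return operandStack.size();
    }

    public static void main(String[] args) {
        // Two local slots: slot 0 would hold args in a static main(), slot 1 holds i.
        ToyStackMachine vm = new ToyStackMachine(2);
        for (String insn : new String[] {"iconst_0", "istore_1", "iinc 1 by 1", "iload_1"}) {
            vm.execute(insn);
        }
        System.out.println("i = " + vm.local(1) + ", top of stack = " + vm.peek()
                + ", stack depth = " + vm.stackDepth());
    }
}
```

Running it prints i = 1, top of stack = 1, stack depth = 1, which is the state just before println() would consume the value.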

Class File Format

The format of the binary class file is mandated by the JVM specification. It is described by a series of data structures that represent the class itself, its methods, its fields, and its attributes. To manipulate the bytecode, you need to learn about the naming conventions used for various elements and the format of the key data structures.

Field and Method Descriptors

The JVM distinguishes overloaded methods by pairing each method name with a descriptor that is created from the types of the parameters the method takes (and its return type). That way, print(int i) and print(char ch) are stored internally as two separate methods; if both return void, their descriptors are (I)V and (C)V. This encoding, a form of name mangling, follows a convention mandated by the JVM specification, and because the bytecode stores the descriptors, you can get a glimpse of it here.

Field and method descriptors are encoded based on their types. Table 17.1 shows each Java declared type and the corresponding field descriptor that is used in the bytecode.

Table 17.1. Field Type Codes

DECLARED TYPE	DESCRIPTOR TYPE
`byte`	`B`
`char`	`C`
`double`	`D`
`float`	`F`
`int`	`I`
`long`	`J`
`short`	`S`
`boolean`	`Z`
Instance of class `Classname`	`L<Classname>;` (fully qualified name, with `/` in place of `.`)
`[]` (one dimension of array)	`[`

Table 17.2 shows some examples of Java declarations and their descriptors in the bytecode.

Table 17.2. Examples of Descriptor Types

TYPE DECLARATION	DESCRIPTOR TYPE
`int number;`	`I`
`int[][] numbers;`	`[[I`
`Object reference;`	`Ljava/lang/Object;`

Method descriptors are created using the following format:

(<param1><param2>...<paramN>)<return>

where

<param1> ... <paramN> are the parameter type descriptors, concatenated without separators. A method that takes no parameters has empty parentheses.
<return> is the return type descriptor, or V if the method is void.

For example, a method that is declared as

Integer getIntProperty(String propertyName, int defaultValue)

would have the method descriptor

(Ljava/lang/String;I)Ljava/lang/Integer;

Certain special methods have predefined names. Static initializers are named <clinit>, and instance initializers and constructors are named <init>.
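The descriptor rules above are mechanical enough to express as a small function. The following sketch, with hypothetical class and method names, builds descriptors from Class objects and cross-checks the result against the JDK's own MethodType.toMethodDescriptorString() (available since Java 7).

```java
import java.lang.invoke.MethodType;
import java.util.HashMap;
import java.util.Map;

/** Builds JVM field and method descriptors from Class objects (a sketch). */
public class Descriptors {

    private static final Map<Class<?>, String> PRIMITIVES = new HashMap<>();

    static {
        PRIMITIVES.put(byte.class, "B");
        PRIMITIVES.put(char.class, "C");
        PRIMITIVES.put(double.class, "D");
        PRIMITIVES.put(float.class, "F");
        PRIMITIVES.put(int.class, "I");
        PRIMITIVES.put(long.class, "J");
        PRIMITIVES.put(short.class, "S");
        PRIMITIVES.put(boolean.class, "Z");
        PRIMITIVES.put(void.class, "V");   // only valid as a method return type
    }

    /** Field descriptor following Table 17.1. */
    public static String fieldDescriptor(Class<?> type) {
        if (type.isPrimitive()) {
            return PRIMITIVES.get(type);
        }
        if (type.isArray()) {
            return "[" + fieldDescriptor(type.getComponentType()); // one '[' per dimension
        }
        // Class names use '/' instead of '.' inside descriptors
        return "L" + type.getName().replace('.', '/') + ";";
    }

    /** Method descriptor: parameter descriptors concatenated in parentheses, then the return type. */
    public static String methodDescriptor(Class<?> returnType, Class<?>... params) {
        StringBuilder sb = new StringBuilder("(");
        for (Class<?> p : params) {
            sb.append(fieldDescriptor(p));
        }
        return sb.append(')').append(fieldDescriptor(returnType)).toString();
    }

    public static void main(String[] args) {
        System.out.println(fieldDescriptor(int[][].class));  // [[I
        System.out.println(fieldDescriptor(Object.class));   // Ljava/lang/Object;
        // Integer getIntProperty(String propertyName, int defaultValue)
        String d = methodDescriptor(Integer.class, String.class, int.class);
        System.out.println(d);                               // (Ljava/lang/String;I)Ljava/lang/Integer;
        // Cross-check against the JDK's own encoder
        System.out.println(d.equals(MethodType.methodType(Integer.class, String.class, int.class)
                .toMethodDescriptorString()));               // true
    }
}
```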

Class File Structure

Each Java class is defined by a binary stream, typically stored in a class file, consisting of 8-bit bytes. The stream content is described by a pseudo structure given in the JVM specification and quoted here in Listing 17.3. Although this might look like too much information, the structures presented in this and the following sections will help in understanding the generation and instrumentation of bytecode later.

Example 17.3. ClassFile Structure

ClassFile {
    u4 magic;
    u2 minor_version;
    u2 major_version;
    u2 constant_pool_count;
    cp_info constant_pool[constant_pool_count-1];
    u2 access_flags;
    u2 this_class;
    u2 super_class;
    u2 interfaces_count;
    u2 interfaces[interfaces_count];
    u2 fields_count;
    field_info fields[fields_count];
    u2 methods_count;
    method_info methods[methods_count];
    u2 attributes_count;
    attribute_info attributes[attributes_count];
}

For clarity, the JVM specification defines pseudo-types u1, u2, and u4 representing unsigned 1-, 2-, and 4-byte types, respectively. Table 17.3 lists each field of the ClassFile structure and its meaning.

Table 17.3. ClassFile Fields

FIELD	DESCRIPTION
`magic`	Class file format marker. It always has the value `0xCAFEBABE`.
`minor_version`, `major_version`	Version of the class file format, which reflects the JVM release the class file was compiled for. A JVM typically supports class files with lower major versions but refuses to run class files with a higher major version than it supports.
`constant_pool_count`	The number of entries in the constant pool array plus one. Valid constant pool indexes run from `1` to `constant_pool_count - 1`; index `0` is reserved and never refers to an entry, so the smallest valid value is `1`. Constants of type `long` and `double` occupy two indexes each.
`constant_pool[]`	An array of variable-length structures representing string constants, class and field names, and other constants.
`access_flags`	A mask of modifiers used in class or interface declarations. The valid modifiers are `ACC_PUBLIC`, `ACC_FINAL`, `ACC_SUPER`, `ACC_INTERFACE`, and `ACC_ABSTRACT`.
`this_class`	An index of the `constant_pool` array item that describes this class.
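`super_class`	Either zero or an index of the `constant_pool` array item describing the direct superclass of this class. Only `java.lang.Object`, which has no superclass, has a value of `0`; a class that extends `java.lang.Object` refers to it through the constant pool like any other superclass. For an interface, the item must describe `java.lang.Object`.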
`interfaces_count`	Number of super interfaces of this class or interface.
`interfaces[]`	An array of indexes of `constant_pool` items describing the super interfaces of this class.
`fields_count`	Number of items in the `fields` array.
`fields[]`	An array of variable-length structures describing the fields declared in this class.
`methods_count`	Number of items in the `methods` array.
`methods[]`	An array of variable-length structures describing the methods declared in this class, including the method bytecode.
`attributes_count`	Number of items in the `attributes` array.
`attributes[]`	An array of variable-length structures declaring attributes of this class file. The standard class attributes include `SourceFile`, `InnerClasses`, and `Deprecated`. The JVM is required to ignore attributes that it does not recognize.
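Because the first fields of ClassFile have fixed sizes, they can be read with nothing more than java.io.DataInputStream, whose readInt() and readUnsignedShort() use the same big-endian byte order as class files. The sketch below (class and method names are illustrative) reads the header from a byte array and rejects data that lacks the magic number. Pass a .class file path on the command line to inspect a real class; otherwise it parses a hand-made header that claims major version 49 (Java 5).

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Reads the fixed-size fields at the start of a ClassFile structure (a sketch). */
public class ClassFileHeader {
    public final long magic;             // u4
    public final int minorVersion;       // u2
    public final int majorVersion;       // u2
    public final int constantPoolCount;  // u2

    private ClassFileHeader(long magic, int minorVersion, int majorVersion, int constantPoolCount) {
        this.magic = magic;
        this.minorVersion = minorVersion;
        this.majorVersion = majorVersion;
        this.constantPoolCount = constantPoolCount;
    }

    public static ClassFileHeader read(byte[] classBytes) {
        // DataInputStream reads big-endian values, the byte order used by class files
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(classBytes))) {
            long magic = in.readInt() & 0xFFFFFFFFL;   // treat the u4 as unsigned
            if (magic != 0xCAFEBABEL) {
                throw new IllegalArgumentException("Not a class file, magic = 0x" + Long.toHexString(magic));
            }
            int minor = in.readUnsignedShort();
            int major = in.readUnsignedShort();
            int poolCount = in.readUnsignedShort();
            return new ClassFileHeader(magic, minor, major, poolCount);
        } catch (IOException e) {
            // For example, an EOFException when the data is truncated
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = args.length > 0
                ? Files.readAllBytes(Paths.get(args[0]))
                : new byte[] {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, // magic
                              0x00, 0x00,   // minor_version 0
                              0x00, 0x31,   // major_version 49
                              0x00, 0x12};  // constant_pool_count 18
        ClassFileHeader h = read(bytes);
        System.out.println("version " + h.majorVersion + "." + h.minorVersion
                + ", constant pool slots: " + (h.constantPoolCount - 1));
    }
}
```

The program reports constant_pool_count - 1 as slots rather than entries, because long and double constants take two slots each.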

The constant pool deserves a little more attention because it is used frequently by other structures. Any text string found in a Java class, regardless of its nature, is stored in the same pool of constants. This includes the class name, names of fields and methods, names of classes and methods the class invokes, and literal strings used inside the Java code. Anytime a name or string needs to be used, it is referred to by an index into the constant pool. The constant pool is an array of cp_info structures, the general format of which is shown in Listing 17.4.

Example 17.4. Constant Pool Item Structure

cp_info {
    u1 tag;
    u1 info[];
}

The actual items stored in the pool follow the structure that corresponds to the tag. For example, a string is defined using a CONSTANT_String structure and a reference to a field using CONSTANT_Fieldref. The list of structures and their contents can be found in the JVM specification.
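To show how the tag drives parsing, the following sketch walks a constant pool and summarizes each entry. It covers only the tags defined in the second edition of the specification (Utf8 = 1, Integer = 3, Float = 4, Long = 5, Double = 6, Class = 7, String = 8, Fieldref = 9, Methodref = 10, InterfaceMethodref = 11, NameAndType = 12) and rejects the rest; the class name and the hand-made sample pool are assumptions for illustration. Note how a CONSTANT_String or CONSTANT_Class entry holds only the index of a CONSTANT_Utf8 entry, and how a Long consumes two slots.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

/** Summarizes constant pool entries by tag (second-edition tags only; a sketch). */
public class ConstantPoolWalker {

    /** Parses slots 1 .. constantPoolCount - 1 and returns a summary keyed by pool index. */
    public static Map<Integer, String> parse(byte[] poolBytes, int constantPoolCount) {
        Map<Integer, String> entries = new LinkedHashMap<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(poolBytes))) {
            for (int i = 1; i < constantPoolCount; i++) {   // index 0 is never used
                int tag = in.readUnsignedByte();
                switch (tag) {
                    case 1:  // CONSTANT_Utf8: u2 length + modified UTF-8, the format readUTF() reads
                        entries.put(i, "Utf8 \"" + in.readUTF() + "\"");
                        break;
                    case 3:  // CONSTANT_Integer
                        entries.put(i, "Integer " + in.readInt());
                        break;
                    case 4:  // CONSTANT_Float
                        entries.put(i, "Float " + in.readFloat());
                        break;
                    case 5:  // CONSTANT_Long
                        entries.put(i, "Long " + in.readLong());
                        i++;  // the next slot is unusable
                        break;
                    case 6:  // CONSTANT_Double
                        entries.put(i, "Double " + in.readDouble());
                        i++;  // the next slot is unusable
                        break;
                    case 7:  // CONSTANT_Class -> index of a Utf8 class name
                        entries.put(i, "Class #" + in.readUnsignedShort());
                        break;
                    case 8:  // CONSTANT_String -> index of a Utf8 entry
                        entries.put(i, "String #" + in.readUnsignedShort());
                        break;
                    case 9:  // CONSTANT_Fieldref
                    case 10: // CONSTANT_Methodref
                    case 11: // CONSTANT_InterfaceMethodref
                        entries.put(i, "Ref(tag " + tag + ") class #" + in.readUnsignedShort()
                                + " nameAndType #" + in.readUnsignedShort());
                        break;
                    case 12: // CONSTANT_NameAndType
                        entries.put(i, "NameAndType name #" + in.readUnsignedShort()
                                + " descriptor #" + in.readUnsignedShort());
                        break;
                    default: // later tags such as MethodHandle or InvokeDynamic are not handled
                        throw new IllegalArgumentException("Unsupported tag " + tag + " at #" + i);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return entries;
    }

    /** A hand-made pool with constant_pool_count = 8; the Long at #5 also uses slot #6. */
    public static byte[] samplePool() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeByte(1); out.writeUTF("java/lang/Object");  // #1 Utf8
            out.writeByte(7); out.writeShort(1);                 // #2 Class -> #1
            out.writeByte(1); out.writeUTF("Hello");             // #3 Utf8
            out.writeByte(8); out.writeShort(3);                 // #4 String -> #3
            out.writeByte(5); out.writeLong(42L);                // #5 Long (slots #5 and #6)
            out.writeByte(1); out.writeUTF("done");              // #7 Utf8
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        parse(samplePool(), 8).forEach((index, text) -> System.out.println("#" + index + " " + text));
    }
}
```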

The ClassFile structure uses three other structures: field_info, method_info, and attribute_info. field_info has the same layout as method_info, so we'll show only the method_info structure in Listing 17.5.

Example 17.5. method_info Structure

method_info {
    u2 access_flags;
    u2 name_index;
    u2 descriptor_index;
    u2 attributes_count;
    attribute_info attributes[attributes_count];
}

The meanings of the method_info fields are given in Table 17.4.

Table 17.4. method_info Fields

FIELD	DESCRIPTION
`access_flags`	A mask of modifiers describing the method accessibility and properties, including `static`, `final`, `synchronized`, `native`, and `abstract`.
`name_index`	An index into the `constant_pool` array item representing the method name.
`descriptor_index`	An index into the `constant_pool` array item representing the method descriptor.
`attributes_count`	The number of items in the `attributes` array.
`attributes[]`	An array of method attributes. The attributes defined by the JVM specification include `Code` and `Exceptions`. The attributes not recognized by the JVM are ignored.

Attributes

The attributes are used in the ClassFile, field_info, method_info, and Code_attribute structures to provide additional information that depends on the structure type. For example, class attributes include the source filename and debugging information, whereas method attributes include the bytecode and exceptions. Listing 17.6 shows the structure of attribute_info, and Table 17.5 lists its fields.

Table 17.5. attribute_info Fields

FIELD	DESCRIPTION
`attribute_name_index`	An index of the `constant_pool` item representing the attribute name
`attribute_length`	The length of the `info` array in bytes, not counting the 6 bytes taken by `attribute_name_index` and `attribute_length`
`info[]`	The binary content of the attribute

Example 17.6. attribute_info Structure

attribute_info {
    u2 attribute_name_index;
    u4 attribute_length;
    u1 info[attribute_length];
}

Compilers and post-processors are allowed to define and name new attributes, as long as those attributes do not affect the semantics of the class. For instance, AOP implementations can use bytecode attributes to store the aspects defined for a class.
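The attribute_length field is what makes it safe for the JVM to ignore attributes it does not recognize: a reader can skip exactly that many bytes without understanding their content. The sketch below (hypothetical names, hand-made sample data) reads an attributes_count followed by attribute_info structures and keeps each body as raw bytes. Resolving attribute_name_index to a name such as SourceFile would require the constant pool, which is left out here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Reads attribute_info structures without interpreting their contents (a sketch). */
public class AttributeReader {

    /** One attribute_info: the name's constant pool index plus the raw info[] bytes. */
    public static final class RawAttribute {
        public final int nameIndex;
        public final byte[] info;

        RawAttribute(int nameIndex, byte[] info) {
            this.nameIndex = nameIndex;
            this.info = info;
        }
    }

    /** Reads a u2 attributes_count followed by that many attribute_info structures. */
    public static List<RawAttribute> readAttributes(byte[] data) {
        List<RawAttribute> result = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int count = in.readUnsignedShort();
            for (int i = 0; i < count; i++) {
                int nameIndex = in.readUnsignedShort();
                long length = in.readInt() & 0xFFFFFFFFL;  // u4, excludes the 6-byte header
                byte[] info = new byte[Math.toIntExact(length)];
                in.readFully(info);  // an unknown attribute can be skipped the same way
                result.add(new RawAttribute(nameIndex, info));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    /** Hand-made sample: a 2-byte attribute named by #10 and a 3-byte custom one named by #12. */
    public static byte[] sample() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeShort(2);                                        // attributes_count
            out.writeShort(10); out.writeInt(2); out.writeShort(11);  // 2-byte body
            out.writeShort(12); out.writeInt(3); out.write(new byte[] {7, 8, 9});
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        for (RawAttribute a : readAttributes(sample())) {
            System.out.println("attribute name #" + a.nameIndex + ": " + a.info.length + " byte(s)");
        }
    }
}
```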

Bytecode Verification

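When a compiler compiles Java source into bytecode, it performs extensive checks on syntax, keyword and operator usage, and other possible errors. This ensures that the generated bytecode is valid and safe to run. However, the JVM cannot assume that every class file came from a well-behaved compiler, so as the class is loaded, the JVM performs its own verification to ensure that the class file has the correct format and has not been corrupted or tampered with. For instance, it checks that the first 4 bytes contain the magic number and that the attributes are of the proper length. It checks that final classes are not subclassed and that fields and methods have correct references into the constant pool; it also performs a number of other checks, including confirming that each method's instructions use the operand stack and local variables consistently. A class that fails these checks is rejected with an error such as java.lang.ClassFormatError or java.lang.VerifyError, which is worth remembering when you generate or instrument bytecode yourself.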
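The magic number check is easy to observe, because the same checks apply to bytes handed to the JVM at runtime through ClassLoader.defineClass(), which is how dynamically generated classes get loaded. In the sketch below, the RawLoader helper and the demo class are illustrative names; the exact error message depends on the JVM.

```java
/** Shows the JVM rejecting class bytes that lack the 0xCAFEBABE magic number (a sketch). */
public class MagicCheckDemo {

    /** Minimal class loader that exposes the protected ClassLoader.defineClass(). */
    static final class RawLoader extends ClassLoader {
        Class<?> define(String name, byte[] bytes) {
            return defineClass(name, bytes, 0, bytes.length);
        }
    }

    /** Returns the error raised while defining the class, or null if the JVM accepted it. */
    public static ClassFormatError tryDefine(byte[] bytes) {
        try {
            new RawLoader().define("Broken", bytes);
            return null;
        } catch (ClassFormatError e) {
            return e;
        }
    }

    public static void main(String[] args) {
        // 0xDEADBEEF instead of 0xCAFEBABE, followed by a few arbitrary bytes
        byte[] bogus = {(byte) 0xDE, (byte) 0xAD, (byte) 0xBE, (byte) 0xEF, 0, 0, 0, 49};
        ClassFormatError error = tryDefine(bogus);
        System.out.println(error == null ? "loaded" : "rejected: " + error);
    }
}
```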

Instrumenting and Generating Bytecode

We have reached the point where you can finally get your hands on the keyboard and do some nifty stuff. Now that you know enough about the bytecode, you can implement the two most common forms of bytecode manipulation: instrumenting existing classes and generating new ones. Obviously, working directly with the binary content of the class file is a tedious task. To make our job easier, we will use an open source library from Apache called the Byte Code Engineering Library (BCEL).

BCEL Overview

The home page for BCEL is located at http://jakarta.apache.org/bcel, where you can download the binary distribution, source code, and manual. The library provides an object-oriented API to work with the structures and fields that compose a class. It can be used to read an existing class file and represent it with a hierarchy of objects; transform the class representation by adding fields, methods, and binary code; and programmatically generate new classes from scratch. The class representation can be saved to a file or passed to the JVM as an array of bytes to support instrumentation and generation on-the-fly. BCEL even comes with a class loader that can be used to dynamically instrument or create classes at runtime. The class diagram of BCEL's main classes is shown in Figure 17.2.

Figure 17.2. Class diagram of BCEL's main classes.

Table 17.6 provides brief descriptions of the main classes we will use. Detailed information is available in the BCEL JavaDoc.

Table 17.6. Main BCEL Classes

BCEL CLASS	DESCRIPTION
`JavaClass`	Represents an existing Java class. It contains fields, methods, attributes, the constant pool, and other class data structures.
`Field`	Represents the `field_info` structure.
`Method`	Represents the `method_info` structure.
`ConstantPool`	Represents a pool of constants contained in the class.
`ClassGen`	Dynamically creates a new class. It can be initialized with an existing class.
`FieldGen`	Dynamically creates a new field. It can be initialized with an existing field.
`MethodGen`	Dynamically creates a new method. It can be initialized with an existing method.
`ConstantPoolGen`	Dynamically creates a new pool of constants. It can be initialized with an existing constant pool.
`InstructionFactory`	Creates instructions to be inserted into bytecode.
`InstructionList`	Stores a list of bytecode instructions.
`Instruction`	Represents an instruction, such as `iconst_0` or `invokevirtual`.

As you can see, most of the classes are a direct mapping to the terms and data structures defined in the JVM specification.

Instrumenting Methods

Instrumenting means inserting new bytecode into a class or augmenting its existing bytecode. Products that produce runtime performance metrics for executing Java applications rely on instrumentation to collect the data. To get some practical experience, let's develop a framework that produces a log of method invocations at runtime. Omniscient Debugger, covered in Chapter 9, “Cracking Code with Unorthodox Debuggers,” uses a similar technique to record the program execution so it can be viewed later. Recording method invocations at runtime provides a detailed log of the code that the JVM actually executed.

To test the implementation, we'll use a class called SimpleClass defined in package covertjava.bytecode, with a main method that is shown in Listing 17.7.

Example 17.7. SimpleClass's main() Method

public static void main(String[] args) {
    int i = 0;
    i = i + 1;
    System.out.println(i);
}

To keep the example simple, we are not going to write the entire invocation logging framework. Instead, we'll limit the implementation to the InvocationRegistry class with a static method, as shown in Listing 17.8.

Example 17.8. Entry Point into the Method Logging Framework

package covertjava.bytecode;
public class InvocationRegistry {
    public static void methodInvoked(String methodName) {
        System.out.println("*** method invoked " + methodName);
    }
}

methodInvoked() is the entry point into the method logging framework, and it is used to log a method invocation. A complete implementation could store a call stack of methods for each thread and save or print it at the end of the application run. For now, the implementation just prints the method name to indicate that the framework was called for that method.
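As a sketch of where the framework could go, the following hypothetical ThreadInvocationRegistry keeps the same methodInvoked(String) entry point but records names in a per-thread list using ThreadLocal, so threads do not interfere with one another. It records a flat log rather than a true call stack, which would also need a call on method exit; none of the other names come from the original framework.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical variant of InvocationRegistry that logs invocations per thread. */
public class ThreadInvocationRegistry {

    // Each thread gets its own list, so recording needs no synchronization
    private static final ThreadLocal<List<String>> CALLS = ThreadLocal.withInitial(ArrayList::new);

    /** Same signature as the entry point that the instrumented bytecode calls. */
    public static void methodInvoked(String methodName) {
        CALLS.get().add(methodName);
    }

    /** Returns an unmodifiable snapshot of the calls recorded by the current thread. */
    public static List<String> callsForCurrentThread() {
        return Collections.unmodifiableList(new ArrayList<>(CALLS.get()));
    }

    /** Clears the current thread's log, e.g. at the end of a run. */
    public static void reset() {
        CALLS.remove();
    }

    public static void main(String[] args) throws InterruptedException {
        methodInvoked("main");
        Thread worker = new Thread(() -> methodInvoked("run"));
        worker.start();
        worker.join();
        // The worker's call went into the worker thread's own list
        System.out.println(callsForCurrentThread());   // [main]
    }
}
```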

With the foundation laid, we can embark on implementing the class that will do the method bytecode instrumentation. We'll call it MethodInstrumentor and have its main() method take in the name of the class and the methods we want to instrument from the command line. When executed, MethodInstrumentor will load the given class, instrument the methods whose names match the given regular expression pattern by adding a call to InvocationRegistry.methodInvoked(), and then save the class under a new name. Running the new version of the class should log its method invocations through InvocationRegistry. MethodInstrumentor is located in the covertjava.bytecode package, and we are going to use a top-down approach to develop it. The main() method of MethodInstrumentor is shown in Listing 17.9.

Example 17.9. MethodInstrumentor's main() Method

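import java.io.IOException;
import org.apache.bcel.Repository;
import org.apache.bcel.classfile.JavaClass;

// ClassNotFoundException is declared for newer BCEL versions, whose lookupClass() throws it
public static void main(String[] args) throws IOException, ClassNotFoundException {
    if (args.length != 2) {
        System.out.println("Syntax: MethodInstrumentor " +
                           "<full class name> <method name pattern>");
        System.exit(1);
    }
    // Load the class from the application class path
    JavaClass cls = Repository.lookupClass(args[0]);
    MethodInstrumentor instrumentor = new MethodInstrumentor();
    instrumentor.instrumentWithInvocationRegistry(cls, args[1]);
    // Save under a new name, e.g. new_covertjava.bytecode.SimpleClass.class
    cls.dump("new_" + cls.getClassName() + ".class");
}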

After checking the command-line syntax, MethodInstrumentor attempts to load the given class using BCEL's Repository class. The Repository uses the application class path to locate and load the class, which is just one of many alternatives to loading a class with BCEL. For some inexplicable reason, the BCEL version used here returns null on error conditions instead of throwing an exception (later versions throw ClassNotFoundException instead), but for the sake of code clarity we won't check for it. After the class is loaded, an instance of MethodInstrumentor is created and its instrumentWithInvocationRegistry() method is called to perform the transformations. When finished, the class is saved to a file with a new name. Let's look at the implementation of instrumentWithInvocationRegistry() shown in Listing 17.10.

Example 17.10. instrumentWithInvocationRegistry Implementation

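import java.util.regex.Pattern;
import org.apache.bcel.classfile.JavaClass;
import org.apache.bcel.classfile.Method;
import org.apache.bcel.generic.ConstantPoolGen;

public void instrumentWithInvocationRegistry(JavaClass cls,
                                             String methodPattern) {
    // Start from the existing constants; new ones are appended as needed
    ConstantPoolGen constants = new ConstantPoolGen(cls.getConstantPool());
    Method[] methods = cls.getMethods();

    for (int i = 0; i < methods.length; i++) {
        // Instrument all methods whose names fully match the regular expression
        if (Pattern.matches(methodPattern, methods[i].getName())) {
            methods[i] = instrumentMethod(cls, constants, methods[i]);
        }
    }
    cls.setMethods(methods);
    // Replace the old pool with one that includes the newly added constants
    cls.setConstantPool(constants.getFinalConstantPool());
}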

Because we are going to add an invocation of a method from a different class, we must refer to it by name. Recall that all names are stored in the constant pool, which means we'll have to add new constants to the existing pool. To add new elements to structures in BCEL, we must rely on the generator classes, which have the suffix Gen in their names. The code creates an instance of ConstantPoolGen that is initially populated with constants from the existing pool; then it iterates over all the methods, harnessing the power of regular expressions to test which methods must be instrumented. When all the methods are processed, the class is updated with the new methods and the new pool of constants. The actual job of instrumenting is done in instrumentMethod(), as shown in Listing 17.11.

Example 17.11. instrumentMethod() Implementation

import org.apache.bcel.Constants;
import org.apache.bcel.classfile.JavaClass;
import org.apache.bcel.classfile.Method;
import org.apache.bcel.generic.*;

public Method instrumentMethod(JavaClass cls, ConstantPoolGen constants,
                               Method oldMethod) {
    System.out.println("Instrumenting method " + oldMethod.getName());
    MethodGen method = new MethodGen(oldMethod, cls.getClassName(), constants);
    InstructionFactory factory = new InstructionFactory(constants);
    InstructionList instructions = new InstructionList();

    // Append two instructions representing a method call:
    // push the method name, then call the static methodInvoked(String)
    instructions.append(new PUSH(constants, method.getName()));
    Instruction invoke = factory.createInvoke(
            "covertjava.bytecode.InvocationRegistry",
            "methodInvoked",
            Type.VOID,
            new Type[] {new ObjectType("java.lang.String")},
            Constants.INVOKESTATIC
            );
    instructions.append(invoke);

    // Insert the new instructions before the existing method body
    method.getInstructionList().insert(instructions);
    instructions.dispose();
    // The pushed string may raise the maximum operand stack depth, so recompute it
    method.setMaxStack();
    return method.getMethod();
}

As you can see, instrumentMethod() programmatically creates bytecode instructions that correspond to a method call. The easiest way to select the correct JVM instructions and their parameters is to write the code in Java first, compile it, and then use something like the jClassLib viewer to see how it is translated to the bytecode. Then the corresponding bytecode can be constructed using BCEL objects.

The first thing instrumentMethod() does is instantiate a MethodGen object that is used to store the new bytecode. Then a factory for creating instructions and a list for storing them are created. If you have paid attention to this chapter and played with the jClassLib Bytecode Viewer, you might recall that a Java method call is represented by several bytecode instructions. First, the method parameters must be pushed onto the operand stack, and then an invoke instruction is issued to transfer control to the method: invokevirtual for an instance method such as println() (refer to Listing 17.2 for an example of method call bytecode) and invokestatic for a static method such as methodInvoked(). This is precisely what we have to insert into the method code before its existing bytecode. If we were working with the bytecode directly, we'd have to add the constants to the constant pool ourselves, including covertjava/bytecode/InvocationRegistry for the class name, methodInvoked for the method name, its descriptor (Ljava/lang/String;)V, and the string literal holding the name of the method being instrumented. Luckily, BCEL does this for us because we are using high-level classes such as InstructionFactory and PUSH, which automatically add constants to the pool. After the instructions are created, they are appended to the instruction list. When the code generation part is finished, the list is inserted at the beginning of the method's existing instructions and the new method structure is returned.

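To test that the instrumentation works, compile the classes and, with the BCEL JAR on the class path, run MethodInstrumentor on SimpleClass.class using the following command line (the quotes keep a Unix shell from expanding .* into filenames and are harmless on Windows):

java covertjava.bytecode.MethodInstrumentor covertjava.bytecode.SimpleClass ".*"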

A new class file called new_covertjava.bytecode.SimpleClass.class should be created in the current directory. Copy it into the classes directory (under covertjava\bytecode) as SimpleClass.class, overwriting the existing file; then run the SimpleClass main() method. If all works well, you should see the following on the console:

C:\Projects\CovertJava\classes>java covertjava.bytecode.SimpleClass
*** method invoked main
1

As you can see, the instrumented class starts by calling InvocationRegistry, which outputs the first line; then it executes its own body, which outputs 1.

Generating Classes

Our second task is to learn how to generate a new class programmatically. As was mentioned earlier, this comes in handy for middleware products and frameworks that want to avoid source code generation. In our example, we'll create a generator of a value object that contains all the fields of the given class but no methods. The value object is a common design pattern used in distributed applications to pass data across the network. Admittedly, our generator will produce a very crude version of the value objects, but we'll make it a little interesting by ensuring that it generates only the fields whose values are meant to be retained.

Once again, we will use SimpleClass as a guinea pig in our experiment. SimpleClass defines five fields, as shown in Listing 17.12.

Example 17.12. SimpleClass Fields

public int number;
protected String name;
private Thread myThread;
static String className;
transient String transientName;

We will write a ClassGenerator class in package covertjava.bytecode that takes two command-line parameters—a fully qualified class name and a regular expression pattern for field names to copy. The main() method of ClassGenerator is shown in Listing 17.13.

Example 17.13. ClassGenerator's main() Method

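import java.io.IOException;
import org.apache.bcel.Repository;
import org.apache.bcel.classfile.JavaClass;

// ClassNotFoundException is declared for newer BCEL versions, whose lookupClass() throws it
public static void main(String[] args) throws IOException, ClassNotFoundException {
    if (args.length != 2) {
        System.out.println("Syntax: ClassGenerator " +
                           "<full class name> <field name pattern>");
        System.exit(1);
    }
    JavaClass sourceClass = Repository.lookupClass(args[0]);
    ClassGenerator generator = new ClassGenerator();
    JavaClass valueClass = generator.generateValueObject(sourceClass, args[1]);
    // Writes e.g. covertjava.bytecode.SimpleClassValue.class to the current directory
    valueClass.dump(valueClass.getClassName() + ".class");
}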

Just as in MethodInstrumentor, the implementation checks the command-line syntax, loads the class, and then calls the generateValueObject() method that is shown in Listing 17.14.

Example 17.14. ClassGenerator's generateValueObject() Method

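import java.util.regex.Pattern;
import org.apache.bcel.Constants;
import org.apache.bcel.classfile.Field;
import org.apache.bcel.classfile.JavaClass;
import org.apache.bcel.generic.ClassGen;
import org.apache.bcel.generic.FieldGen;

public JavaClass generateValueObject(
    JavaClass sourceClass,
    String fieldPattern)
{
    String newName = sourceClass.getClassName() + "Value";
    ClassGen classGen = new ClassGen(
            newName,                   // class name
            "java.lang.Object",        // superclass
            newName,                   // value recorded as the source file name
            Constants.ACC_PUBLIC | Constants.ACC_SUPER,
            new String[] { "java.io.Serializable" });
    Field[] fields = sourceClass.getFields();
    for (int i = 0; i < fields.length; i++) {
        if (Pattern.matches(fieldPattern, fields[i].getName())) {
            int skipFlags = Constants.ACC_STATIC | Constants.ACC_TRANSIENT;
            if ((fields[i].getAccessFlags() & skipFlags) == 0) {
                // Note: this also changes the in-memory copy of the source field
                fields[i].setAccessFlags(Constants.ACC_PUBLIC);
                addField(classGen, fields[i]);
            }
        }
    }
    return classGen.getJavaClass();
}

// addField() is not shown in the original listing; this is one possible implementation.
// The source Field refers to the source class's constant pool, so it is rebuilt
// against the new class's pool instead of being added directly.
private void addField(ClassGen classGen, Field field) {
    FieldGen fieldGen = new FieldGen(field.getAccessFlags(), field.getType(),
                                     field.getName(), classGen.getConstantPool());
    classGen.addField(fieldGen.getField());
}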

The implementation first creates an instance of ClassGen to represent the class being generated. The class has the same name as the parameter class, but with a Value suffix. It extends java.lang.Object and implements java.io.Serializable. Next, the implementation iterates the fields of the parameter class looking for names that match the given criteria. Using a bitmask, the implementation filters out the static and transient fields and copies the qualifying fields to the class being generated. The access modifier of the generated field is set as public for simplicity. After the generation is complete, the class representation is returned to the caller, which persists it to disk.
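The filter in generateValueObject() works because field access flags in the class file use the same bit values as the java.lang.reflect.Modifier constants (ACC_STATIC and Modifier.STATIC are both 0x0008; ACC_TRANSIENT and Modifier.TRANSIENT are both 0x0080). That makes it easy to preview which fields the generator will copy without BCEL. The sketch below applies the same mask through reflection to a stand-in class declaring the fields of Listing 17.12; the class names are illustrative.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Applies the static/transient filter of generateValueObject() via reflection (a sketch). */
public class ValueFieldFilter {

    // Stand-in for SimpleClass, declaring the fields from Listing 17.12
    static class Sample {
        public int number;
        protected String name;
        private Thread myThread;
        static String className;
        transient String transientName;
    }

    /** Names of declared fields that match the regex and are neither static nor transient. */
    public static List<String> retainedFields(Class<?> type, String fieldPattern) {
        int skipFlags = Modifier.STATIC | Modifier.TRANSIENT; // same bits as ACC_STATIC | ACC_TRANSIENT
        List<String> names = new ArrayList<>();
        for (Field f : type.getDeclaredFields()) {
            if (f.getName().matches(fieldPattern) && (f.getModifiers() & skipFlags) == 0) {
                names.add(f.getName());
            }
        }
        Collections.sort(names); // getDeclaredFields() does not guarantee any order
        return names;
    }

    public static void main(String[] args) {
        System.out.println(retainedFields(Sample.class, ".*")); // [myThread, name, number]
    }
}
```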

Running ClassGenerator on SimpleClass produces a file called covertjava.bytecode.SimpleClassValue.class in the current directory. Listing 17.15 shows the decompiled version of the class.

Example 17.15. Decompiled Version of the SimpleClassValue Class

package covertjava.bytecode;
import java.io.Serializable;

public class SimpleClassValue
    implements Serializable
{
    public int number;
    public String name;
    public Thread myThread;
}

Voilà! All the appropriate fields of SimpleClass have been generated for SimpleClassValue.

ASM Library

A new open source project that is gaining momentum is the ASM bytecode manipulation library, hosted at http://asm.objectweb.org/. It is designed to achieve the same goals as the BCEL library but claims significantly better performance because of a different implementation approach. BCEL creates a complete object tree representing a binary class file, down to the individual bytecode instructions. Therefore, it can potentially create hundreds of objects for one class file, which can lead to performance degradation. Although having an object for every class file attribute is convenient, this approach can become costly for runtime bytecode manipulation if thousands of classes are instrumented.

ASM uses the Visitor design pattern to avoid instantiating objects when they are not required. A class analyzer provided by the framework invokes a user-defined visitor class, passing method and field data as parameters. For most parameters, the visitor implementation simply passes them on to the next visitor, keeping the data in binary form. For the fields or methods that need to be changed, the visitor implementation obtains an object representation from the framework and then manipulates that object. This way, most of the bytecode remains in binary form and the performance overhead is minimal.
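To make the visitor approach concrete, here is a toy model in plain Java. It is not ASM's API (in ASM, classes such as ClassReader, ClassVisitor, and ClassWriter fill these roles); MemberVisitor, FieldInfo, ForwardingVisitor, RenamingVisitor, Recorder, and readFields() are hypothetical names. A reader pushes one event per field through a chain of visitors. The base visitor forwards every event unchanged, and a subclass overrides only the event it wants to alter, so no tree of field objects is ever handed to the caller.

```java
import java.util.ArrayList;
import java.util.List;

public class VisitorChainDemo {

    // Hypothetical event interface: a toy stand-in for a visitor, not ASM's API
    interface MemberVisitor {
        void visitField(int access, String name, String descriptor);
        void visitEnd();
    }

    // Hypothetical holder for the raw data a reader would decode from a class file
    static class FieldInfo {
        final int access;
        final String name;
        final String descriptor;
        FieldInfo(int access, String name, String descriptor) {
            this.access = access;
            this.name = name;
            this.descriptor = descriptor;
        }
    }

    // Base visitor: forwards every event unchanged to the next visitor
    static class ForwardingVisitor implements MemberVisitor {
        private final MemberVisitor next;
        ForwardingVisitor(MemberVisitor next) { this.next = next; }
        public void visitField(int access, String name, String descriptor) {
            next.visitField(access, name, descriptor);
        }
        public void visitEnd() { next.visitEnd(); }
    }

    // Alters only the one field it is interested in; all others pass through
    static class RenamingVisitor extends ForwardingVisitor {
        private final String from;
        private final String to;
        RenamingVisitor(MemberVisitor next, String from, String to) {
            super(next);
            this.from = from;
            this.to = to;
        }
        @Override
        public void visitField(int access, String name, String descriptor) {
            super.visitField(access, name.equals(from) ? to : name, descriptor);
        }
    }

    // End of the chain: records the events that reach it
    static class Recorder implements MemberVisitor {
        final List<String> lines = new ArrayList<>();
        public void visitField(int access, String name, String descriptor) {
            lines.add(access + " " + name + " " + descriptor);
        }
        public void visitEnd() { lines.add("end"); }
    }

    // Plays the role of the framework's class analyzer: it pushes one event
    // per field into the chain instead of returning a tree of objects
    static void readFields(FieldInfo[] fields, MemberVisitor visitor) {
        for (FieldInfo f : fields) {
            visitor.visitField(f.access, f.name, f.descriptor);
        }
        visitor.visitEnd();
    }

    public static void main(String[] args) {
        Recorder recorder = new Recorder();
        MemberVisitor chain = new RenamingVisitor(recorder, "name", "userName");
        readFields(new FieldInfo[] {
            new FieldInfo(0x0002, "number", "I"),               // private int number
            new FieldInfo(0x0001, "name", "Ljava/lang/String;") // public String name
        }, chain);
        System.out.println(recorder.lines);
    }
}
```

With this chain, main prints [2 number I, 1 userName Ljava/lang/String;, end]: the number field passes through untouched, and only name is rewritten on its way to the recorder.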

If minimal instrumentation overhead is important, ASM is a better choice than BCEL. If clarity and simplicity of implementation have a higher priority, I recommend BCEL.

Bytecode Tweaking Compared with AOP and Dynamic Proxies

Now that you have learned how to tweak bytecode, you can compare this technique with other approaches to augmenting functionality at runtime. Chapter 16, “Intercepting Control Flow,” presented dynamic proxies, which enable intercepting the methods of any interface without a static implementation of that interface. Although dynamic proxies are simple to write and easy to use, their main drawbacks are that they work only with interfaces (not with classes) and require explicit instantiation in the calling code. Thus, to use a dynamic proxy with Chat, we had to call the setMessageListener() method of ChatServer to install the proxy. Without the source code for Chat, this would not have been possible without decompiling. Changing the application code is acceptable during development, but it is not a suitable solution for third-party code or runtime integration. Unlike a dynamic proxy, bytecode tweaking does not require any compile-time changes to the code being tweaked.
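Both limitations of dynamic proxies show up in a few lines of java.lang.reflect.Proxy code. In this sketch, Greeter and PlainGreeter are hypothetical stand-ins for an interface such as Chat's listener and its implementation, and traced() and canProxyClass() are illustrative helpers. The proxy can intercept greet() only because Greeter is an interface, the caller must build the proxy and use it in place of the original object, and passing the concrete class PlainGreeter to Proxy.newProxyInstance() is rejected with an IllegalArgumentException.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

public class ProxyDemo {

    // Hypothetical interface, standing in for something like a message listener
    public interface Greeter {
        String greet(String name);
    }

    // Hypothetical concrete implementation
    static class PlainGreeter implements Greeter {
        public String greet(String name) { return "Hello, " + name; }
    }

    // Names of intercepted methods, recorded by the handler
    static final List<String> calls = new ArrayList<>();

    // Wraps any Greeter in a proxy that records each call before delegating
    static Greeter traced(Greeter target) {
        InvocationHandler handler = (proxy, method, args) -> {
            calls.add(method.getName());
            return method.invoke(target, args);
        };
        return (Greeter) Proxy.newProxyInstance(
                Greeter.class.getClassLoader(),
                new Class<?>[] { Greeter.class },
                handler);
    }

    // Proxy accepts only interfaces; a concrete class is rejected
    static boolean canProxyClass() {
        try {
            Proxy.newProxyInstance(PlainGreeter.class.getClassLoader(),
                    new Class<?>[] { PlainGreeter.class },
                    (proxy, method, args) -> null);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The caller has to create the proxy explicitly and use it instead of the original
        Greeter greeter = traced(new PlainGreeter());
        System.out.println(greeter.greet("Alex"));
        System.out.println(calls);
        System.out.println(canProxyClass());
    }
}
```

Running main prints Hello, Alex, then [greet], then false. Any code that still holds a direct reference to PlainGreeter bypasses the proxy entirely, which is the gap that bytecode tweaking closes.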

AOP, an emerging technology for adding cross-cutting behavior to objects and methods, is a clean and well-structured enhancement to traditional programming. Using aspects, you can easily add functionality such as tracing method calls or preprocessing and post-processing. AOP cleanly separates the implementation of the program logic from infrastructure tasks such as tracing, profiling, and security. The aspects are defined in separate files that are compiled and processed together with the application code. Implementations of AOP typically rely on bytecode instrumentation to insert the additional behavior. In this respect, they are more similar to the bytecode tweaking we've looked at in this chapter than to dynamic proxies. AOP is a high-level approach that lacks the flexibility offered by direct bytecode engineering, but when appropriate, aspects can be the easiest way to add covert logic to an existing application.

Quick Quiz

1:	What are the reasons to manipulate the bytecode?
2:	What is an opcode, and how are the operands passed to a bytecode instruction?
3:	What would the method descriptor look like for a Java method declared as public Object[] getObjects(String name, char type)?
4:	What structures is a class file composed of?
5:	Which main classes of BCEL are used to instrument or generate a class?
6:	Which attribute of a method needs to be altered to instrument its bytecode?

In Brief

Bytecode manipulation is useful for code generation, instrumentation of existing classes, and enhancement of the behavior of classes without altering their source code.
The format of the Java class file and the possible instructions are defined in the JVM specification.
The logic of each Java method is represented with a set of primitive JVM instructions, basic commands that closely resemble machine code.
The binary format of the class file is represented by pseudo structures defined in the JVM specification, which include data on the class, fields, methods, attributes, and other properties.
The Apache Byte Code Engineering Library (BCEL) provides an object-oriented API for working with the structures and fields that compose a class.
Instrumenting means inserting new bytecode into a class or augmenting its existing bytecode.
