Chapter 10

Understanding the Java Virtual Machine

The Java virtual machine (JVM) is the platform upon which your programs run. Each JVM is built for a specific operating system and architecture, and sits between the operating system and any compiled programs or applications you write, which makes those programs platform agnostic. Java programs are compiled (using javac) into bytecode. The JVM then interprets this bytecode, or compiles it at run time, into the specific instructions for that architecture and operating system.

Other compiled languages such as C++ or Objective-C need to be compiled for a specific target architecture and operating system, but this is not the case with Java. Regardless of the platform on which your Java code was compiled, it can run on any JVM of the same or later version.

Considering this is where your code runs, it is important to understand how the JVM works, and how you can tweak certain parameters to obtain optimum performance for your code.

This chapter covers the particulars around the JVM’s memory model, such as how that memory is allocated to your objects, and how the JVM reclaims that memory once you no longer need it. It then reviews the different memory areas and how the JVM uses them. Finally, this chapter looks at certain language hooks to allow executing code to interact with a running JVM.

Garbage Collection


How is memory allocated?


The new keyword allocates memory on the Java heap. The heap is the main pool of memory, accessible to the whole of the application. If there is not enough memory available to allocate for that object, the JVM attempts to reclaim some memory from the heap with a garbage collection. If it still cannot obtain enough memory, an OutOfMemoryError is thrown. Like any other Throwable, this error propagates up the calling thread's stack; if nothing catches it, that thread terminates, which often brings the whole application down.
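The point that an OutOfMemoryError is an ordinary, catchable Throwable can be checked with a small sketch. It assumes HotSpot-like behavior: asking for an array longer than the JVM's maximum array length is rejected immediately with an OutOfMemoryError, without actually exhausting the heap. The class OomDemo is hypothetical, and catching an Error like this is for illustration only.

```java
/** Hypothetical demo: an OutOfMemoryError is thrown like any other Throwable and can be caught. */
final class OomDemo {

    /** Returns true if the oversized allocation failed with an OutOfMemoryError. */
    static boolean allocationFails() {
        try {
            // Integer.MAX_VALUE elements exceeds the maximum array length HotSpot supports,
            // and would need about 16 GB anyway, so the request fails straight away.
            long[] huge = new long[Integer.MAX_VALUE];
            return huge.length < 0; // not reached in practice
        } catch (OutOfMemoryError e) {
            // Only this allocation failed; the JVM itself is still running.
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("Allocation failed: " + allocationFails());
        System.out.println("The JVM is still running after the error");
    }
}
```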

The heap is split into several different sections, called generations. As objects survive more garbage collections, they are promoted into different generations. Objects in the older generations have already proven to be long lived, so they are less likely to have become garbage, and the older generations are therefore garbage collected less often.

When objects are first constructed, they are allocated in the Eden Space. If they survive a garbage collection, they are promoted to a Survivor Space, and should they live long enough there, they are promoted to the Tenured Generation. This generation is garbage collected much less frequently.
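You can see these areas in a running JVM through the standard java.lang.management API. The sketch below, using a hypothetical MemoryPools class, lists each memory pool and whether it belongs to the heap. The exact pool names depend on the garbage collector in use. For example, the serial collector reports names such as "Eden Space", "Survivor Space" and "Tenured Gen", and G1 reports "G1 Eden Space", "G1 Survivor Space" and "G1 Old Gen". Treat the output as a description of your own JVM rather than a fixed list.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper listing the memory pools (generations and other areas) of the running JVM. */
final class MemoryPools {

    /** Returns one "name (HEAP)" or "name (NON_HEAP)" entry per memory pool. */
    static List<String> describe() {
        List<String> result = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            result.add(pool.getName() + " (" + pool.getType().name() + ")");
        }
        return result;
    }

    /** Counts the pools that are part of the Java heap, such as eden, survivor and old generation. */
    static int heapPoolCount() {
        int count = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        for (String line : describe()) {
            System.out.println(line);
        }
    }
}
```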

There is also a fourth generation, called the Permanent Generation, or PermGen. The objects that reside here are garbage collected only rarely, typically when classes are unloaded, and usually contain state necessary for the JVM to run, such as class definitions and, before Java 7, the String constant pool. Note that PermGen was removed in Java 8 and replaced with a new space called Metaspace, which is held in native memory.


Using the PermGen Space
For most applications, the PermGen area contains traditional class definitions, String constants, and not much else. Newer languages running on the JVM, such as Groovy, have the capability to create dynamic class definitions, and when used under load, this can fill up the PermGen space easily. You must be careful when creating many dynamic class definitions, and you may need to increase the default memory allocation for the PermGen space.


What is garbage collection?


Garbage collection is the mechanism of reclaiming previously allocated memory, so that it can be reused by future memory allocations. In many modern languages, garbage collection is automated; you do not need to free up the memory yourself. In Java, whenever a new object is constructed, usually by the new keyword, the JVM allocates an appropriate amount of memory for that object and the data it holds.

When that object is no longer needed, the JVM needs to reclaim that memory, so that other constructed objects can use it.

With languages such as C and C++, it is necessary to manage these memory allocations manually, usually through function calls to malloc and free. More modern languages, such as Java and C#, have an automatic system for this, taking the effort, and any potential mistakes, away from the programmer.

Several different algorithms for garbage collection exist, but they all have the same goal of finding allocated memory that is no longer referenced by any live code, and returning that to a pool of available memory for future allocations.

The traditional garbage collection algorithm in Java is called mark-and-sweep. It starts from the roots, such as references held in local variables on each thread's stack and in static fields. Each object referenced by running code is marked as live, each reference within that object is traversed and the referenced object is also marked as live, and so on, until all routes from live objects have been traced.

Once this is complete, each object in the heap is visited, and those memory locations not marked as live are made available for allocation. In the simplest form of this algorithm, all of the threads in the JVM are paused while the memory is reclaimed; this is known as stop-the-world. Naturally, the garbage collector tries to minimize the length of these pauses. The collectors shipped since Java was first released have moved as much of the work as possible into parallel and concurrent phases.
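The two phases can be illustrated with a toy model. In the sketch below, the hypothetical ToyHeap and Node classes stand in for the heap and its objects. The real JVM works on raw memory rather than Java collections, so this shows only the shape of the algorithm. The mark phase traces every route from the roots, and the sweep phase discards everything that was not marked.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** A toy heap used only to illustrate mark-and-sweep; not how the JVM stores objects. */
final class ToyHeap {

    /** A toy object that can hold references to other toy objects. */
    static final class Node {
        final String name;
        final List<Node> references = new ArrayList<>();

        Node(String name) {
            this.name = name;
        }
    }

    final List<Node> allocated = new ArrayList<>(); // every object "on the heap"
    final List<Node> roots = new ArrayList<>();     // stands in for stack and static references

    Node allocate(String name) {
        Node node = new Node(name);
        allocated.add(node);
        return node;
    }

    /** Mark phase: trace every route from the roots, recording each node reached. */
    Set<Node> mark() {
        Set<Node> live = new HashSet<>();
        Deque<Node> toVisit = new ArrayDeque<>(roots);
        while (!toVisit.isEmpty()) {
            Node node = toVisit.pop();
            if (live.add(node)) {               // false if already marked, so cycles terminate
                toVisit.addAll(node.references);
            }
        }
        return live;
    }

    /** Sweep phase: visit every allocated node and free those that were not marked. */
    int collect() {
        Set<Node> live = mark();
        int before = allocated.size();
        allocated.retainAll(live);
        return before - allocated.size();       // number of nodes reclaimed
    }
}
```

Note that an unreachable cycle (two nodes that refer only to each other) is reclaimed, because reachability is traced from the roots rather than counted.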

Java 6 introduced a new collector, called Garbage First (G1). It was available for experimental use in Java 6 and supported for production use in Java 7. G1 is still based on marking live objects, with the work running in parallel. It divides the heap into regions and collects the regions containing the most garbage first, in an attempt to reclaim large areas of free space quickly.
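To check which collector your JVM is actually using, you can ask the platform's GarbageCollectorMXBeans, as in the sketch below (CollectorReport is a hypothetical name). If you start the JVM with -XX:+UseG1GC, the reported names would typically begin with "G1". The counts and times are cumulative since startup, and the API allows them to be -1 if a collector does not support them.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical helper reporting which garbage collectors the running JVM uses. */
final class CollectorReport {

    /** Number of collectors the JVM exposes (for example, one for young and one for old collections). */
    static int collectorCount() {
        return ManagementFactory.getGarbageCollectorMXBeans().size();
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Count and time may be -1 if undefined for this collector
            System.out.printf("%s: %d collections, %d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```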

Other operations are also performed during garbage collection. These include promotion to different generations, and moving live objects together within memory so that free space forms large contiguous blocks rather than many small gaps. This is called compaction. Compaction typically takes place while the JVM is in its stop-the-world phase, because live objects are moved to different physical memory locations and every reference to them must be updated.
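The effect of compaction can be sketched with an array of slots, where null marks free memory. This is a hypothetical toy (ToyCompactor), not how any JVM collector is implemented. It shows only the "sliding" of live objects towards one end. A real collector must also update every reference to each moved object, which this sketch omits.

```java
import java.util.Arrays;

/** Toy sliding compaction over an array of slots; null marks a free slot. Illustrative only. */
final class ToyCompactor {

    /**
     * Slides every live (non-null) slot towards the start of the array, preserving order,
     * and clears the rest. Returns the index of the first free slot afterwards.
     */
    static int compact(Object[] slots) {
        int free = 0;
        for (int i = 0; i < slots.length; i++) {
            if (slots[i] != null) {
                slots[free++] = slots[i];   // move the live object down into the next free slot
            }
        }
        Arrays.fill(slots, free, slots.length, null); // everything after 'free' is one contiguous gap
        return free;
    }
}
```

For example, compacting {"a", null, "b", null, null, "c"} leaves {"a", "b", "c", null, null, null} and returns 3, so the next allocation can start at slot 3.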

Memory Tuning


What is the difference between the stack and the heap?


Memory is split into two major parts, the stack and the heap. Most discussion so far in this chapter has been about object allocation, which is what the heap is used for.

The stack is where local variables, both primitive values and references to objects, are stored, along with a frame for each method call in progress; each thread has its own stack. The objects themselves are allocated on the heap. The lifetime of variables on the stack is governed by the scope of the code. The scope is usually defined by an area of code in curly brackets, such as a method call, or a for or while loop. Once execution has left that scope, the variables declared in it are removed from the stack.

When you call a method, its declared variables are placed on top of the stack. Calling another method from within that method pushes the new method’s variables onto the stack.
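One rough way to observe the stack growing is to count the frames in a stack trace at two points, one of them a single method call deeper. The StackDepth class below is hypothetical. The Java documentation allows a JVM to omit frames from stack traces, so treat the absolute numbers as illustrative; on HotSpot you would expect the deeper count to be exactly one higher.

```java
/** Hypothetical demo: each nested method call adds one frame to the current thread's stack. */
final class StackDepth {

    /** Number of frames on the current thread's stack, including this method's own frame. */
    static int depth() {
        return new Throwable().getStackTrace().length;
    }

    static int nested() {
        return depth();            // called from one frame deeper than measure()
    }

    static int[] measure() {
        int here = depth();        // frames: depth, measure, callers...
        int deeper = nested();     // frames: depth, nested, measure, callers...
        return new int[] {here, deeper};
    }

    public static void main(String[] args) {
        int[] d = measure();
        System.out.println("depth here: " + d[0] + ", one call deeper: " + d[1]);
    }
}
```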

A recursive method is a method that, directly or indirectly, calls itself. If a method calls itself too many times, the stack memory fills up, and eventually further method calls cannot allocate space for their variables. This results in a StackOverflowError. The stack trace from the resulting error usually contains a long run of calls to the same method. If you are writing a recursive method, it is important that it has a state known as the base case, in which no more recursive calls are made.
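As a minimal illustration of a base case, here is a hypothetical SumExample that adds up a list both recursively and iteratively. The recursive version stops calling itself once the index reaches the end of the list. Until then, it adds one stack frame per element. The iterative version uses the same small amount of stack however long the list is.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical example: the same sum written recursively (with a base case) and iteratively. */
final class SumExample {

    static long sumRecursive(List<Integer> values, int index) {
        if (index == values.size()) {
            return 0;                      // base case: nothing left to add, no further calls
        }
        // recursive case: one extra stack frame per element
        return values.get(index) + sumRecursive(values, index + 1);
    }

    static long sumIterative(List<Integer> values) {
        long total = 0;                    // constant stack usage, whatever the list size
        for (int value : values) {
            total += value;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(1, 2, 3, 4, 5);
        System.out.println(sumRecursive(values, 0) + " " + sumIterative(values)); // prints "15 15"
    }
}
```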

Recursive methods usually use much more stack space, and therefore more memory, than their iterative counterparts. Although a recursive method can look neat and elegant, be aware that a large input can cause a StackOverflowError. Listing 10-1 shows the same algorithm written in both a recursive and an iterative style.

Listing 10-1: Using the stack for loops

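import static org.junit.Assert.assertEquals;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.junit.Test;

public class ListReversalTest {

    @Test
    public void listReversals() {
        final List<Integer> givenList = Arrays.asList(1, 2, 3, 4, 5);
        final List<Integer> expectedList = Arrays.asList(5, 4, 3, 2, 1);

        // Compare contents, not just sizes. reverseRecursive does not modify its input;
        // reverseIterative reverses in place, so it is given its own copy.
        assertEquals(expectedList, reverseRecursive(givenList));
        assertEquals(expectedList, reverseIterative(new ArrayList<>(givenList)));
    }

    // Each call creates a new list and adds a frame to the stack: depth grows with list size
    private List<Integer> reverseRecursive(final List<Integer> list) {
        if (list.size() <= 1) {
            return list;   // base case
        }
        final List<Integer> reversed = new ArrayList<>();
        reversed.add(list.get(list.size() - 1));
        reversed.addAll(reverseRecursive(list.subList(0, list.size() - 1)));
        return reversed;
    }

    // Swaps elements from both ends towards the middle, in place, using constant stack space
    private List<Integer> reverseIterative(final List<Integer> list) {
        for (int i = 0; i < list.size() / 2; i++) {
            final Integer tmp = list.get(i);   // Integer rather than int, so nothing is re-boxed
            list.set(i, list.get(list.size() - i - 1));
            list.set(list.size() - i - 1, tmp);
        }
        return list;
    }
}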

Compare the two algorithms for reversing a list. How much space does each algorithm need? For the recursive definition, each time the method is recursively called, a new list is created. These lists from each method call must be held in memory until the list has been completely reversed. Although the actual lists are held on the heap (because that is where objects are stored), each method call needs its own stack frame, so the stack grows by one frame per element.

For the iterative version, the only space needed is a variable to hold one value while it is being swapped with its counterpart at the other end of the list. There are no recursive calls, so the stack depth stays the same however long the list is. No new objects are allocated, so no extra space is taken on the heap.

Experiment with the number of elements in the list. Can you make the recursive method throw a StackOverflowError?
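If you want to see the error without hunting for the right list size, a deliberately deep recursion will do it. In the hypothetical OverflowDemo below, countDown has a base case, but the requested depth is far larger than a default-sized thread stack can hold. The exact depth at which the stack overflows depends on the stack size and the JVM, so do not rely on a precise number.

```java
/** Hypothetical demo: recursion deep enough to exhaust the thread's stack. */
final class OverflowDemo {

    /** Recurses until remaining reaches zero (the base case), returning how many calls were made. */
    static int countDown(int remaining) {
        if (remaining == 0) {
            return 0;
        }
        return 1 + countDown(remaining - 1);   // not a tail call; each level keeps its frame
    }

    /** Returns true if recursing to the given depth overflowed the stack. */
    static boolean overflows(int depth) {
        try {
            countDown(depth);
            return false;
        } catch (StackOverflowError e) {
            // Once the error has unwound the stack, this thread can carry on normally
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("depth 100 overflows: " + overflows(100));
        System.out.println("depth 100,000,000 overflows: " + overflows(100_000_000));
    }
}
```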


How can you define the size of the heap for the JVM?


The JVM provides several command-line arguments for defining the size of the memory allocated to the different memory areas.

When starting up the JVM, you can specify the maximum heap size with the command-line flag -Xmx and a size. For example, starting the JVM with java -Xmx512M <classname> creates a JVM with a maximum heap size of 512 megabytes. Suffixes for the memory size are G for gigabytes, M for megabytes, or K for kilobytes. It is important to note that the JVM will not allocate this memory in its entirety on startup; it will grow to a maximum of that size only if needed. Before the JVM expands its memory allocation, it will typically try to reclaim memory with a garbage collection.

To specify the initial amount of heap memory allocated to the JVM, use the -Xms argument. It works in the same way as -Xmx. This argument is advisable if you know your application is going to need a certain amount of memory. It saves the JVM from running repeated garbage collections and resizing the heap as it grows to the size you need.

If both arguments are set to the same value, the JVM will ask the operating system for that full memory allocation on startup, and it will not grow any larger.

For the initial memory allocation, the default value is 1/64 of the physical memory on the computer, up to 1 GB. For the maximum, the default is the smaller of 1 GB and a quarter of the computer’s physical memory. These defaults have changed between JVM versions and also depend on the type of machine. Rather than relying on them, you should specify the values explicitly for any code running in a production environment, and make sure your code performs to satisfaction with those values.
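To check what a particular JVM actually chose, you can read the heap figures from java.lang.Runtime. The hypothetical HeapReport class below prints them in megabytes. Running it as java -Xms256M -Xmx512M HeapReport and then without the flags is a quick way to see the effect of the arguments. Expect the reported maximum to be close to, but not necessarily exactly, the -Xmx value, because some collectors exclude part of the heap from this figure.

```java
/** Hypothetical helper: reports the heap limits the JVM is running with (try -Xms / -Xmx). */
final class HeapReport {

    private static final long MB = 1024 * 1024;

    /** Returns {max, total, free} heap sizes in bytes, as reported by Runtime. */
    static long[] snapshot() {
        Runtime runtime = Runtime.getRuntime();
        return new long[] {runtime.maxMemory(), runtime.totalMemory(), runtime.freeMemory()};
    }

    public static void main(String[] args) {
        long[] heap = snapshot();
        // max: the most the heap may grow to (reflects -Xmx); Long.MAX_VALUE if there is no limit
        System.out.println("max   : " + heap[0] / MB + " MB");
        // total: heap currently committed by the JVM (starts out around the -Xms value)
        System.out.println("total : " + heap[1] / MB + " MB");
        // free: the unused part of total
        System.out.println("free  : " + heap[2] / MB + " MB");
    }
}
```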

Similar to setting initial and maximum heap sizes, you can also set the size of each thread's stack at startup, with the -Xss argument. For most programs, you should avoid setting this. If you find yourself running into StackOverflowErrors on a regular basis, you should examine your code and replace as many recursive methods with iterative counterparts as possible.

Other relevant JVM arguments include -XX:PermSize and -XX:MaxPermSize for the permanent generation. You may want to set these if you have a large number of classes or string constants, or if you are creating many dynamic class definitions using a non-Java language. On Java 8 and later, which have no PermGen, the closest equivalent is -XX:MaxMetaspaceSize.


Is it possible to have memory leaks in Java?


A common misconception about the Java language is that, because the language has garbage collection, memory leaks are simply not possible. Admittedly, if you are coming from a C or C++ background you will be relieved that you do not need to explicitly free any allocated memory, but excessive memory usage can occur if you are not careful. Listing 10-2 is a simple implementation of a stack that leaks memory.

Listing 10-2: A collection with a memory leak

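import java.util.ArrayList;
import java.util.List;

public class MemoryLeakStack<E> {

    private final List<E> stackValues;
    private int stackPointer;

    public MemoryLeakStack() {
        this.stackValues = new ArrayList<>();
        stackPointer = 0;
    }

    public void push(E element) {
        if (stackPointer < stackValues.size()) {
            stackValues.set(stackPointer, element);   // overwrite a previously popped slot
        } else {
            stackValues.add(element);                 // grow the list
        }
        stackPointer++;
    }

    public E pop() {
        stackPointer--;
        // The list still refers to the popped element: this is the leak
        return stackValues.get(stackPointer);
    }
}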

Ignoring any concurrency issues or elementary exceptions, such as calling pop on an empty stack, did you manage to spot the memory leak? When popping, the stackValues instance keeps a reference to the popped object, so it cannot be garbage collected. Of course, the object will be overwritten on the next call to push, so the previously popped object will never be seen again; but until that happens, it cannot be garbage collected.

If you are not sure why this is a problem, imagine that each element on this stack is an object holding the contents of a large file in memory; that memory cannot be used for anything else while it is referenced in the stack. Or imagine that the push method is called several million times, followed by the same number of pops.

A better implementation for the pop method would be to call the remove method on the stackValues list; remove still returns the object in the list, and also takes it out of the list completely, meaning that when any client code has removed all references to the popped object, it will be eligible for garbage collection.
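A minimal sketch of that fix follows. It assumes the top of the stack is simply the end of the list, so the list's size takes the place of the explicit stackPointer field; the class name NonLeakingStack is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of the stack from Listing 10-2 with pop changed to drop the list's reference to the element. */
final class NonLeakingStack<E> {

    private final List<E> stackValues = new ArrayList<>();

    public void push(E element) {
        stackValues.add(element);                          // the end of the list is the top of the stack
    }

    public E pop() {
        // remove returns the element AND deletes the list's reference to it,
        // so once callers drop their references it can be garbage collected
        return stackValues.remove(stackValues.size() - 1);
    }

    public int size() {
        return stackValues.size();
    }
}
```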

Interoperability between the JVM and the Java Language

This section covers the lifecycle of your code on the JVM, and the special methods and classes that let running code interact with the JVM.


What is the lifecycle from writing a piece of Java code to it actually running on the JVM?


When you want some of your Java code to run on a JVM, the first thing you do is compile it. The compiler has several roles, such as making sure you have written a legal program and that you have used valid types. It outputs bytecode, in a .class file. This is a binary format, similar in spirit to the executable machine instructions for a specific architecture and operating system, except that its instructions are for the JVM.

The operation of bringing the bytecode for a class definition into the memory of a running JVM is called classloading. The JVM has classloaders, which are able to take binary .class files and load them into memory. The classloader is an abstraction; it is possible to load class files from disk, from a network interface, or even from an archived file, such as a JAR. Classes are only loaded on demand, when the running JVM needs the class definition.
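You can ask any class which classloader loaded it. In the hypothetical LoaderReport below, core classes such as String report null, which is how the API represents the JVM's built-in bootstrap loader. Your own classes report a real loader, usually the application (system) classloader, and the loop walks up its chain of parents.

```java
/** Hypothetical demo: which classloader loaded which class. */
final class LoaderReport {

    public static void main(String[] args) {
        // Core classes come from the bootstrap loader, which getClassLoader() reports as null
        System.out.println("String : " + String.class.getClassLoader());
        // Application classes usually come from the system (application) classloader
        System.out.println("Mine   : " + LoaderReport.class.getClassLoader());
        // Walk up the parent chain; it ends at null, the bootstrap loader
        for (ClassLoader loader = LoaderReport.class.getClassLoader();
                loader != null;
                loader = loader.getParent()) {
            System.out.println("  loader: " + loader);
        }
    }
}
```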

It is possible to create your own classloader, and start an application that uses your loader to find classes from some location. Naturally, this has security implications: a malicious classloader could supply arbitrary code, so untrusted code running under a security manager can be prevented from creating classloaders. For instance, Java applets are not allowed to use custom classloaders at all.
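As a sketch of what a custom classloader involves, the hypothetical DirectoryClassLoader below reads .class files from a directory of your choosing. It relies on the standard ClassLoader contract: loadClass first delegates to the parent loader, and only calls findClass if the parent cannot find the class. This means core classes such as String still come from the JVM. It omits caching, security checks, and package definitions, so treat it as illustrative only.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical classloader that reads .class files from a directory. Illustrative only. */
final class DirectoryClassLoader extends ClassLoader {

    private final Path root;

    DirectoryClassLoader(Path root, ClassLoader parent) {
        super(parent);     // the default loadClass delegates to this parent first
        this.root = root;
    }

    /** Called by loadClass only after the parent loader has failed to find the class. */
    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        Path file = root.resolve(name.replace('.', '/') + ".class");
        try {
            byte[] bytes = Files.readAllBytes(file);
            // defineClass turns the raw bytes into a Class; the bytecode is verified before use
            return defineClass(name, bytes, 0, bytes.length);
        } catch (IOException e) {
            throw new ClassNotFoundException(name, e);
        }
    }
}
```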

Once a class has been loaded, the JVM itself verifies the bytecode to make sure it is valid. Some of the checks include making sure the bytecode does not branch to a location outside of its own method's code, and that all of the code instructions are complete instructions.

Once the code has been verified, the JVM can interpret the bytecode into the relevant instruction code for the architecture and operating system it is running on. Interpreting can be slow, and some of the early JVMs were notoriously slow, especially when working with GUIs, for this exact reason. The just-in-time (JIT) compiler addresses this by translating frequently executed bytecode into native instructions while the program runs, so that code no longer needs to be interpreted. Because this is done dynamically, the JIT can create highly optimized machine code based on how the application is actually behaving while it runs.


Can you explicitly tell the JVM to perform a garbage collection?


The static gc method on the System class is the way to ask the JVM to run a garbage collection, but calling it does not guarantee that a garbage collection will happen. The documentation for the gc method says:

Calling the gc method suggests that the Java Virtual Machine expend effort toward recycling unused objects in order to make the memory they currently occupy available for quick reuse. When control returns from the method call, the Java Virtual Machine has made a best effort to reclaim space from all discarded objects.

It is not possible to enforce a garbage collection, but when you call gc, the JVM takes your request into account and usually performs a garbage collection. Calling this method may make your code slower, as it can put your whole application in a stop-the-world state during the collection.

Explicit calls to System.gc are usually a code smell. If you are calling this in an effort to free memory, it is more than likely you have a memory leak, and you should try to address that, rather than sprinkling your code with explicit requests to the garbage collector.


What does the finalize method do?


The finalize method is a protected method, inherited from Object. When the JVM is about to garbage collect an object, it first calls that object's finalize method. The rationale for this method is to tie off any loose ends, such as closing any resources that the garbage-collected object was dependent upon. (Since Java 9, finalize has been deprecated.)

One overriding concern with this method is that you have no control over when it is called: It is called if, and only if, the JVM decides to garbage collect the object, so it may never be called at all if the program exits first. If you were to use this method for closing a database connection or a file handle, you could not be sure when exactly this event would happen. If you have many objects configured in this manner, you run the risk of exhausting a database connection pool, or perhaps even having too many files open at once.

If you are writing code that depends on an external resource that needs to be closed explicitly, like a database, filesystem, or network interface, you should aim to close your resources as soon as possible. When appropriate, use Java 7’s try-with-resources construct, and keep resources open for as little time as possible.
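Here is a small sketch of try-with-resources. TrackedResource is a hypothetical stand-in for a file handle or database connection. It implements AutoCloseable and records each event so that the order is visible. Both resources are closed automatically when the block ends, in the reverse order of their declaration. They would also be closed if the body threw an exception, with no reliance on finalize.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical resource standing in for a file handle or database connection. */
final class TrackedResource implements AutoCloseable {

    static final List<String> EVENTS = new ArrayList<>();   // records what happened, in order
    private final String name;

    TrackedResource(String name) {
        this.name = name;
        EVENTS.add("open " + name);
    }

    void use() {
        EVENTS.add("use " + name);
    }

    @Override
    public void close() {            // called automatically at the end of the try block
        EVENTS.add("close " + name);
    }

    static void demo() {
        // Resources are closed in reverse order of declaration, even if the body throws
        try (TrackedResource first = new TrackedResource("first");
             TrackedResource second = new TrackedResource("second")) {
            first.use();
            second.use();
        }
    }
}
```

After demo runs, EVENTS holds open first, open second, use first, use second, close second, close first.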


What is a WeakReference?


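A WeakReference is a generic container class whose reference to the contained instance does not prevent that instance from being garbage collected. When the contained instance has no strong references, it is eligible for garbage collection. Once it has been collected, the WeakReference's get method returns null.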

An alternative approach to the stack implementation in Listing 10-2 would have been to hold a list of WeakReferences to the elements. Then, upon garbage collection, the references to any elements that have no other references would be cleared, so get would return null for them. Listing 10-3 shows a possible declaration with an additional method, peek.

Listing 10-3: A stack implementation with WeakReference objects

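import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;

public class WeakReferenceStack<E> {

    private final List<WeakReference<E>> stackReferences;
    private int stackPointer = 0;

    public WeakReferenceStack() {
        this.stackReferences = new ArrayList<>();
    }

    public void push(E element) {
        final WeakReference<E> reference = new WeakReference<>(element);
        if (stackPointer < stackReferences.size()) {
            stackReferences.set(stackPointer, reference);   // reuse a previously popped slot
        } else {
            stackReferences.add(reference);
        }
        stackPointer++;
    }

    public E pop() {
        stackPointer--;
        // get() returns null if the element has already been garbage collected
        return this.stackReferences.get(stackPointer).get();
    }

    public E peek() {
        return this.stackReferences.get(stackPointer - 1).get();
    }
}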

When a new element is pushed onto the stack, it is stored as a WeakReference. When it is popped, the WeakReference is retrieved, and get is called to obtain that object. Now, when client code holds no more references to that object, it will be eligible for removal at the next garbage collection.

The peek method simply returns the top element on the stack, without removing it.

Listing 10-4 shows that the reference in the stack is cleared when all strong references to the value are removed. The ValueContainer class used merely holds a string, but its finalize method has been overridden to highlight this method being called by the garbage collector. If you remove the System.gc() line, the test will almost certainly fail. Because System.gc() is only a request, the test is not strictly guaranteed to pass on every JVM even with it.

Listing 10-4: Using a stack with WeakReferences

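import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertNull;

import java.util.Objects;

import org.junit.Test;

public class WeakReferenceStackTest {

    @Test
    public void weakReferenceStackManipulation() {
        final WeakReferenceStack<ValueContainer> stack = new WeakReferenceStack<>();

        final ValueContainer expected = new ValueContainer("Value for the stack");
        stack.push(new ValueContainer("Value for the stack"));

        ValueContainer peekedValue = stack.peek();
        assertEquals(expected, peekedValue);
        assertEquals(expected, stack.peek());
        peekedValue = null;
        System.gc();   // a request, not a guarantee
        assertNull(stack.peek());
    }

    public static class ValueContainer {
        private final String value;

        public ValueContainer(final String value) {
            this.value = value;
        }

        @Override
        protected void finalize() throws Throwable {
            super.finalize();
            System.out.printf("Finalizing for [%s]%n", toString());
        }

        @Override
        public boolean equals(final Object other) {
            return other instanceof ValueContainer
                    && Objects.equals(value, ((ValueContainer) other).value);
        }

        @Override
        public int hashCode() {
            return Objects.hashCode(value);
        }

        @Override
        public String toString() {
            return "ValueContainer{" + value + "}";
        }
    }
}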

This test demonstrates some quite complex concepts. It is important to note that expected and the value passed to the stack are different instances, even though they are equal. Had the push to the stack looked like stack.push(expected), a strong reference would have been kept at all times during this test. The value would then not have been garbage collected, causing the test to fail.

The test inspects the stack with the peek method, confirming that the value is on the stack as expected. The peekedValue reference is then set to null. There are no references to this value, other than inside the WeakReference in the stack, so at the next garbage collection, the memory should be reclaimed.

After instructing the JVM to perform a garbage collection, that reference is no longer available in the stack. You should also see a line printed to standard output saying that the finalize method has been called. Finalizers run on a separate thread, so this line may appear slightly later.


Using This Example Stack
If you were to need this kind of functionality from a stack, calling the pop method should decrement the stack pointer until it finds a non-null value. Any stack optimizations due to WeakReferences being garbage collected have been omitted for brevity.
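A minimal sketch of such a pop is shown below. It is not Listing 10-3 itself. This hypothetical SkippingWeakStack uses the list's size in place of the stackPointer field and removes each entry as it goes, so cleared references do not accumulate. Entries whose referents have already been collected are skipped, and pop returns null only when nothing reachable is left.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;

/** Sketch of a weak-reference stack whose pop skips entries the garbage collector has cleared. */
final class SkippingWeakStack<E> {

    private final List<WeakReference<E>> references = new ArrayList<>();

    public void push(E element) {
        references.add(new WeakReference<>(element));
    }

    /** Returns the most recently pushed element that is still reachable, or null if none is left. */
    public E pop() {
        while (!references.isEmpty()) {
            // remove the top entry; get() returns null if its referent has been collected
            E value = references.remove(references.size() - 1).get();
            if (value != null) {
                return value;
            }
        }
        return null;
    }
}
```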


What is a native method?


Regular Java class definitions are compiled to bytecode, held in class files. This bytecode is platform independent, and is translated into specific instructions for the architecture and operating system running the bytecode at run time.

At times, however, you need to run some platform-specific code, perhaps referencing a platform-specific library, or making some operating system–level calls, such as writing to disk or a network interface. Fortunately, the most common cases have been implemented for each platform on which the JVM is available.

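For those other times, it is possible to write a native method: a method declared in Java with the native keyword and no body. Its implementation is written in C or C++ against a well-defined header that identifies the class name, the Java method name, and its parameters and return type. Before your native method is called, the JVM must be able to find that implementation. This is usually done by loading your compiled library with System.loadLibrary, or by registering the functions explicitly through JNI, so that the JVM knows exactly what needs to run when your native method is called.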
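The Java side of a native method can be sketched as follows. The class NativeClock, its method ticks, and the library name nativeclock are all hypothetical, and the C implementation is not shown. Compiling the class with javac -h (javah on older JDKs) generates the C header that the implementation must match. If the library cannot be found, loadLibrary fails, and calling the native method throws UnsatisfiedLinkError.

```java
/** Hypothetical class showing the Java side of a JNI native method. */
final class NativeClock {

    // Hypothetical library: libnativeclock.so on Linux, nativeclock.dll on Windows
    private static final boolean LOADED = tryLoad();

    private static boolean tryLoad() {
        try {
            System.loadLibrary("nativeclock");
            return true;
        } catch (UnsatisfiedLinkError e) {
            return false;            // library not found on java.library.path
        }
    }

    /**
     * Declared here, implemented in C. javac -h generates a header declaring:
     * JNIEXPORT jlong JNICALL Java_NativeClock_ticks(JNIEnv *, jclass);
     */
    static native long ticks();

    static boolean isLoaded() {
        return LOADED;
    }
}
```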


What are shutdown hooks?


When the JVM terminates, it is possible to have some code run before it exits, similar to the finalize method running before an object is garbage collected. Shutdown hooks are initialized but unstarted Thread objects; you register one by calling the addShutdownHook method on the current Runtime instance. Listing 10-5 shows a simple example.

Listing 10-5: Adding a shutdown hook

@Test
public void addShutdownHook() {
    Runtime.getRuntime().addShutdownHook(new Thread() {
        @Override
        public void run() {
            System.err.println(
                    "Shutting down JVM at time: " + new Date());
        }
    });
}

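The shutdown hook runs whether the JVM exits normally or with an error status: for example, when System.exit is called with any status code, when the last non-daemon thread finishes, or when the process is interrupted with Ctrl+C. It is not guaranteed to run if the JVM is stopped abruptly, such as by SIGKILL or Runtime.halt. Although the code in Listing 10-5 merely logs what time the JVM terminated, you can imagine this facility could be used for notifying a support team of any JVM termination, particularly if that termination is not expected.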

Summary

The JVM is an extremely versatile piece of software. It is used in many, many different ways, from desktop applications to online real-time trading systems.

For any non-trivial application, you will need to examine how it interacts with the JVM, and whether it runs as expected under many different conditions. Optimization of a running JVM is often more art than science, and use of the relevant command-line arguments is rarely taught. Read the documentation for the JVM, the java command-line application on a Mac or UNIX, or java.exe on Windows. Experiment with the command-line arguments; see what effect they have on your program, and try to break your application.

Be aware that different JVM vendors provide different command-line arguments. As a rule, any argument prefixed with -XX: is a non-standard option, so its usage, or even its presence, differs from vendor to vendor.

The next chapter is on concurrency, looking at how Java and the JVM help to make this as simple as possible. The chapter also covers a new approach to asynchronous execution, using actors.
