C H A P T E R  13

Garbage Collection

Java relies on garbage collection. A garbage collector removes unused objects from memory and lets your programs re-use memory rather than constantly grow. For very small programs, this constant growth doesn't matter much. However, even a program of fairly low complexity and scale can quickly chew through a lot of memory. Therefore, most programs really need some kind of garbage collection mechanism.

When they use many other languages, including C++, programmers have to manage memory themselves, writing code to remove objects from memory at the right time. Java frees developers from this problem. That's not to say that Java is better than C++. Both languages have their strengths and weaknesses. Many developers who prefer Java would tell you that having to manage memory yourself is a weakness in C++ and that garbage collection is a strength of Java. Conversely, many C++ developers would tell you the opposite. Both are true: Garbage collection can be a problem in Java, yet is also an inherent feature and very powerful if managed correctly.

So how do we turn garbage collection into an advantage rather than a burden?

Understanding Memory Allocation

Before we can really talk about garbage collection, we need a basic understanding of how the Java Virtual Machine (JVM) allocates memory in the first place. Every time you create a new object or primitive, the heap grows by the size of that object or primitive. In addition, if you create a complex object (an object that itself contains other objects, as many objects do), the heap grows by the size of all those objects as well. For example, consider a class we've seen before (in the previous chapter), the TargetClickPanel class:

Listing 13-1. Memory Allocation in the TargetClickPanel Class

package com.bryantcs.examples.videogames;

import java.awt.Graphics;
import java.awt.event.MouseEvent;
import java.awt.event.MouseListener;

import javax.swing.JPanel;
public class TargetClickPanel extends JPanel implements MouseListener{

  private static final long serialVersionUID = 1L;

  private Target targets[] = new Target[5];

  public TargetClickPanel() {
    addMouseListener(this);
    for (int i = 0; i < targets.length; i++) {
      targets[i] = null;
    }
  }

  public void paint (Graphics g) {
    super.paintComponent(g);
    for (int i = 0; i < targets.length; i++) {
      if (targets[i] == null) {
        targets[i] = new Target(this);
      }
      if (!targets[i].isDone()) {
        targets[i].draw(g);
      } else {
        targets[i] = new Target(this);
      }
    }
  }

  @Override
  public void mouseClicked(MouseEvent e) {
  }

  @Override
  public void mouseEntered(MouseEvent e) {
  }
  @Override
  public void mouseExited(MouseEvent e) {
  }

  @Override
  public void mousePressed(MouseEvent e) {
  }

  @Override
  public void mouseReleased(MouseEvent e) {
    for (int i = 0; i < targets.length; i++) {

      targets[i].pointInTarget(e.getX(), e.getY());
    }
  }
}

Consider the lines in bold (the targets array and the lines within the paint method that create new instances of the Target object). If you visualize the values of all of the objects and primitives as the program runs, you can see that the paint method first creates five instances of the Target class and then keeps five of the instances going as the existing instances declare themselves to be finished. Consequently, the result of creating an instance of the TargetClickPanel also creates five instances of the Target class.

However, creating an instance of the TargetClickPanel class actually creates many more than just five instances of the Target class. The instances of the Target class that have marked themselves as finished still exist, even though the instance of the TargetClickPanel no longer has references to them. That's the key to garbage collection: references. We'll get to that shortly. First, let's consider the state of the Target objects that have been created after the TargetClick game has run for a few seconds:

Target 1: Created because targets[0] was null. Now finished.

Target 2: Created because targets[1] was null. Now finished.

Target 3: Created because targets[2] was null. Now finished.

Target 4: Created because targets[3] was null. Now finished.

Target 5: Created because targets[4] was null. Now finished.

Target 6: Created because targets[0] was finished. Now finished.

Target 7: Created because targets[1] was finished. Now finished.

Target 8: Created because targets[2] was finished. Still in play.

Target 9: Created because targets[3] was finished. Still in play.

Target 10: Created because targets[4] was finished. Still in play.

Target 11: Created because targets[0] was finished. Still in play.

Target 12: Created because targets[1] was finished. Still in play.

And so on...

At any given moment, the instance of the TargetClickPanel class has references to five instances of the Target class. But what happens to all the finished instances? As it happens, they get garbage collected. And there's the primary rule of garbage collection: If an object no longer has references held on it by other objects, that object gets garbage collected. In other words, once an object no longer has any other objects using it, that object is ready to be removed from memory. Yet another way to say this is that the object is now unreachable, meaning that no other bit of code can reach that object.

The Java Garbage Collection Algorithm: Marking and Sweeping

The algorithm that Java uses for garbage collection is called marking and sweeping. Conceptually, it's a simple idea. First, make an initial pass of the memory and mark any memory that can be collected. Then make another pass and free the memory that was marked in the first pass.

In practice, though, the process becomes much more complex. The basic algorithm (often called naïve mark and sweep) works fine, so long as the whole program stops while the garbage collector runs. The program has to stop so that the garbage collector can be sure that no new objects are being created and that no existing objects have gotten into a state such that they can be garbage collected while the collector runs. Java's earliest versions used just such an algorithm. Consequently, Java earned a reputation for being unsuitable for applications that required high performance, from video games to real-time monitoring of things like valves in a refinery. Later versions fixed that problem, but the reputation lingered for several more versions (and still pops up in some circles).

So how does one collect objects and still let the program run? In particular, how does one allow creation of new objects and send existing objects to collection while letting the garbage collector run? Two more advanced algorithms have been developed: Incremental garbage collection and concurrent garbage collection.

Incremental garbage collection relies on intermittently stopping and starting both the program and the garbage collector (which is really just another program being run in parallel by the JVM). Essentially, the program gets to do a bit of work, and then the garbage collector gets to do a bit, and so on. In this fashion, the program gets to do some work while the garbage collector runs. As you can imagine, though, the program's performance is still slower than when the garbage collector is not running. In fact,  versions of Java had noticeably slower performance when the garbage collector ran. I recall playing a video game written in Java and being able to tell when the garbage collector started and stopped.

Concurrent garbage collection lets the program run at full speed, with one exception: The program must be halted while the stack (the list of objects in memory) is scanned. Fortunately, scanning the stack takes very little time. Scanning the stack (which is really just a list) takes much less time than does scanning the heap (which is all of the objects in memory). Imagine reading a book's table of contents versus reading the whole book, and you'll have an idea of the difference.

So how does a concurrent garbage collector let the program run at (almost) full speed? The garbage collector doesn't concern itself with finding everything that can be collected. Instead, it finds what it can be sure of and frees those objects. On its next pass, it can catch anything it missed, as well as anything that has become removable in the meantime. Thus, the concurrent garbage collection rarely gets a perfect sweep, but it lets the program run much more smoothly.

Java 1.4.2 (which was a “red letter” release in the Java community, partly because of garbage collection) introduced a garbage collector that can be switched between incremental garbage collection and concurrent garbage collection. (It actually supports 4 kinds of garbage collection: two variants of naïve garbage collection, incremental garbage collection, and concurrent garbage collection.) Software developers could then decide (preferably through testing) which kind of collection worked best for their programs. Suddenly, the performance of many Java programs got a big boost. At that point, Java could finally compete with non-garbage-collecting languages (notably C++) in the arena of high performance applications. Java 5 (also known as Java 1.5) and Java 6 (or Java 1.6) continued to add fine tuning to the garbage collector, through various switches that could be set on the collector. We'll cover the most helpful of these switches near the end of this chapter.

Enter Java 7. Java 7 provides a new collector, called G1 (for “garbage first"). The G1 collector offers a step up from the previous garbage collectors by dividing the heap into regions (or sectors) and then garbage collecting each sector beginning with the sector that has the most the most garbage to collect. As a result of this “garbage-first” collection scheme, the G1 collector ensures that it gets as much garbage as possible on any given sweep. If the G1 collector was unable to sweep the whole thing because the program needed to run, at least it got to the worst sectors while it had a chance. Over subsequent sweeps, the other regions gradually fill up and come to contain the most garbage. Those sectors then get swept in their turn. Thus, the collector eventually gets to all regions in memory but doesn't try to get them all at once. It's a nice bit of optimization that is made all the better by not needing any new settings to control it. It's just how the G1 collector works. Also, there's no need to enable the G1 collector. If you're using Java 7, you're using the G1 collector, unless you specify an older collector.

Understanding Memory Settings

Some programs must keep large numbers of objects in memory to work correctly. In those cases, the only way to make them work is to expand the amount of memory available to the programs.

As a case in point, consider an XML parser. It must have a reference to each node in the document tree from the top to its current location. In addition, processing each node creates a reference to each child node. Consequently, by the time an XML parser has gotten very far into a document tree, it probably has references to a lot of nodes, each represented by an object, and each object having a reference from the object that represents its parent node. Parsers are one common example of the types of programs that don't benefit much from garbage collection. Of course, a program that parses a document and then does something else could free the memory used by the parser when the parsing is done. That memory could then be for another purpose, such that the program's footprint would be just the larger of the parsing or whatever else it does, rather than the total of both parsing and other work. Conversely, a program that runs other processes in the background must have enough memory to enable all of its concurrent processes.

So how do Java programmers give their programs more memory? They do it through switches on the JVM when they start the program. Here are the basic memory settings:

images

You can increase a program's performance by setting the starting heap size properly. If you set the number too low, the JVM starts at that amount and then has to adjust upward (perhaps several times) until it has enough memory for your program's initial set of objects. Then, as soon as the program does something (perhaps because a user did something with the program), the JVM has to allocate yet more memory. Rather than force your users to put up with the slow performance that a low starting memory value creates, you should instead allocate enough initial memory to handle all the objects your program needs at start-up time and to handle the first several operations it might need to perform.

The maximum memory setting is both more obvious and more critical. If you don't allocate enough memory, your program crashes. Usually, finding the right amount of maximum memory is a matter of experimentation. You have to start the program, let it run for a while, and see how much memory it uses. On Windows, the Task Manager shows how much memory each process, including a Java program, uses. Other operating systems reveal that information in other ways, but it should be there somewhere.

Here's an example of using the memory switches when you start a Java program:

Listing 13-2. Example of Java Memory Switches

java -Xms1024M -Xmx8192M ExampleProgram

Understanding Garbage Collection

The first thing to understand about the Java garbage collector is that you can't directly control it. You can give it hints that certain objects are now ready for collection. You can also set a number of garbage collection options when you start your program, and those settings will control the garbage collector's behavior. However, you cannot explicitly tell the garbage collector to remove an object from memory right now. That is, no method or class exists to let you specifically collect an existing object. You may be sure that the object can safely be removed, but you can't force the garbage collector to remove it.

This lack of control drives plenty of software developers a little nuts. As a group, software developers tend to want to control the computers they program, so a system that offers only indirect control is often not welcomed. For programmers accustomed to other languages (especially C++), that's often a primary complaint when they start using Java. Rather than have direct control, Java developers must instead be careful not to allow unnecessary references to exist.

When Java was new, some developers thought and said nonsense such as, “Oh, great! Java will manage my memory for me!” Therefore, more than a few programs with an unfortunate tendency to use large amounts of memory were created as a result. Having a garbage collector doesn't allow us to be lazy. We still have to manage memory, but we don't do it explicitly.

Another common complaint with Java (and other languages that use garbage collection) used to be that everything stopped while garbage collection ran. The addition of incremental and concurrent garbage collection has reduced that complaint somewhat, and further refinements in Java's garbage collection algorithms have improved the situation greatly. It used to be common for Java programs to have long pauses when the garbage collector ran. With the modern Java virtual machines available now, those pauses are now very rare (and are a symptom of poor programming rather than the JVM).

Understanding Generations

The Java garbage collector uses the concept of generations. In most applications, nearly all objects exist for a very short time. If you look at the Target objects created by an instance of the TargetClickPanel class (listed in the previous chapter), you'll see that they exist for just a few seconds each. In fact, if a player clicks on one, an instance of the Target class might exist for only a fraction of a second. Conversely, the TargetClickPanel instance and the TargetClick instance both exist for the duration of the game, but there's only instance of each class for the entire time.

Suppose a game of TargetClick involves the creation of 100 Target objects. In that scenario, two objects (one instance of the TargetClick class and one instance of the TargetClickPanel class) exist for the entire life of the program, while 100 other objects (all the instances of the Target class) exist for no more than a few seconds each. So, at least in this case, we have a great example where nearly all objects are not present in the system for long.

All of these short-lived objects end up in the “young generation.” The young generation has three spaces: “eden,” “from,” and “to.” The eden space is where objects go first. So, when your program creates a new object (such as an instance of the Target class in the TargetClick game), the object goes into the eden space. The from and to spaces are called “survivor spaces,” and objects that survive at least one garbage collection pass move to these spaces until they've survived sufficiently long to move to the tenured generation. At any given moment, one survivor space is empty. When the garbage collector runs, it moves the survivors from the eden space and the currently occupied survivor space into the currently empty survivor space.

Objects that last a while (either a certain amount of time or a certain number of collections) end up in the “tenured generation” (also called the “old generation” – the older I get, the more I prefer to talk about the “tenured generation"). Finally, the “permanent generation” isn't really a generation but is instead the area where the JVM stores the class definitions. The permanent generation isn't necessarily static, as Java programs can load and unload classes as they run.

Java's garbage collector sweeps the young generation more often than the tenured generations and uses different collection algorithms for the two generations. Further detail is beyond the scope of this book. If you're interested in greater detail about Java collection algorithms, you can find a number of books and web sites devoted to the subject. Efficient garbage collection is one of those problems that draws the best minds in the field of computer science, because it is both practical (every Java programmer needs a good one) and highly theoretical.

One of the very difficult issues facing the designers of garbage collection algorithms (including those used in Java) is how to deal with references that cross the generational boundaries. Before the garbage collector can remove an object from memory, it has to ensure that no references exist. That means checking references across the generations, which is a difficult task to do quickly.

Scavenges and Full Collections

The Java garbage collector can do partial garbage collection by collecting just the young generation. This type of operation is called a “scavenge.” When the garbage collector collects both the young and the tenured generations, this is called a “full collection,” or (often) just a “collection.” A number of command-line switches modify the garbage collector's behavior with regard to the young and tenured generations. We'll encounter these and other switches later in the chapter, in the section entitled “Understanding Garbage Collection Settings."

Garbage Collection is Event-Driven

Garbage collection doesn't happen every time an object happens to no longer have any references. If it did, every Java program would spend more time removing unused objects than it would spend doing anything else, and the performance of all Java programs would be so poor that no one would use Java for anything.

Instead, the Java garbage collector works on an event-driven model. When certain conditions are met, the garbage collector runs. These conditions involve the ratio of heap space (the amount of memory) in use to the amount of free memory and the amount of total memory. In simple terms, when the JVM determines that it's running out of memory, it runs the garbage collector. So, when the amount of heap space relative to the amount of total memory gets to a certain percentage, the garbage collector removes any objects that have no references. Doing so readjusts the percentage of memory in use downward, so that the program can keep on running.

Understanding Garbage Collection Settings

The Java garbage collector uses another group of switches (similar to the memory switches we saw earlier) that you set when you start your program. Here's a summary of the most common switches:

images

Optimizing Garbage Collection

Now that you understand the basics of how to configure memory and garbage collection for Java programs, we can talk about how to gather the information you need in order to make intelligent decisions about the various switches and settings. Again, the JVM offers a number of switches you can set to create output related to memory usage and garbage collection. The following table describes these switches.

images

Here's a bit of sample output from running the TargetClick game from the command line with the -verbose:gc, -XX:+PrintGCTimeStamps -XX:+PrintGCDetails switches:

Listing 13-3. Garbage Collection Output Sample

C: emp>java -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails TargetClick
Heap
 PSYoungGen      total 17920K, used 8077K [0x00000000ec000000,
   0x00000000ed3f0000, 0x0000000100000000)
  eden space 15424K, 52% used [0x00000000ec000000,0x00000000ec7e3538,0x00000000ecf10000)
  from space 2496K, 0% used [0x00000000ed180000,0x00000000ed180000,0x00000000ed3f0000)
  to   space 2496K, 0% used [0x00000000ecf10000,0x00000000ecf10000,0x00000000ed180000)
 PSOldGen        total 40960K, used 0K [0x00000000c4000000,   0x00000000c6800000,0x00000000ec000000)
  object space 40960K, 0% used [0x00000000c4000000,0x00000000c4000000,0x00000000c6800000)
 PSPermGen       total 21248K, used 9423K [0x00000000bee00000, 0x00000000c02c0000,   0x00000000c4000000)
  object space 21248K, 44% used [0x00000000bee00000,0x00000000bf733cc0,0x00000000c02c0000)

Let's examine these results a bit. First, we can see that the young generation has 17920K allocated and is using 8077K of it. All of the objects in the young generation are in the eden space. Given how briefly instances of the Target class last, there should never be one in the from and to spaces. In the tenured generation (called PSOldGen here), we find nothing, although 40960K has been allocated to it. The instances of the TargetClick and TargetClickPanel class are not in the tenured generation because I took this snapshot shortly after starting the TargetClick game, so those objects are still in the eden space. Finally, the permanent generation contains the three class files, which use 9423K of 21248K allocated.

To really make this information useful, we'd have to let the program run for a while, find a number of these blocks of information, and compare them to one another, watching for changes over time. As you can imagine, analyzing this kind of information for a large and complex program can take quite a while. For one thing, developers working on those kinds of applications often have to let their applications run for days at a time to get a sufficiently broad window of input. Also, if the number of users or number of network connections or other load factors change (and they nearly always do – very few programs exist in constantly stable environments), those factors need to be captured and then identified in the log. Then there's the difficulty of reading this kind of information (which is why many programmers write additional utilities to “scrape” this data and put it into some handy application, such as a spread sheet). All of this together makes analyzing garbage-collection data a non-trivial task. However, the task pays off with better performance, happier users, happier programmers, and so on. Therefore, the investment in time and effort often pays off nicely.

images Tip You can get a good optimization by monitoring the size of the tenured generation over the lifetime of your program and setting its size to closely match (with perhaps a 10% increase) the tenured generation size to the largest size you observe over a few runs of your program. You can control the size of the tenured generation by controlling the size of the new generation, which you can do with the -XXNewRatio, -XX:NewSize, -XX:MaxNewSize switches. Suppose your testing reveals that your program needs 256MB of memory and that the tenured generation should be 4 times the size of the new generation (meaning you have a lot of long-lived objects). You could set the following switches: -XXNewRatio=4 -XXNewSize=52MB -XXMaxNewSize=52MB

Collection Hints

Java developers can give hints to the garbage collector. The following line asks the JVM to do garbage collection for your program:

Listing 13-4. Garbage Collection Hint

System.gc();

However, most of the Java programming community frowns on using garbage collection hints. Instead, you should try to organize your code so that the garbage collector doesn't need a hint. Most of that organization comes down to arranging your classes such that each class has a clear, well-defined job to do. For example, TargetClickPanel maintains a collection of five Target objects. When it needs a new instance of the Target class, it sets a member of that collection to the new instance. Thus, old instances become free for collection. If I had tried to do more with that collection of Target objects, I might easily have gotten into a situation where garbage collection wouldn't work correctly, as I'd still be hanging onto the Target objects.

images Note Using a garbage collection hint will not prevent your program from running out of memory if it is about to run out anyway. That's yet another reason to not use garbage collection hints. As a rule, they don't help.

Blocking Garbage Collection

Sometimes, you need an object to exist for as long as your program runs. To make sure it continues to exist, you need to ensure that a reference to that object always exists. Java offers a couple of handy ways to ensure a reference always exists: singletons and reference lists.

A singleton is a class that can never have more than one instance. Java developers have a number of techniques for creating a singleton, but Listing 13-5 shows one common way to do it.

Listing 13-5. Making a Singleton

public class Singleton {

  private static Singleton instance = new Singleton();

  private Singleton() {
    // initialization stuff goes here
  }

  public synchronized static Singleton instance() {
    return instance;
  }
}

Because there's always a static instance of the class held within the class itself, there's always a reference to this kind of singleton. We make the instance method synchronized so that, if two objects call the method at the same time, they take turns. Otherwise, they might get the object in the wrong state (in cases where some other method updates the data in the singleton class). As a rule, synchronize any method that returns a static object (not including constants).

Of course, if you need more than one instance of a class, that's not going to work. In those cases, you can use a reference list. Consider the program class shown in Listing 13-6.

Listing 13-6. Program Class with a Reference List

package com.bryantcs.examples.referenceList;

import java.util.ArrayList;

public class ReferenceProgram {

  // This list lasts as long as the program runs
  public static ArrayList<Object> referenceList;

  public static void main(String[] args) {
    referenceList = new ArrayList<Object>();
  }
}

As the comment within the class says, the referenceList object will have a reference for as long as the program runs. So far, so good, but how do we add to it? The trick to that is to have any class we need to keep around to add itself to the list, as shown in the next code listing:

Listing 13-7. Class that Adds Itself to a Reference List

package com.bryantcs.examples.referenceList;

public class ClassToList {

  public ClassToList() {
    ReferenceProgram.referenceList.add(this);
  }
}

Since ReferenceProgram.referenceList is static, it is a class variable, which means there's only one of them, regardless of the number of instances we may make of the ReferenceProgram class. (Of course, in this very simple example, we wouldn't make more than one instance of the ReferenceProgram class anyway.) As a result, we can add to that single list from anywhere within the program. Thus, we have a handy mechanism for ensuring that a reference to any given class always exists and, consequently, that any instance of this class will never be garbage collected.

While these techniques work, you should avoid overusing both the singleton pattern and the reference list pattern. If you prevent the garbage collector from removing many objects, your program performs that much more slowly.

A New Garbage Collector

As we saw earlier in the chapter, Java 7 includes a new garbage collector, called G1. G1 is a concurrent garbage collector, meaning that it uses multiple processors (if available) at the same time. Also, as we read earlier in the chapter, the G1 collector's “garbage first” algorithm offers better performance than other collectors, regardless of the number of processors. For a computer with just one processor (or a shared computer that makes only one processor available to the JVM), G1 should offer some improvement over any other garbage collector (though the nature of the application may limit the improvement). If G1 can get access to at least two processors, the improvement in performance should be even better. Given that most modern computers have multiple processors and modern server-class computers often have numerous processors, G1 may provide a sizable performance boost for some applications. That's going to result in games that play more smoothly, web applications that return web pages more quickly, and all the other good things that come with higher-performance applications.

Summary

In this chapter, we've touched on one of Java's more advanced concepts: garbage collection. We've just skimmed the surface of this advanced topic, with an eye toward providing you with the basic information for optimizing your own applications. If you really get into optimization, you can find whole books written about the subject.

In this brief overview, we covered:

  • Memory allocation
  • The garbage-collection algorithm used by Java
  • The variations on the algorithm (naïve, incremental, and concurrent)
  • How the G1 collector improves on the overall algorithm
  • The difference between a scavenge and a full collection
  • That garbage collection is event-driven
  • That garbage collection is beyond our direct control, but can be controlled indirectly
  • The most common settings that control garbage collection
  • How to see what the garbage collector does in each of its runs
  • How to use that information to optimize our garbage collection settings (and thus the performance of our programs)
  • How to prevent an object from ever being garbage collected.
  • That Java 7 offers an improved garbage collector named G1 (for “garbage first”)

For all that we've learned, I'd like to point out that you can do a lot of programming without ever thinking about garbage collection. Other than to collect sample data for this chapter, I haven't changed a single garbage collection setting for the programs in this book. However, if you stick with programming in Java, your programs will eventually run into the bottleneck that arises from not controlling your garbage collection settings. At that point, I want you to come back to this chapter and cover the basics. Eventually, you may need more advanced information, but this set of basic information will let you solve a lot of garbage collection problems.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.119.122.82