Garbage Collection

Many programming languages support heap-based memory allocation. All objects in Java are allocated on the heap; no objects are ever allocated on the stack (an additional special-purpose storage data structure). Different pieces of storage can be allocated from and returned to the heap in no particular order, leading to problems of fragmentation—you have enough total storage free to satisfy a large allocation request, but it is not in the usable form of one contiguous chunk.
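
To make the fragmentation problem concrete, here is a toy model, not how any JVM actually lays out its heap. The heap is a row of equal-sized cells, and allocation uses a simple first-fit search for a run of adjacent free cells. The class name ToyHeap and its methods are made up for illustration.

```java
import java.util.Arrays;

// Toy heap: a row of fixed-size cells, true = in use. Illustration only.
class ToyHeap {
    private final boolean[] used;

    ToyHeap(int cells) { used = new boolean[cells]; }

    // First-fit allocation: return the start index of the first run of
    // `size` adjacent free cells, or -1 if no contiguous run is big enough.
    int allocate(int size) {
        int run = 0;
        for (int i = 0; i < used.length; i++) {
            run = used[i] ? 0 : run + 1;
            if (run == size) {
                int start = i - size + 1;
                Arrays.fill(used, start, i + 1, true);
                return start;
            }
        }
        return -1;
    }

    // Return a block to the heap.
    void free(int start, int size) { Arrays.fill(used, start, start + size, false); }

    // Total free cells, contiguous or not.
    int totalFree() {
        int n = 0;
        for (boolean u : used) if (!u) n++;
        return n;
    }
}
```

Allocating four 2-cell blocks in an 8-cell heap and then freeing the first and third leaves 4 free cells in total, yet allocate(4) returns -1, because no 4 of those free cells are adjacent.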

Solving the fragmentation problem

Heap fragmentation is resolved by the run-time system periodically reorganizing the heap, and possibly relocating some live (in-use) objects. All this will be transparent to the application, apart from the time it takes. Most algorithms require the application to stop while the heap is being reorganized.
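
The reorganization step can be sketched as a toy “sliding” compaction over a heap modeled as an array of slots, where live objects are moved down toward slot 0 in their original order. The forwarding map it returns records each object's old and new slot, which is the information a real collector would need in order to update references to relocated objects. This is an illustration under those assumptions, not how any particular JVM compacts its heap; ToyCompactor is a hypothetical name.

```java
import java.util.HashMap;
import java.util.Map;

// Toy sliding compaction. heap[i] holds an object's name, or null if free.
class ToyCompactor {
    // Slides live objects to the low end of the array, preserving order,
    // and returns a map from old slot to new slot for every live object.
    static Map<Integer, Integer> compact(String[] heap) {
        Map<Integer, Integer> forwarding = new HashMap<>();
        int free = 0; // next destination slot; never greater than i
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != null) {
                forwarding.put(i, free);
                heap[free] = heap[i];
                if (free != i) heap[i] = null; // old slot becomes free
                free++;
            }
        }
        return forwarding;
    }
}
```

Compacting {"A", null, "B", null, null, "C"} leaves {"A", "B", "C", null, null, null}: the three free slots that were scattered are now one contiguous block.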

Languages with dynamic data structures (structures that can grow and shrink in size at run-time) must have some way of telling the underlying run-time when they need more memory. C does this with the malloc() library call. Java does this with the “new” operator.

Conversely, you also need some way to indicate memory that is no longer in use (e.g., threads that have terminated, objects that are no longer referenced by anything, variables that have gone out of scope, etc.) and hand it back to the run-time system for reuse. C and C++ require explicit deallocation of memory; C does this with the free() library call, and C++ uses the delete operator. The programmer has to say what memory (objects) to give back to the run-time system, and when. In practice, this has turned out to be an error-prone task. It's all too easy to create a “memory leak” by not freeing memory before overwriting the last pointer to it. That memory can then neither be referenced nor freed, and is lost to further use for as long as the program runs. The process address space grows bigger and bigger, and the process gets slower and slower as it is swapped out to make room for other tasks. Java takes a different approach to reclaiming memory.

To avoid the problems of explicit memory management, Java takes the burden off the shoulders of the programmer and puts it on the run-time storage manager. One subsystem of the storage manager is the “garbage collector.” The automatic reclaiming of memory that is no longer in use is known as “garbage collection” in computer science. Java has a thread that runs in the background whose task is to do garbage collection. It looks at memory, and when it finds objects that are no longer reachable (no chain of references leads to them from the running program), it reclaims them by telling the heap that their memory is available to be reallocated.

The costs and benefits of garbage collection

Taking away the task of memory management from the programmer gives him or her one less thing to worry about, and makes the resulting software much more reliable in use. It may take a little longer to run compared with a language like C++ with explicit memory management, because the garbage collector has to go out and look for reclaimable memory rather than simply being told where to find it. On the other hand, it's much quicker to debug your programs and get them running in the first place. Most people would agree that in the presence of ever-improving hardware performance, a small performance overhead is an acceptable price to pay for more reliable software.

What is the cost of making garbage collection an implicit operation of the run-time system rather than a responsibility of the programmer? It means that at unpredictable times, a potentially large amount of behind-the-scenes processing will suddenly start up when some low-water mark is hit and more memory is called for. This has been a problem with past systems, but Java addresses it somewhat with threads. In a multithreaded system, some of the garbage collector might run in parallel with user code and have a less intrusive effect on the system.

We should mention at this point that there is almost no direct interaction between the programmer and garbage collection. It is one of the run-time services that you can take for granted, like keeping track of return addresses, or identifying the correct handler for an exception. The discussion here is to provide a little more insight into what takes place behind the scenes.

If you want to tell the system that you are done with a data structure and it can be reclaimed, all you do is remove all your references to it, as in:

myBigDataStructure = null;

If there are other references to the data structure, it won't be garbage-collected. But as soon as nothing the program can still reach points to it, it becomes a candidate for sweeping away.
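
The rule above can be observed, within limits, with a small program. It uses a java.lang.ref.WeakReference (described under Weak references at the end of this section) purely as a probe, because a weak reference does not keep its target alive. System.gc() only requests a collection and may do nothing, so the sketch checks only the guaranteed case: while a second reference still exists, nulling the first cannot make the array collectable. The class and method names are hypothetical.

```java
import java.lang.ref.WeakReference;

class ReachabilityDemo {
    // Returns true if the array is still present after a collection
    // request while a second reference (alias) still points at it.
    static boolean survivesWhileAliased() {
        int[] myBigDataStructure = new int[1_000_000];
        int[] alias = myBigDataStructure;               // a second reference
        WeakReference<int[]> probe = new WeakReference<>(myBigDataStructure);

        myBigDataStructure = null;  // drop one reference...
        System.gc();                // ...and request (not force) a collection

        // `alias` is still used here, so the array stayed strongly
        // reachable and the weak probe cannot have been cleared.
        return probe.get() == alias;
    }
}
```

Setting alias to null as well would make the array a candidate for collection, but whether probe.get() then returns null depends on whether the collector actually runs, so that case is deliberately not asserted.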

Finalizers

A “finalizer” is a Java term related to but not the same as a C++ destructor. When there are no further references to an object, its storage can be reclaimed by the garbage collector.

A finalizer is a method from class Object that any class may override. If a class has a finalizer method, the garbage collector will call it on a dead instance of that class before the memory occupied by that object is reused, provided the object is ever collected at all.

Garbage collection algorithms

A number of alternative garbage collection algorithms have been proposed and tried over the years. Three popular ones are “reference counting,” “mark and sweep,” and “stop and copy.”

Reference counting keeps a counter for each chunk of memory allocated. The counter records how many pointers directly point at the chunk or something inside it, and it must be kept up to date as assignments are made. If the reference count ever drops to zero, nothing can ever access the memory again, so it can immediately be returned to the pool of free storage. The big advantage of reference counting is that it imposes a steady, constant overhead, rather than needing periodic bursts of CPU time. Its big weakness is circular references: if A points to B and B points to A, but nothing else points to either of them, both counts stay at one and the pair is never freed, even though it is garbage. Reference counting has to be made more complex so that it isn't fooled by such cycles. It is also somewhat expensive in multithreaded environments, because each count must be locked for mutual exclusion (or updated atomically) whenever it changes.

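
A single-threaded toy model of reference counting, including the circular-reference problem just described. RcObject is a hypothetical class: each instance counts the references pointing at it, and when its count reaches zero it is marked freed and releases its own outgoing references in turn. A real multithreaded implementation would also need atomic or locked count updates, which this sketch omits.

```java
import java.util.ArrayList;
import java.util.List;

// Toy reference-counted object. Not how any JVM manages memory.
class RcObject {
    final String name;
    int refCount = 0;
    boolean freed = false;
    final List<RcObject> fields = new ArrayList<>(); // outgoing references

    RcObject(String name) { this.name = name; }

    // Someone (a root or another object) now points at this object.
    void addRef() { refCount++; }

    // Someone stopped pointing at this object.
    void release() {
        if (--refCount == 0) {
            freed = true;
            // A freed object no longer holds its references either.
            for (RcObject child : fields) child.release();
            fields.clear();
        }
    }

    // Store a reference from this object to `target`, updating its count.
    void point(RcObject target) {
        target.addRef();
        fields.add(target);
    }
}
```

If C points to D and the only root lets go of C, C's count drops to zero and freeing it drops D's count to zero too. If A and B point at each other and their roots let go, each count stays at 1 and neither is ever freed.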
Mark and sweep is the garbage collection algorithm used by the JDK at the time of writing. The marker starts at the root pointers. Root pointers are things like references to all threads, stacks, and static (global) variables. You can imagine marking with a red pen every object that can be accessed from the roots: the marker marks the objects the roots refer to directly, then recursively marks all the objects referenced from those, and continues until no more red marks can be placed. The entire virtual address space of the process may need to be swapped in and looked at, which is expensive in disk traffic and time. A smart garbage collector knows it doesn't have to bring in objects that can't contain references, such as large graphics images. Then the “sweep” phase starts, and everything without a red mark is swept back onto the free list for reuse. Memory compaction also takes place at this point. Memory compaction means jiggling all the memory that is in use down into one place, so that all the free storage comes together and can be merged into one large pool. Compaction helps when you have a number of large objects to allocate.

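
A toy version of the mark and sweep phases follows. It models the heap as a list of objects, each with a list of outgoing references and a mark bit standing in for the red pen; MsObj and MarkSweep are hypothetical names. Marking uses an explicit work stack instead of recursion (the same objects end up marked, without risking a deep Java call stack), and the sketch leaves out compaction and swapping entirely.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class MsObj {
    final String name;
    final List<MsObj> refs = new ArrayList<>();
    boolean marked; // the "red pen" mark
    MsObj(String name) { this.name = name; }
}

class MarkSweep {
    // Mark phase: mark every object reachable from the roots.
    static void mark(List<MsObj> roots) {
        Deque<MsObj> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            MsObj o = work.pop();
            if (o.marked) continue;
            o.marked = true;
            for (MsObj r : o.refs) if (!r.marked) work.push(r);
        }
    }

    // Sweep phase: remove unmarked objects from the heap (standing in for
    // "return to the free list"), clear marks on survivors for the next
    // cycle, and return the swept objects.
    static List<MsObj> sweep(List<MsObj> heap) {
        List<MsObj> swept = new ArrayList<>();
        heap.removeIf(o -> {
            if (o.marked) { o.marked = false; return false; }
            swept.add(o);
            return true;
        });
        return swept;
    }
}
```

Because marking is driven by reachability from the roots, an unreachable cycle such as C pointing to D and D pointing back to C is swept like any other garbage.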
Stop and copy is a third garbage collection algorithm. As the name suggests, it halts all other threads and goes into a garbage collection phase. The heap is split into two parts: the currently active part and the new part. Each of these is known as a “semi-space.” Non-garbage is identified by tracing active pointers, just as in mark and sweep. It copies all the non-garbage stuff over into the new semi-space and makes that the currently active semi-space. The old currently active semi-space is just discarded completely. The advantage of “stop and copy” is that it avoids heap fragmentation, so periodic memory compaction is not needed. Stop and copy is a fast garbage collection algorithm, but it requires twice the memory area. It also can't be used in real-time systems, as it makes your computer appear to just freeze from time to time.
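
The copying step can be sketched on a toy object graph in which every object carries a forwarding pointer. The copy proceeds breadth-first in the style of Cheney's algorithm, a common way to implement semi-space copying, although the text does not say which variant it has in mind. CopyObj and StopAndCopy are hypothetical names; “from-space” is simply every original object, and anything not copied is left behind with it.

```java
import java.util.ArrayList;
import java.util.List;

class CopyObj {
    final String name;
    final List<CopyObj> refs = new ArrayList<>();
    CopyObj forwarded; // set once this object has been copied to to-space
    CopyObj(String name) { this.name = name; }
}

class StopAndCopy {
    // Copies everything reachable from `roots` into a new to-space and
    // returns it. Roots are updated in place to point at the copies; the
    // caller then treats to-space as active and discards from-space.
    static List<CopyObj> collect(List<CopyObj> roots) {
        List<CopyObj> toSpace = new ArrayList<>();
        for (int i = 0; i < roots.size(); i++) {
            roots.set(i, copy(roots.get(i), toSpace));
        }
        // Scan to-space in order, copying each referenced object that has
        // not been copied yet and redirecting the reference to the copy.
        for (int scan = 0; scan < toSpace.size(); scan++) {
            CopyObj obj = toSpace.get(scan);
            for (int j = 0; j < obj.refs.size(); j++) {
                obj.refs.set(j, copy(obj.refs.get(j), toSpace));
            }
        }
        return toSpace;
    }

    private static CopyObj copy(CopyObj o, List<CopyObj> toSpace) {
        if (o.forwarded != null) return o.forwarded; // already copied
        CopyObj c = new CopyObj(o.name);
        c.refs.addAll(o.refs); // still from-space refs; fixed during the scan
        o.forwarded = c;
        toSpace.add(c);
        return c;
    }
}
```

The surviving objects end up packed together at the start of to-space, which is why this approach needs no separate compaction pass.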

The Java Language Specification says this about finalizers:

The purpose of finalizers is to provide a chance to free up resources (such as file descriptors or operating system graphics contexts) that are owned by objects but cannot be accessed directly and cannot be freed automatically by the automatic storage management. Simply reclaiming an object's memory by garbage collection would not guarantee that these resources would be reclaimed.

You don't need this in your code 99% of the time. The run-time library does control several resources like this, such as graphics contexts. They usually come with a method called dispose(), which you call to tell the run-time to give the resource back to the OS.
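
A sketch of the explicit-release pattern described above. NativeResource is a hypothetical stand-in for something like a graphics context, and it counts how many instances are currently held only so the behavior can be checked. The try/finally block ensures dispose() runs even if the work throws an exception, rather than leaving the release to a finalizer that may never run.

```java
// Hypothetical wrapper around a scarce non-memory resource. The garbage
// collector reclaims the wrapper's memory, but not the resource itself,
// so the class offers an explicit dispose() method.
class NativeResource {
    private static int held = 0;   // resources currently not yet disposed
    private boolean disposed = false;

    NativeResource() { held++; }   // acquiring the resource

    void use() {
        if (disposed) throw new IllegalStateException("already disposed");
        // ... work with the resource ...
    }

    // Give the resource back; calling it more than once is harmless.
    void dispose() {
        if (!disposed) {
            disposed = true;
            held--;
        }
    }

    static int openCount() { return held; }
}

class DisposeDemo {
    static void doWork() {
        NativeResource r = new NativeResource();
        try {
            r.use();
        } finally {
            r.dispose(); // runs whether or not use() throws
        }
    }
}
```

Newer Java code usually expresses the same idea with the AutoCloseable interface and a try-with-resources statement (Java 7 and later).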

Interpose a finalizer by providing a body for the method finalize() in your class to override the Object version. It will look like this:

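class Fruit {
        @Override
        protected void finalize() throws Throwable {
                try {
                        // do finalization: release non-memory resources here
                } finally {
                        super.finalize(); // let the superclass finalize too
                }
        }
}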

It must have the signature shown: the name finalize, no parameters, and a void return type; an overriding version may be protected, as here, or public. If present, a class's finalizer is called by the garbage collector at some point after the object is first recognized as garbage and before the memory is reclaimed, such that the object is garbage at the time of the call. A finalizer can also be called explicitly. There is no guarantee that an object will be garbage collected, and hence there is no guarantee that an object's finalizer will be called. A program may terminate normally without garbage collection taking place. So you cannot rely on a finalizer method being called, and you cannot use it to carry out some essential final housekeeping (release a lock, write usage statistics, or whatever). Later releases of Java go further and discourage finalizers outright: Object.finalize() has been deprecated since Java 9.

Finally (uh…), don't confuse “final” (a constant) or “finally” (a block that is always executed after a “try{}”) with “finalize”—the three concepts are unrelated.

Weak references

JDK 1.2 brought in the notion of weak references. Weak references allow a program to have a reference to an object that does not prevent the object from being considered for reclamation by the garbage collector. This is an advanced technique that won't appear in your programs much, if at all.

Weak references are useful for building caches that can be flushed if memory gets low. They also permit scheduling post-mortem cleanup actions in a more flexible way than the finalization mechanism. Finally, weak references allow a program to be notified when the collector has determined that an object has become eligible for reclamation.
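
One way to build the kind of flushable cache mentioned above is sketched here. It uses java.lang.ref.SoftReference, another reference class from the java.lang.ref package that JDK 1.2 introduced, which its documentation suggests for memory-sensitive caches; a plain WeakReference would work the same way but becomes eligible for clearing as soon as the value is no longer strongly reachable. SoftCache and its loader function are hypothetical names.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Memory-sensitive cache sketch: values are held only through soft
// references, so the garbage collector may clear them when memory is low.
// A cleared entry is simply recomputed on the next lookup.
class SoftCache<K, V> {
    private final Map<K, SoftReference<V>> map = new HashMap<>();
    private final Function<K, V> loader; // computes a value on a miss

    SoftCache(Function<K, V> loader) { this.loader = loader; }

    V get(K key) {
        SoftReference<V> ref = map.get(key);
        V value = (ref == null) ? null : ref.get();
        if (value == null) {          // never cached, or cleared by the GC
            value = loader.apply(key);
            map.put(key, new SoftReference<>(value));
        }
        return value;                 // the caller gets a strong reference
    }
}
```

A cleared entry leaves its key behind in the map; a fuller version could register the references with a java.lang.ref.ReferenceQueue and purge those stale keys.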

Let us move on to look at the other great run-time data structure, the stack.
