Chapter 9

Generational garbage collection

The goal of a collector is to find dead objects and reclaim the space they occupy. Tracing collectors (and copying collectors in particular) are most efficient if the space they manage contains few live objects. On the other hand, long-lived objects are handled poorly if the collector processes them repeatedly, either marking and sweeping or copying them again and again from one semispace to another. We noted in Chapter 3 that long-lived objects tend to accumulate in the bottom of a heap managed by a mark-compact collector, and that some collectors avoid compacting this dense prefix. While this eliminates the cost of relocating these objects, they must still be traced and all references they contain must be updated.

Generational collectors extend this idea by not considering the oldest objects whenever possible. By concentrating reclamation effort on the youngest objects in order to exploit the weak generational hypothesis that most objects die young, they hope to maximise yield (recovered space) while minimising effort. Generational collectors segregate objects by age into generations, typically physically distinct areas of the heap. Younger generations are collected in preference to older ones, and objects that survive long enough are promoted (or tenured) from the generation being collected to an older one.

Most generational collectors manage younger generations by copying. If, as expected, few objects are live in the generation being collected, then the mark/cons ratio between the volume of data processed by the collector and the volume allocated for that collection will be low. The time taken to collect the youngest generation (or nursery) will in general depend on its size. By tuning its size, we can control the expected pause times for collection of a generation. Young generation pause times for a well configured collector (running an application that conforms to the weak generational hypothesis) are typically of the order of ten milliseconds on current hardware. Provided the interval between collections is sufficient, such a collector will be unobtrusive to many applications.
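The mark/cons ratio mentioned above can be made concrete with a small sketch; the figures below are invented purely for illustration:

```python
def mark_cons_ratio(bytes_traced, bytes_allocated):
    """Ratio of collector effort (bytes marked or copied) to the volume
    of allocation since the previous collection; lower is cheaper."""
    return bytes_traced / bytes_allocated

# A nursery collection in which only 5% of a 10 MiB allocation
# budget survives to be copied:
ratio = mark_cons_ratio(bytes_traced=512 * 1024,
                        bytes_allocated=10 * 1024 * 1024)
assert abs(ratio - 0.05) < 1e-12  # very little live data: a cheap collection
```

If the weak generational hypothesis holds, this ratio stays small for nursery collections, which is exactly why copying collection suits young generations.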

Occasionally a generational collector must collect the whole heap, for example when the allocator runs out of space and the collector estimates that insufficient space would be recovered by collecting only the younger generations. Generational collection therefore improves only expected pause times, not the worst case. On its own, it is not sufficient for real-time systems. We consider the requirements for garbage collection in a hard real-time environment and how to achieve them in Chapter 19.

Generational collection can also improve throughput by avoiding repeatedly processing long-lived objects. However, there are costs to pay. Any garbage in an old generation cannot be reclaimed by collection of younger generations: collection of long-lived objects that become garbage is not prompt. In order to be able to collect one generation without collecting others, generational collectors impose a bookkeeping overhead on mutators in order to track references that span generations, an overhead hoped to be small compared to the benefits of generational collection. Tuning generational collectors to meet throughput and pause-time goals simultaneously is a subtle art.


Figure 9.1: Inter-generational pointers. If live objects in the young generation are to be preserved without tracing the whole heap, a mechanism and a data structure are needed to remember objects S and U in the old generation that hold references to objects in the young generation.

9.1    Example

Figure 9.1 shows a simple example of generational collection. This collector is using two generations. Objects are created in the young generation. At each minor collection (or nursery collection), objects in the young generation are promoted to the old generation if they are sufficiently old. Before the first collection, the young generation in this example contains four objects, N, P, V and Q, and the old generation three objects, R, S and U. R and N are reachable from outside the generational space; maybe some roots point to them. The collector is about to collect the young generation. Suppose that N, P and V were allocated some time ago but Q was created only shortly before the collector was invoked. The question of which objects should be promoted raises important issues.

A generational collector will promote objects it discovers from the young generation to the old one, provided they are old enough. This decision requires that a generational collector has a way of measuring time and a mechanism for recording ages. In our example, no objects in the young generation other than N are directly reachable from the roots, but P and Q are also clearly live since they are reachable from the roots via R and S. Most generational collectors do not examine the whole heap, but trace only the generation(s) being collected. Since the old generation is not to be traced here, a generational system must record inter-generational pointers such as the one from S to P in order that the collector may discover P and Q.

Such inter-generational pointers can arise in two ways. First, the mutator creates a pointer that requires tracking whenever it writes a pointer to an object in a generation G1 into a field of an object in a generation G2 that will be collected later than G1. Second, the collector itself may create inter-generational pointers when it promotes an object. In the example, the collector will create such a pointer if it promotes P but not Q. In both cases, the inter-generational pointer can be detected with a write barrier. The mutator requires a barrier on pointer writes that records whether a pointer between generations has been written. A generational collector needs a similar copy write barrier to detect any inter-generational references created by promotion. In the example, the remembered set (remset) records the location of any objects (or fields) that may contain an inter-generational pointer of interest to the garbage collector, in this case S and U.
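A minimal sketch of such a mutator write barrier and remembered set might look like the following; the object model and the convention that larger generation numbers mean older generations are invented for illustration:

```python
class Obj:
    def __init__(self, gen):
        self.gen = gen      # generation number: 0 = young, 1 = old
        self.fields = {}

# Objects in older generations that may hold pointers into younger ones.
remset = set()

def write_barrier(src, field, target):
    """Record src if this store creates a pointer from an older
    generation into a younger one; the collector later treats
    remset entries as roots for minor collections."""
    if target is not None and src.gen > target.gen:
        remset.add(src)
    src.fields[field] = target

S = Obj(gen=1)           # an old object, like S in Figure 9.1
P = Obj(gen=0)           # a young object
write_barrier(S, 'f', P) # old -> young: S must be remembered
assert S in remset
```

Young-to-old stores need no remembering in this two-generation sketch, since younger generations are always collected no later than older ones.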

Unfortunately, treating the source of inter-generational pointers as roots for a minor collection exacerbates the problem of floating garbage. Minor collections are frequent but do not reclaim garbage in the old generation, such as U. Worse, U holds an inter-generational pointer so must be considered a root for the young generation. This nepotism will lead to V, the young garbage child of the old garbage object U, being promoted rather than reclaimed, thus further reducing the space available for live objects in the older generation.

9.2    Measuring time

Before objects can be segregated by their age, we need to decide how time is to be measured. There are two choices: bytes allocated or seconds elapsed. Wall-clock time is useful for understanding a system’s external behaviour. How long does it run? What are the pause times for garbage collection and how are they distributed? Answers to these questions determine whether a system is fit for purpose: will it complete its task in sufficient time and will it be sufficiently responsive? One requirement might be that it does not disturb an interactive human user. Another requirement might be to meet a hard realtime guarantee (say, in an embedded system) or a soft one (where occasionally missing a deadline is not disastrous but missing many is). On the other hand, internally, object lifetimes are better measured by the number of bytes of heap space allocated between their birth and their death. Space allocated is a largely machine-independent measure, although clearly a system with 64-bit addresses or integers will use more space than a 32-bit one. Bytes-allocated also directly measures the pressure placed upon the memory manager; it is closely related to the frequency with which the collector must be called.

Unfortunately measuring time in terms of bytes allocated is tricky in multithreaded systems (where there are multiple application or system threads). A simple global measure of the volume of allocation may inflate the lifetime of an object, since the counter will include allocation by threads unrelated to the object in question [Jones and Ryder, 2008].

In practice, generational collectors often measure time by the number of collections an object has survived, because this is more convenient to record and requires fewer bits. The number of collections survived is, however, only an approximate proxy for an object's true age in terms of bytes allocated.

9.3    Generational hypotheses

The weak generational hypothesis, that most objects die young, appears to be widely valid, regardless of programming paradigm or implementation language. Foderaro and Fateman [1981] found that, for a computer algebra system written in MacLisp, 98% of the volume of data recovered by a collection had been allocated since the previous one. Zorn [1989] reported that between 50% and 90% of Common Lisp objects survive less than ten kilobytes of allocation. The story is similar for functional languages. For Haskell, between 75% and 95% of heap data died before they were ten kilobytes old and only 5% lived longer than one megabyte [Sansom and Peyton Jones, 1993]. Appel [1992] observed that Standard ML/NJ reclaimed 98% of any given generation at each collection, and Stefanović and Moss [1994] found that only 2% to 8% of allocated data survived the 100 kilobyte threshold.

It also holds for many programs written in object-oriented languages. Ungar [1986] found that less than 7% of Smalltalk objects lived beyond 140 kilobytes. Dieckmann and Hölzle [1999] reported that the volume of Java objects in the SPECjvm98 benchmark suite surviving 100 kilobytes varied between 1% and 40%, and that less than 21% lived beyond one megabyte although the proportion varied significantly from one application to another. Blackburn et al [2006a] found that on average less than 9% of objects allocated by the SPECjvm98 and DaCapo benchmark suites escaped from a four megabyte nursery, although there was wide variation between benchmarks; note that this is an upper bound on the volume of objects living longer than four megabytes since some escapees may have been allocated only shortly before the nursery was collected. Jones and Ryder [2008] found bimodal lifetime distributions common in Java applications; between 65% and 96% of DaCapo objects survived no longer than 64 kilobytes, with 3% to 16% being immortal or living longer than four megabytes. Even in imperative languages without automatic memory management support, the lifetimes of many objects are short. Barrett and Zorn [1993] reported that more than 50% of allocated data died within ten kilobytes and less than 10% survived 32 kilobytes.

On the other hand, there is much less evidence for the strong generational hypothesis that, even for objects that are not newly-created, younger objects will have a lower survival rate than older ones [Hayes, 1991]. Simple models like the weak generational hypothesis adequately account for the overall behaviour of objects in many programs. However, once the shortest-lived objects are discounted, objects’ demographics over a longer timescale are more complex. Object lifetimes are not random: objects commonly live in clumps and die all at the same time, because programs operate in phases [Dieckmann and Hölzle, 1999; Jones and Ryder, 2008]. A significant number of objects may never die. The lifetime of objects may be correlated with their size, although opinion has differed on this [Caudill and Wirfs-Brock, 1986; Ungar and Jackson, 1988; Barrett and Zorn, 1993]. However, as we saw above, there are other reasons why we might want to treat large objects specially.

9.4    Generations and heap layout

A wide variety of strategies have been used to organise generations. Collectors may use two or more generations, which may be segregated physically or logically. Each generation may be bounded in size or the size of one space may be traded against that of another. The structure within a generation may be flat or it may comprise a number of age-based subspaces, called steps or buckets. Generations may also hold their own large object subspaces. Each generation may be managed by a different algorithm.

The primary goals of generational garbage collection are reduced pause times and improved throughput. Assuming that the youngest generation is processed by copying collection, expected pause times depend largely on the volume of data that survives a minor collection of that generation, which in turn depends on the size of the generation. However, if the size of the nursery is too small, collection will be fast but little memory will be reclaimed as the objects in the nursery will have had insufficient time to die. This will have many undesirable consequences.

First, young generation collections will be too frequent; as well as its copying cost, which is proportional to the volume of surviving objects (and will be higher because objects have had less time to die), each collection must also bear the cost of stopping threads and scanning their stacks.

Second, the older generation will fill too fast and then it too will have to be collected. High promotion rates will cause time-consuming older generation or full heap collections to take place too frequently. In addition, premature promotion will increase the incidence of nepotism, as ‘tenured’ garbage objects in the old generation preserve their offspring in the young generation, artificially inflating the survivor rate as those dead children will also be promoted.

Third, there is considerable evidence that newly created objects are modified more frequently than older ones. If these young objects are promoted prematurely, their high mutation rate will put further pressure on the mutator’s write barrier; this is particularly undesirable if the cost of the write barrier is high. Any transfer of overheads between mutator and collector needs careful evaluation with realistic workloads. Typically, the collector will account for a much smaller proportion of execution time than the mutator in any well configured system. For example, suppose a write barrier comprises just a few instructions in its fast path yet accounts for 5% of overall execution time; suppose further that the collector accounts for 10% of overall run time. It would be quite easy for an alternative write barrier implementation to double the cost of the barrier, thus adding 5% to overall execution time. To recover this, garbage collection time must be reduced by 50%, which would be hard to do.

Finally, promoting objects may dilute the program’s working set. Generational organisation is thus a balancing act between keeping minor collections as short as possible, minimising the number of minor collections and of the much more expensive major, full-heap collections, and avoiding passing too much of the cost of memory management to the mutator. We now look at how this can be achieved.
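The break-even arithmetic in the write barrier example above can be checked in a few lines; the percentages are the hypothetical ones from the text:

```python
barrier = 0.05   # write barrier: 5% of overall execution time
gc      = 0.10   # collector: 10% of overall execution time

# Doubling the barrier's cost adds another 5% to execution time;
# to recover it, garbage collection time must shrink by that amount.
added = barrier
required_gc_reduction = added / gc
assert abs(required_gc_reduction - 0.5) < 1e-9   # a 50% cut in GC time
```

The asymmetry is the point: because the mutator dominates execution time, a small relative increase in mutator overhead demands a very large relative saving from the collector.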

9.5    Multiple generations

Adding further generations is one solution to the dilemma of how to preserve short pause times for nursery collections without incurring excessively frequent full heap collections because the oldest generation has filled too soon. The role of the intermediate generations is to filter out objects that have survived collection of the youngest generation but do not live much longer. If a collector promotes all live objects en masse from the youngest generation, the survivors will include the most recently allocated objects, despite the expectation that most of these will die very soon. By using multiple generations, the size of the youngest generation can be kept small enough to meet expected pause time requirements without increasing the volume of objects dying in the oldest generation shortly after their promotion.

Using multiple generations has a number of drawbacks. Most systems will collect all younger generations when any older generation is collected. This offers the benefit that pointers need to be tracked in one direction only: old to young, which occur less frequently than young to old. Although the time taken to collect an intermediate generation will be less than that required to collect the full heap, pause times will be longer than those for nursery collections. Multiple generation collectors are also more complex to implement and may introduce additional overheads to the collector’s tracing loop, as this performance critical code must now distinguish between multiple generations rather than just two (which can often be accomplished with a single check against an address, maybe a compile-time constant). Increasing the number of generations will tend to increase the number of inter-generational pointers created, which in turn may increase the pressure on the mutator’s write barrier, depending on implementation. It will also increase the size of the root set for younger generations since objects have been promoted that would not have been if some of the space used for the intermediate generations had been used to increase the size of the young generation.

Although many early generational collectors for Smalltalk and Lisp offered multiple generations, most modern generational collectors for object-oriented systems provide just two. Even where collectors provide more than two generations, such as those for functional languages where allocation and death rates are prodigiously high, often only two generations are used by default [Marlow et al, 2008]. Instead, mechanisms within generations, especially the youngest generation, can be used to control promotion rates.

9.6    Age recording

En masse promotion

Age recording and promotion policy are tightly coupled. Multiple generations provide an imprecise way of recording objects’ ages. Figure 9.2 shows four ways in which a young generation can be structured to control object promotion. We discuss each of these in turn. The simplest arrangement is for each generation except the oldest to be implemented as a single semispace (see Figure 9.2a). When that generation is collected, all surviving objects are promoted en masse to the next. This structure has the advantages of simplicity and optimal utilisation of the memory devoted to the young generation. There is no need either to record per-object ages or to set aside copy reserve space in each generation (except the last, if indeed it is managed by copying). The generational collectors used by the MMTk memory manager in the Jikes RVM Java virtual machine use en masse promotion in this way [Blackburn et al, 2004b]. However, Zorn [1993] has suggested that en masse promotion of every live object (in a Lisp system) may lead to promotion rates 50% to 100% higher than can be achieved by requiring objects to survive more than one minor collection before they are promoted.

Figure 9.3, due to Wilson and Moher [1989b], illustrates the survival rates from the youngest generation that might be obtained by delaying promotion for one or two collections. The curves show the fraction of objects that survive a future scavenge if they were allocated at time t, assuming that most objects die young. The closer an object is born to a collection, the more likely it is to survive that collection. Let us focus on the area of the graph between scavenges n and n + 1. Curve (b) shows the proportion of objects that will survive one scavenge; most objects will die without ever being copied: these are the data in the light grey area. The data in the black area below curve (c) will survive two scavenges. If the policy is to promote en masse all objects that survive the collection, then the data in the dark grey and black areas below curve (b) will be promoted. However, if promotion is restricted to those objects that survive two collections, then only the data in the black area below curve (c) will be tenured. By requiring a copy count greater than one, the very youngest objects (which we can expect to die soon) are denied tenure, and the promotion rate is substantially reduced. In general, increasing the copy count for promotion beyond two is likely to pay diminishing returns [Ungar, 1984; Shaw, 1988; Ungar and Jackson, 1988]; Wilson [1989] suggests that it may be necessary to increase the count by a factor of four or more to reduce the number of remaining survivors by half.
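The effect of raising the copy count can be illustrated with a toy simulation. The exponential lifetime distribution and every parameter below are assumptions chosen only to make "most objects die young" concrete; real demographics are more complex:

```python
import random

def promotion_rate(n, copy_count, interval=100.0, mean_life=30.0, seed=1):
    """Fraction of objects promoted when an object must survive
    `copy_count` minor collections. Scavenges occur at times
    interval, 2*interval, ...; lifetimes are drawn exponentially."""
    rng = random.Random(seed)
    promoted = 0
    for _ in range(n):
        birth = rng.uniform(0.0, interval)    # born before the first scavenge
        death = birth + rng.expovariate(1.0 / mean_life)
        if death > copy_count * interval:     # alive at its copy_count-th scavenge
            promoted += 1
    return promoted / n

# Requiring survival of two scavenges tenures far fewer objects
# than promoting every survivor en masse:
assert promotion_rate(10_000, copy_count=2) < promotion_rate(10_000, copy_count=1)
```

With short mean lifetimes, most of the objects that survive one scavenge were simply born shortly before it; a second survival requirement filters out exactly that cohort, mirroring the dark grey area of Figure 9.3.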

Aging semispaces

Promotion can be delayed by structuring a generation into two or more aging spaces. This allows objects to be copied between the fromspace and tospace an arbitrary number of times within the generation before they are promoted. In Lieberman and Hewitt’s original generational collector [1983], a generation is collected several times before all survivors are eventually promoted en masse. In terms of the aging semispaces of Figure 9.2b, either all live objects in fromspace are evacuated to tospace within this generation or all are promoted to the next generation, depending on the age of the generation as a whole. While this arrangement allows the older members of the generation time to die, the very youngest will still be promoted, possibly prematurely.


Figure 9.2: Semispace organisation in a generational collector. Dark grey indicates newly allocated data, light grey copied or promoted data. In each case, the x-axis is time and the y-axis is the volume of allocation. (a) en masse promotion; (b) aging semispaces (records per space age); (c) aging semispaces (records per object age); (d) survivor spaces promotion (records per object age).

Jones [1996]. Reprinted by permission.


Figure 9.3: Survival rates with a copy count of 1 or 2. The curves show the fraction of objects that will survive a future collection if they were born at time x. Curve (b) shows the proportion that will survive one collection and curve (c) the proportion that will survive two. The coloured areas show the proportions of objects that will not be copied or will be promoted (copied) under different copy count regimes.

Wilson and Moher [1989b], doi: 10.1145/74877.74882. © 1989 Association for Computing Machinery, Inc. Reprinted by permission.

Sun’s ExactVM also implemented the younger of two generations as a pair of semispaces (see Figure 9.2c) but controlled promotion of an individual object by stealing five bits from one of two header words to record its age. In this case, individual live objects can either be evacuated to tospace or promoted to the next generation. While this throttles the promotion of the youngest objects, it adds a test and an addition operation to the work done to process live objects in the young generation.

Bucket brigade and step systems allow a somewhat finer discrimination between object ages without maintaining per-object ages. Here, a generation is divided into a number of subspaces and objects are advanced from one bucket or step to the next at each collection. Some step systems advance all surviving objects from one step to the next at each collection: live objects from the oldest step are promoted to the next generation. Such an n-step system guarantees that objects will not reach the next generation until they have survived n scavenges. Glasgow Haskell allows an arbitrary number of steps in each generation (although the default is two in the young generation and one in others), as does the UMass GC Toolkit [Hudson et al, 1991]. Shaw [1988] further divides each step into a pair of semispaces in his bucket brigade scheme. Survivors are copied between each pair of semispaces b times before advancing to the next step. Thus, the two-bucket scheme guarantees that objects will not reach the next generation until they have survived between b and 2b − 1 scavenges. Shaw arranged his scheme to simplify promotion. Figure 9.4 shows an instance of his scheme with two buckets: b = 3, so objects are copied up to three times within a bucket before being evacuated to the aging bucket or promoted. Because Shaw’s generations are contiguous, he can merge the aging bucket with the old generation by delaying the promotion step until the oldest bucket’s tospace is adjacent to the old generation. At this point the bucket is promoted by adjusting the boundary between the generations. The aging spaces of Figure 9.2c have some similarities with a two-bucket scheme but pay the cost of manipulating age bits in the headers of survivors.
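Recording an age in a few header bits, as in the ExactVM-style scheme described earlier, might be sketched as follows; the bit layout (age in the low five bits of a header word) is invented for illustration:

```python
AGE_BITS = 5
AGE_MASK = (1 << AGE_BITS) - 1   # room for ages 0..31 in the header
MAX_AGE  = AGE_MASK

def bump_age(header):
    """Increment the age field packed into the low bits of a header
    word, saturating so it never overflows into neighbouring bits."""
    age = header & AGE_MASK
    if age < MAX_AGE:
        header = (header & ~AGE_MASK) | (age + 1)
    return header

def should_promote(header, threshold):
    """The per-object promotion test: old enough to be tenured?"""
    return (header & AGE_MASK) >= threshold

h = 0b1010_00000            # header with some non-age bits set, age 0
h = bump_age(bump_age(h))   # survived two scavenges
assert (h & AGE_MASK) == 2
assert should_promote(h, threshold=2)
assert not should_promote(h, threshold=3)
```

This is the extra test-and-add per surviving object that the text mentions: cheap, but on the collector's hottest path.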


Figure 9.4: Shaw’s bucket brigade system. Objects are copied within the young generation from a creation space to an aging semispace. By placing the aging semispace adjacent to the old generation at even numbered collections, objects can be promoted to the old generation simply by moving the boundary between generations.

Jones [1996]. Reprinted by permission.

It is important to understand the differences between steps and generations. Both segregate objects by age, but different generations are collected at different frequencies whereas all the steps of a generation are collected at the same time. Steps and generations also differ in how pointers that span spaces are treated. Because one generation may be collected later than another, it is essential to track pointers from an older generation to a younger one. On the other hand, there is no need to track inter-step pointers. By using steps in the youngest generation (where most mutation occurs), and by reducing premature promotion, the load on the write barrier can be reduced while also controlling promotion, without the need for per-object age records.

Survivor spaces and flexibility

All the semispace organisations described above are wasteful of space since they reserve half the space in the generation for copying. Ungar [1984] organised the young generation as one large creation space (sometimes called eden) and two smaller buckets or survivor semispaces (see Figure 9.2d). As usual, objects are allocated in eden, which is scavenged alongside the survivor fromspace at each minor collection. All live eden objects are evacuated to the survivor tospace. Live objects in survivor fromspace are either evacuated to tospace within the young generation or promoted to the next generation, depending on their age. This organisation can improve space utilisation because the eden region is very much larger than the two semispaces. For example, Sun’s HotSpot Java virtual machine [Sun Microsystems, 2006] has a default eden versus survivor space ratio of 32:1, thus using a copy reserve of less than 3% of the young generation. HotSpot’s promotion policy does not impose a fixed age limit for promotion but instead attempts to keep the survivor space half empty. In contrast, the other semispace schemes waste half of the nursery space on copy reserve.
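A schematic minor collection over an eden-plus-survivor-spaces layout might look like the sketch below. It assumes liveness is already known and uses a simple fixed tenuring age (unlike HotSpot's adaptive policy); the data representation is invented for illustration:

```python
def minor_collect(eden_live, from_live, old_gen, tenure_age):
    """One scavenge of a young generation laid out as eden plus two
    survivor semispaces. `eden_live` and `from_live` are lists of
    (object, age) pairs that the scavenge has found to be live."""
    to_space = []
    for obj, age in eden_live:
        to_space.append((obj, 1))            # first survival: into tospace
    for obj, age in from_live:
        if age + 1 >= tenure_age:
            old_gen.append(obj)              # old enough: promoted
        else:
            to_space.append((obj, age + 1))  # aged within the young generation
    return to_space, old_gen

to, old = minor_collect(eden_live=[('N', 0)],
                        from_live=[('P', 1), ('Q', 2)],
                        old_gen=[], tenure_age=3)
assert to == [('N', 1), ('P', 2)] and old == ['Q']
```

Because eden is large and the survivor spaces small, the copy reserve (one survivor space) is a tiny fraction of the young generation, which is the space saving Ungar's design is after.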


Figure 9.5: High water marks. Objects are copied from a fixed creation space to an aging semispace within a younger generation and then promoted to an older generation. Although all survivors in an aging semispace are promoted, by adjusting a ‘high water mark’, we can choose to copy or promote an object in the creation space simply through an address comparison.

Wilson and Moher [1989b], doi: 10.1145/74877.74882. © 1989 Association for Computing Machinery, Inc. Reprinted by permission.

The Opportunistic garbage collector [Wilson and Moher, 1989b] used a bucket brigade system with the space parsimony of survivor spaces and some flexibility in promotion age. The age at which objects are promoted can be varied down to the granularity of an individual object without the overhead of having to store or manipulate each object’s age. As before, the young generation is divided into a creation space and a pair of aging spaces. The aging spaces are not semispaces but simply act as steps. At each minor collection, survivors from the creation space are evacuated to one of the aging spaces; survivors of the other aging space are promoted. With just this organisation, promoted objects would have a copy count of two. However, Wilson and Moher observe that objects are placed in the creation space in allocation order. By drawing a high water mark across creation space, younger objects (above the line in Figure 9.5) can be distinguished from older ones by a simple address comparison. Younger members of the creation space are treated as members of bucket 0. Older members and all of the aging space become bucket 1; survivors of this bucket are promoted.
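The high water mark test itself is just an address comparison. A sketch, with invented addresses and assuming allocation proceeds upwards through creation space:

```python
def bucket(addr, creation_base, high_water_mark):
    """Classify a creation-space object by address alone: objects above
    the high water mark were allocated more recently (bucket 0); those
    below it are treated as part of bucket 1, whose survivors are
    promoted at the next scavenge."""
    assert creation_base <= addr
    return 0 if addr >= high_water_mark else 1

hwm = 0x5000
assert bucket(0x6000, 0x1000, hwm) == 0   # young: evacuated to an aging space
assert bucket(0x2000, 0x1000, hwm) == 1   # older: promoted if it survives
```

Moving `hwm` up or down adjusts the effective promotion threshold continuously, with no per-object age field to store or update.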

This scheme limits the promotion age to a maximum of two minor collections, and so does not offer as wide a range of promotion age as those that explicitly store ages in objects or associate them with spaces (such as the semispace organisations we considered earlier). However, any non-integral promotion threshold between one and two can be selected, and modified at any time, including during scavenges. We can see the effect in Figure 9.3. Any data in the dark grey or black regions to the left of the dashed white high water mark line will be promoted at their first collection. Those to the right of the high water mark line will be promoted if they are in the black area below curve (c), or evacuated to die later in the aging space if they are in the grey area above the curve. Wilson and Moher used this scheme with three generations for the byte-coded Scheme-48; it was also used in Standard ML with up to 14 generations [Reppy, 1993].

9.7    Adapting to program behaviour

The Opportunistic collector is an example of a garbage collector that can adapt its promotion policy as a program runs. It provides a particularly fine-grained and simple mechanism. Adaptation is needed because objects’ lifetime demographics are neither random nor stationary. Instead real programs (unlike toy ones or synthetic benchmarks) tend to operate in phases. There are a wide range of common patterns of behaviour. A set of live objects may gradually accumulate and then die all at once. Alternatively, its size may reach a plateau after which the objects live for a long time. Ungar and Jackson [1988] cite the example of objects born in a clump that slowly diminishes, ‘rather like a pig that has been swallowed by a python’. Demographics that do not adhere strictly to the weak generational hypothesis can cause problems for generational collectors. If a large volume of data lives for sufficient time to reach an older generation and then dies, performance will suffer. To deal with this, Ungar and Jackson have argued for flexible mechanisms that control tenuring [1988; 1992].

It is useful to be able to adapt garbage collectors in general to the mutator’s behaviour, for example to reduce expected pause time or to improve overall throughput. The simplest scheduling mechanism is to invoke the collector only when the allocator runs out of space, but a generational memory manager can control pause times by adjusting the size of the youngest generation: smaller nurseries reduce the volume of objects that must be scavenged by young generation collection. The size of the space also affects the rate of promotion from one generation to another. If a space is too small to give young objects sufficient time to die, then the promotion rate will be higher. Conversely, if the nursery is very large, the interval between collections will be greater and a smaller fraction of objects will survive to reach the older generation.
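A crude feedback controller over nursery size might be sketched as below. This is only an illustration of the idea that nursery size trades pause time against promotion rate; production collectors use considerably richer policies:

```python
def resize_nursery(nursery_size, last_pause_ms, target_pause_ms,
                   min_size, max_size):
    """Shrink the nursery when minor pauses overshoot the target;
    grow it when there is ample slack, giving young objects more
    time to die and so lowering the promotion rate."""
    if last_pause_ms > target_pause_ms:
        nursery_size = max(min_size, nursery_size // 2)
    elif last_pause_ms < target_pause_ms / 2:
        nursery_size = min(max_size, nursery_size * 2)
    return nursery_size

size = 8 * 1024 * 1024
size = resize_nursery(size, last_pause_ms=25, target_pause_ms=10,
                      min_size=1 << 20, max_size=1 << 28)
assert size == 4 * 1024 * 1024   # halved after an overlong pause
```

Note the tension the text describes: the shrinking branch improves pauses but, pushed too far, raises both collection frequency and premature promotion.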

Appel-style garbage collection

Appel [1989a] introduced an adaptive generational layout for Standard ML that gives as much room as possible to the young generation for a given memory budget, rather than using fixed size spaces. This scheme is designed for environments where infant mortality is high: typically only 2% of ML’s young generation survived a collection. The heap is divided into three regions: the old generation, a copy reserve, and the young generation (see Figure 9.6a). Nursery collections promote all young survivors en masse to the end of the old generation (Figure 9.6b). After the collection, any space not needed for old generation objects is split equally to create the copy reserve and a new young generation. If the space allocatable to the young generation falls below some threshold, the full heap is collected.

Image

Figure 9.6: Appel’s simple generational collector. Grey areas are empty.

As in any scheme managed by copying, Appel must ensure that the copy reserve is sufficient to accommodate the worst case, in which all old and young objects are live. The most conservative approach is to ensure that old + young ≤ reserve. However, Appel can initiate full heap collections less frequently, requiring for safety only that old ≤ reserve ∧ young ≤ reserve, arguing as follows. Before a minor collection, the reserve is sufficient even if all young objects survive. Immediately after a minor collection, all newly promoted objects in old′ are live: they do not need to be moved. The reserve is sufficient to accommodate all previously promoted objects in old (Figure 9.6c). Following the scavenge of old, all surviving data (now at the top of the heap) can be block moved to the bottom of the heap. We note that in this collect-twice approach any cycle of dead objects that lies partly in the nursery and partly in the old generation will be preserved. However, it will be collected during the next full collection since it is then contained entirely in the old generation.
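The two safety conditions can be contrasted with a trivial check. This is a sketch with invented sizes; it only makes the arithmetic of the collect-twice argument concrete: because old and young data are evacuated in separate phases that reuse the same reserve, the reserve need only cover the larger phase, not both at once.

```python
# Conservative condition: the reserve must hold all live data at once.
def conservative_safe(old, young, reserve):
    return old + young <= reserve

# Collect-twice condition: each evacuation phase reuses the same reserve,
# so the reserve need only cover each volume separately.
def collect_twice_safe(old, young, reserve):
    return old <= reserve and young <= reserve

# With 300 words in each generation and a 500-word reserve, only the
# collect-twice condition is satisfied.
print(conservative_safe(300, 300, 500))    # False
print(collect_twice_safe(300, 300, 500))   # True
```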

The entire generational universe in Appel was contiguous, but Appel-style collectors can also be implemented in block structured heaps, which avoids the necessity of sliding the live data to the start of the heap after a major collection. Shrinking nurseries can also be used in conjunction with an old generation managed by a non-moving algorithm, such as mark-sweep.

The advantage of Appel-style collection is that by dynamically adapting the size of the copy reserve it offers good memory utilisation and reduces the number of collections needed compared with configurations that use en masse promotion and fix the size of the young generation. However, some caution is necessary to avoid thrashing the collector. Benchmarks that have high allocation rates but promote little data from the young generation are common: indeed this was one of the motivations for Appel’s design. This can lead to a situation in which the nursery shrinks so far that minor collections become overly frequent, yet never enough data is promoted to trigger a major collection. To combat this, the old generation should be collected whenever the young generation’s size falls below a minimum.

Feedback controlled promotion

Other schemes for controlling promotion rate are more directly related to pause time goals. Demographic feedback-mediated tenuring [Ungar and Jackson, 1988, 1992] attempts to smooth out long pauses incurred by bursts of promotion of objects that die soon after promotion. The volume of objects promoted at one collection is used as a predictor for the length of the next collection, and to throttle or accelerate promotion. The excess of survivors above a desired maximum becomes an index into a table indicating the age threshold for promotion that should be used at the next collection. Although this mechanism can control promotion rates, it cannot demote objects from an older to a younger generation. Barrett and Zorn [1995] vary a threatening boundary between two generations in both directions. The cost is that they must track more pointers as they cannot predict where the inter-generational boundary will lie.
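The table-driven mechanism can be sketched as follows. The table contents here are invented for illustration; Ungar and Jackson derive the real mapping from measured demographics. A larger excess of survivors selects a lower age threshold, accelerating promotion to shorten the next pause.

```python
# Sketch of demographic feedback-mediated tenuring (after Ungar and Jackson).
# Each entry maps a level of excess survivor volume (in bytes) to the age
# threshold (in collections survived) used at the next collection.
# None means 'do not tenure'. All numbers are invented.

TENURE_TABLE = [
    (0,      None),   # little excess: never tenure
    (4096,   4),      # promote only objects that survived >= 4 collections
    (16384,  2),
    (65536,  1),      # large excess: promote nearly all survivors
]

def age_threshold(survivor_volume, desired_max):
    excess = survivor_volume - desired_max
    threshold = None
    for limit, age in TENURE_TABLE:
        if excess >= limit:
            threshold = age   # keep the entry for the largest matching limit
    return threshold

print(age_threshold(10000, 2000))   # 4: moderate excess, cautious promotion
print(age_threshold(100, 2000))     # None: below the desired maximum
```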

In version 1.5.0, Sun’s HotSpot family of collectors introduced Ergonomics, an adaptive mechanism for resizing generations based on user provided goals. Ergonomics focuses on three soft goals rather than attempting to provide hard real time guarantees. It first attempts to meet a maximum pause time goal. Once that is met, it targets throughput (measured as the fraction of overall time spent in garbage collection) and finally, once other goals are satisfied, it shrinks the footprint. Pause time goals are addressed by shrinking the size of generations, one at a time, starting with the one whose pause time is longest, based on statistics acquired at each collection. Throughput is improved by increasing the size of the heap and the generations, the latter in proportion to the time taken to collect each generation. By default, sizes are increased more aggressively than they are decreased.

Vengerov [2009] offers an analytical model for the throughput of HotSpot. From this model he derives a practical algorithm for tuning the collector by adjusting the relative sizes of HotSpot’s two generations and the promotion threshold, the number of collections that a young object must survive before it is promoted. He makes an important observation that it is insufficient to consider whether to adjust the promotion threshold simply on the basis of whether it would reduce the number of objects promoted. Instead, it is essential also to consider the ratio of free space in the old generation after a major collection to the volume promoted into it at each minor collection. His ThruMax algorithm provides a co-evolutionary framework for alternately adjusting the size of the young generation and the promotion threshold. In brief, ThruMax is invoked after the first major collection and once the volume of data in HotSpot’s survivor spaces reaches a steady state (between 75% and 90% of the young generation’s survivor space for two consecutive minor collections). ThruMax first increases the nursery size S until it reaches the neighbourhood of an optimum value (discovered by observing that S has been decreased and so it is probably oscillating around this value). Then ThruMax adjusts the tenuring threshold until the model shows that a further change would decrease throughput. After this, a new episode of adjustments is begun provided that there is no pressure to decrease S and sufficient minor collections are expected before the next major collection.

Overall, sophisticated collectors like HotSpot present the user with a large number of tuning knobs, each of which is likely to be interdependent.

9.8    Inter-generational pointers

A generation’s roots must be discovered before it can be collected. As we saw in the example in Figure 9.1, the roots for a generation consist not only of pointer values held in registers, stacks and globals but also any references to objects in this generation from objects in other parts of the heap that are not being collected at the same time. These typically include older generations and spaces outside the generational heap, such as large object spaces and spaces that are never collected, including those for immortal objects and possibly code. As we noted above, inter-generational pointers are created in just three ways: by initialising writes as an object is created, by other mutator updates to pointer slots, and by the collector as it moves objects to different generations. In general such pointers must be detected as they are created and recorded so that they can be used as roots when a generation is collected. We shall call any pointer that must be recorded an interesting pointer.

An interesting case of objects outside the generational heap is the boot image: those objects present when the system starts. A generational system can handle them in at least three ways: it can trace through the boot image objects, which has the benefit of not retaining objects reachable only from boot image objects that have become unreachable; it can scan the boot image objects to find references from them into the generational heap; or it can remember the interesting pointers that reside in boot image objects. Tracing can be expensive, and might be applied only during full collections; it would thus be used in conjunction with scanning or remembered sets. Scanning has the virtue of not requiring a write barrier on updates to boot image objects, but the downside that the collector must examine more fields to find the interesting pointers. If used in conjunction with tracing, then after a trace the collector should zero the fields of unreachable boot image objects, to prevent misinterpretation of pointers that may refer to old garbage now reclaimed. Remembered sets have their usual virtues and costs, and also do not require zeroing of unreachable boot image objects’ fields.

Remembered sets

The data structures used to record inter-generational pointers are called remembered sets. Remembered sets record the location of possible sources of pointers (for example, U and the second slot of S in the example) from one space of the heap to another. The source rather than the target of an interesting pointer is recorded for two reasons. It allows a moving collector to update the source field with the new address of an object that has been copied or promoted. A source field may be updated more than once between successive collections, so remembering the source ensures that the collector processes only the object that is referenced by the field at the time of the collection, and not the targets of any obsolete pointers. Thus, the remembered set for any generation holds those locations at which a potentially interesting pointer to an object in this generation has been stored. Remembered set implementations vary in the precision with which they record these locations. The choice of precision is a trade-off between overhead on the mutator, space for the remembered sets and the collector’s cost of processing them. Note that the term remembered ‘set’ is sometimes a misnomer because an implementation may allow duplicate entries (and hence be a multiset).
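A minimal sketch illustrates why sources rather than targets are recorded. The class, heap representation and object names below are all invented for the example: the remembered set stores (object, field) slots, and the collector reads each slot only at collection time, so an overwritten slot yields its current referent rather than an obsolete one.

```python
# Sketch of a remembered set that records pointer *sources* as
# (object, field) slots. All names and the dict-of-dicts heap model
# are assumptions for illustration.

class RememberedSet:
    def __init__(self):
        self.slots = set()          # a true set: duplicate entries collapse

    def remember(self, obj_id, field):
        self.slots.add((obj_id, field))

    def roots(self, heap):
        """Yield the *current* referents of remembered slots."""
        for obj_id, field in self.slots:
            target = heap[obj_id][field]
            if target is not None:
                yield target

# Mirroring the example: U and the second slot of S hold interesting pointers.
heap = {'S': {0: None, 1: 'young1'}, 'U': {0: 'young2'}}
rs = RememberedSet()
rs.remember('S', 1)
rs.remember('U', 0)
heap['U'][0] = 'young3'             # slot overwritten before the collection

print(sorted(rs.roots(heap)))       # ['young1', 'young3']: 'young2' ignored
```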

Clearly it is important to detect and record as few pointers as possible. Pointer writes by the collector as it moves objects are easily identified. Pointer stores by the mutator can be detected by a software write barrier, emitted by the compiler before each pointer store. This may not be possible if an uncooperative compiler is used. In this case, the locations where writes have occurred can often be determined from the operating system’s virtual memory manager.

The prevalence of pointer stores will vary between different programming languages and their implementations. From a static analysis of a suite of SPUR Lisp programs, Zorn [1990] found the frequency of pointer stores to be 13% to 15%, although Appel found a lower static frequency of 3% for Lisp [1987] and a dynamic, run-time frequency of 1% for ML [1989a]. State-based languages can be expected to have a higher incidence of destructive pointer writes. Java programs vary widely in terms of the frequency of pointer stores: for example, Dieckmann and Hölzle [1999] found that between 6% and 70% of heap accesses were stores (the latter was an outlier, the next highest was 46%).

Pointer direction

Fortunately, not all stores need to be detected or recorded. Some languages (such as implementations of ML) store procedure activation records in the heap. If these frames are scanned as part of the root set at every collection, the pointer slots they contain can be discovered by the techniques we discuss later in Chapter 11. If stack writes can be identified as such by the compiler, then no barrier need be emitted on writes to these stack frames. Furthermore, many stores will refer to objects in the same partition. Although such stores will probably be detected, the pointers are not interesting from a generational point of view, and need not be recorded.

If we impose a discipline on the order in which generations are collected then the number of inter-generational pointers that need to be recorded can be reduced further. By guaranteeing that younger generations will be collected whenever an older one is, young-to-old pointers need not be recorded (for example, the pointer in N in Figure 9.1). Many pointer writes are initialising stores to newly created objects — Zorn [1990] estimated that 90% to 95% of Lisp pointer stores were initialising (and that of the remaining non-initialising stores two-thirds were to objects in the young generation). By definition, these pointers must refer to older objects. Unfortunately, many languages separate the allocation of objects from the initialisation of their fields, making it hard to separate the non-initialising stores that may create old-young pointers. Other languages provide more support for the compiler to identify pointer stores that do not require a write barrier. For example, the majority of pointer writes in a pure, lazy functional language like Haskell will refer to older objects; old-new pointers can arise only when a thunk (a function applied to its arguments) is evaluated and overwritten with a pointer value. ML, a strict language with side-effects, requires the programmer to annotate mutable variables explicitly; writes to these objects are the only source of old-to-young references.
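A direction-filtering write barrier can be sketched as a pair of address comparisons. The layout below, with the old generation at lower addresses than the young generation, is an assumption made for illustration (real systems may use other layouts and per-block tables), but it shows the essential filter: only old-to-young stores are remembered.

```python
# Sketch of a generational write barrier filtering by pointer direction.
# Assumed layout: old generation below OLD_TOP, young generation above it.

OLD_TOP = 0x1000    # boundary address (invented)

def write_barrier(remset, src_addr, field, target_addr):
    # Record only old-to-young pointers; young-to-old and intra-generation
    # stores need not be remembered if younger generations are always
    # collected with older ones.
    if src_addr < OLD_TOP <= target_addr:
        remset.add((src_addr, field))

remset = set()
write_barrier(remset, 0x0100, 0, 0x2000)   # old -> young: remembered
write_barrier(remset, 0x2000, 0, 0x0100)   # young -> old: filtered out
write_barrier(remset, 0x0100, 1, 0x0200)   # old -> old:   filtered out
print(remset)                              # {(256, 0)}
```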

Object-oriented languages like Java present a more complex scene. Here the programming paradigm centres on updating objects’ states, which naturally leads to old-young pointers being more frequent. Nevertheless, many programmers write in a somewhat functional style, eschewing side effects, and for many applications the overwhelming majority of pointer stores will be to initialise objects in the young generation. However, Blackburn et al [2006a] demonstrate that there is considerable variation in behaviour not only between applications but also within individual ones. Their report strikingly contrasts pointer stores — in terms of their direction and distance (between the time the source and target objects were created) — and pointers discovered in the graph. One cause of these differences is that there may be many writes to the same location: this has implications for how remembered sets are implemented.

Different pointer filtering will be necessary in heaps with multiple independently collected spaces. For example, a collector may apply heuristics to decide which space to scavenge with the intention of prioritising those spaces containing the smallest volume of live objects. In this case, the write barrier must remember pointers in both directions, although if the policy decision is made always to collect the young generation at the same time, we can ignore writes to the nursery (which we expect to be prevalent). Because this design is likely to increase the number of pointers to be remembered, it is best used with an implementation where the size of the remembered set does not depend on the number of pointers remembered. We discuss implementation of write barriers and remembered sets in Chapter 11.

9.9    Space management

Young generations are usually managed by evacuation, either copying surviving objects to a fresh semispace in the same generation or to a space in an older generation. Young generation collections are expected to be frequent and brief, on the assumption of few survivors and hence little to trace. Collections of older generations, on the other hand, are expected to be infrequent, but when they do occur, they are expensive as all younger generations must also be collected unless we are willing to pay the cost of a bidirectional write barrier. Commonly, a collection of the oldest generation will also collect all other spaces in the heap except any immortal spaces or the boot image, although references held in these spaces must be used as roots and may be updated. A full heap collection will not use remembered sets (except for locations in the immortal space or boot image, and even these are unnecessary if those spaces are scanned).

A wider range of strategies can be used to manage the oldest generation. One possibility is semispace copying but this may not be the best choice. It requires a copy reserve of half the generational heap, and so limits the room available for its own fromspace and for younger generations, thus increasing the frequency of collections at all levels. It also leads to long-lived data being moved repeatedly. Mark-sweep collection offers better utilisation of memory, especially in small heaps [Blackburn et al, 2004a]. The argument against freelist allocation has been that it is slower than sequential allocation and that its locality is not so predictable. But this matters most for object creation, where allocation rates are high and sequential allocation order provides good spatial locality for young objects [Blackburn et al, 2004a]. The drawback of mark-sweep collection is that it is non-moving and may eventually degrade as the old generation fragments. The solution is to run an additional compacting pass over the old generation, not necessarily every time but certainly when fragmentation is damaging performance. Compaction can also treat very long-lived data specially. As we noted in Chapter 3, these will tend to end up compacted into a ‘dense prefix’ at the bottom of the old generation. The HotSpot mark-compact collector, for example, avoids moving this sediment at the cost of some (user-specified) degree of fragmentation.

Generational collectors almost always distinguish generations by physical segregation, which requires younger generations to be managed by copying collection. A copying collector such as Appel’s conservatively requires a copy reserve equal to the size of the generation being collected, since in the worst case all objects may survive. In practice, however, most objects do not survive a young generation collection.

Better space utilisation can be obtained with a smaller copy reserve and switching from copying to compacting collection whenever the reserve is too small [McGachey and Hosking, 2006]. Here, the collector must be able to switch between copying and marking on the fly because it will only discover that the copy reserve is too small during a collection. Figure 9.7a shows the state of the heap once all survivors have been identified: copied objects are shown in black and the remaining live young objects are marked grey. The next step is to compact the marked objects to one end of the nursery (Figure 9.7b); as usual this takes several passes. Unfortunately, compaction will destroy the forwarding addresses left in the black objects in the young generation. McGachey and Hosking solve this problem by requiring the first pass over the grey young generation objects to fix up references to copied objects. Next, they move the marked objects with Jonkers’s sliding compactor (see Section 3.3 in Chapter 3) because this threaded algorithm does not require additional space in object headers. A better solution might be to adapt Compressor for this purpose (discussed in Section 3.4), since it neither requires extra space in object headers nor overwrites any part of live objects. With a copy reserve of 10% of the heap, they gained improvements in performance of 4% on average — but sometimes up to 20% — over MMTk collectors that manage the old generation by either copying or mark-sweep collection.

Image

Figure 9.7: Switching between copying and marking the young generation. (a) The copy reserve is full. Black objects from the young generation have been copied into the old generation. Grey objects have been marked but not copied. All other new objects are dead. (b) The compaction pass has slid the grey objects into the old generation.

9.10    Older-first garbage collection

Generational garbage collection has proved to be a highly effective way of managing shortlived objects for a wide range of applications. However, as we saw in Section 9.7, longer-lived objects may be more problematic. Generational collectors operate by collecting a youngest prefix of the set of objects in the heap and ignoring other objects. This prefix may be one or more generations depending on whether a collection is a nursery collection, an intermediate collection (in a configuration that uses more than two generations) or a full heap collection. Adaptive techniques that control the promotion of objects can be thought of as ways of varying the age boundary of the young (to be collected) prefix in order to give young objects more time to die. However, generational garbage collection is but one design that avoids collecting the whole heap (we look at schemes outside an age-based context in the next chapter). Possibilities for age-based collection include:

Youngest-only (generational) collection: The collector condemns only the youngest objects in the heap.

Oldest-only collection: Similarly, we could imagine a collector that only considered the oldest objects in the heap, that is, those that have had the longest opportunity to die. However, it is unlikely that such a strategy would be effective as it would spend much of its time repeatedly processing immortal or very long-lived objects. We noted earlier that some collectors deliberately avoid processing this ancient sediment for precisely this reason.

Older-first collection: The collector aims to focus effort on middle-aged objects. It gives the youngest objects sufficient time to die but reduces the time spent considering very long-lived objects (although these are examined from time to time).

Image

Figure 9.8: Renewal Older First garbage collection. At each collection, the objects least recently collected are scavenged and survivors are placed after the youngest objects.

Older-first collection presents two challenges: how to identify those objects considered to be ‘older’ and the increased complexity of managing pointers into the condemned set since interesting pointers may point in either direction (oldest to older, or youngest to older). In the rest of this section we consider two different solutions to these problems.

Renewal Older-First garbage collection. One approach is to consider the ‘age’ of an object to be the time since it was created or last collected, whichever is more recent [Clinger and Hansen, 1997; Hansen, 2000; Hansen and Clinger, 2002]. Renewal Older-First always collects the ‘oldest’ prefix of the heap. To simplify remembered set management, the heap is divided into k equally sized steps. Allocation is always into the lowest-numbered empty step. When the heap is full, the oldest k – j steps (the grey window in Figure 9.8) are condemned, and any survivors are evacuated to a copy reserve at the youngest end of the heap (the black region in the figure). Thus, survivors are ‘renewed’ and the youngest steps j to 1 are now the oldest. In the figure, the heap advances rightwards through virtual address space. This simplifies the write barrier: only pointers from right to left in the figure, and whose source is at an address beyond step j, need to be remembered by the mutator. Although this arrangement might be feasible for some programs in a 64-bit address space, it would soon exhaust a 32-bit address space. In this case, Renewal Older-First must renumber all the steps in preparation for the next cycle, and its write barrier must filter pointers by comparing the step numbers of the source and target; this requires table lookups rather than simple address comparisons. A second potential disadvantage of Renewal Older-First is that it does not preserve the order of objects in the heap by their true ages but irreversibly mixes them. Although Hansen filters out many pointers in the Larceny implementation of Scheme by adding a standard generational nursery (and using Renewal Older-First only to manage the old generation), his remembered sets are large.
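The step mechanics can be sketched as a list manipulation. This toy model (lists of objects standing in for steps, a survival predicate standing in for tracing) is an illustration only, but it shows how survivors end up at the youngest end, mixing objects of different true ages.

```python
# Sketch of a Renewal Older-First collection cycle. Steps are held in
# order from oldest to youngest; object names and the survival predicate
# are invented for the example.

def rof_collect(steps, j, survive):
    """Condemn the oldest len(steps)-j steps; renew survivors as the
    youngest step. Returns the new step list, oldest first."""
    condemned = steps[:len(steps) - j]
    retained = steps[len(steps) - j:]
    survivors = [obj for step in condemned for obj in step if survive(obj)]
    # The retained steps are now the oldest; survivors are the youngest.
    return retained + [survivors]

steps = [['a', 'b'], ['c'], ['d', 'e']]          # 'a' is the oldest object
new_steps = rof_collect(steps, j=1, survive=lambda o: o in ('b', 'c'))
print(new_steps)    # [['d', 'e'], ['b', 'c']]: true age order is lost
```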

Deferred Older-First garbage collection. The alternative does preserve the true age order of objects in the heap [Stefanović, 1999; Stefanović et al, 1999]. Deferred Older-First slides a fixed size collection window (the grey region in Figure 9.9) from the oldest to the youngest end of the heap. When the heap is full the window is collected, ignoring any older or younger objects (the white regions). Any survivors (the black region) are moved to immediately after the oldest region of the heap and any space freed is added to the youngest (rightmost) end of the heap. The next collection window is immediately to the right (younger end) of the survivors. The intuition behind Deferred Older-First is that it will seek out a sweet spot in the heap where the collection window finds few survivors. At this point, the collector’s mark/cons ratio will be low and the window will move only very slowly (as in the lower rows of the figure). However, at some point the window will reach the youngest end of the heap, where the collector must reset it to the oldest end of the heap. Although objects are stored in true-age order, Deferred Older-First requires a more complicated write barrier. The mutator’s write barrier must remember all pointers from the oldest region into either the collection window or the youngest region and all young-old pointers (except those whose source is in the condemned window). Similarly, the collector’s copy write barrier must remember all pointers from survivors to other regions and all young survivor-old survivor pointers. Once again, Deferred Older-First collectors typically divide the heap into blocks; they associate a ‘time of death’ with each block (ensuring that blocks older than the GC window have a higher time of death than younger ones). Barriers can be implemented through block time-of-death comparisons and care is needed to handle time of death overflow [Stefanović et al, 2002].
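The window movement can be sketched over a heap modelled as an age-ordered list. This toy model is an assumption-laden illustration (names invented, tracing replaced by a predicate), but it shows the key behaviour: survivors stay in age order, the window advances only past the survivors, and it resets when it falls off the young end.

```python
# Sketch of one Deferred Older-First step over a heap list ordered
# oldest-to-youngest.

def dof_step(heap, start, size, survive):
    """Collect heap[start:start+size]; return (new_heap, next_window_start)."""
    window = heap[start:start + size]
    survivors = [obj for obj in window if survive(obj)]
    new_heap = heap[:start] + survivors + heap[start + size:]
    next_start = start + len(survivors)      # slide just past the survivors
    if next_start >= len(new_heap):
        next_start = 0                       # reset to the oldest end
    return new_heap, next_start

heap = ['o1', 'o2', 'm1', 'm2', 'y1', 'y2']
heap, nxt = dof_step(heap, start=2, size=2, survive=lambda o: o == 'm2')
print(heap, nxt)    # ['o1', 'o2', 'm2', 'y1', 'y2'] 3
```

With few survivors the window barely moves each cycle, which is exactly the sweet-spot behaviour described above.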

Image

Figure 9.9: Deferred Older First garbage collection. A middle-aged window of the heap is selected for collection. Survivors are placed after the survivors of the previous collection. The goal is that the collector will discover a sweet spot, where the survival rate is very low and the window advances very slowly.

Although Deferred Older-First was found to improve over other generational schemes on maximum pause time, like Renewal Older-First it too needed to track more pointers. It appears that in smaller address spaces older-first algorithms have difficulty competing with the best of other schemes because of the cost of the more complex write barrier for remembering in older-first heap layouts. However, in larger address spaces, such as for 64 bits, its write barrier is much simplified and it may be more competitive.

9.11    Beltway

In this chapter we have looked at a wide range of designs for age-based collection. Five key insights have shaped most of these.

•  ‘Most objects die young’: the weak generational hypothesis [Ungar, 1984].

•  As a corollary, generational collectors avoid repeatedly collecting old objects.

•  Response times have been improved by exploiting incrementality. Generational collectors commonly use small nurseries; other techniques such as the Mature Object Space (often called the ‘Train’) collector [Hudson and Moss, 1992] also bound the size of spaces collected.

•  Small nurseries managed by sequential allocators improve data locality [Blackburn et al, 2004a].

•  Objects need sufficient time to die.

The Beltway garbage collection framework [Blackburn et al, 2002] combines all these insights. It can be configured to behave as any other region-based copying collector. The Beltway unit of collection is called an increment. Increments can be grouped into queues, called belts. In Figure 9.10 each row represents a belt with increments shown as ‘trays’ on each belt. Increments on a belt are collected independently first-in, first-out, as are belts, although typically the increment selected for collection is the oldest non-empty increment on the youngest belt. A promotion policy dictates the destination of objects that survive a collection: they may be copied to another increment on the same belt or they may be promoted to an increment on a higher belt. Note that Beltway is not just another generational collector and belts are not generations. A generational collector would collect all increments on a belt; Beltway collects each increment independently.

Figure 9.10 shows examples of existing and new collectors. A simple semispace collector comprises a single belt with two increments (Figure 9.10a): each increment is half of the heap. All survivors from the first increment (fromspace) on the belt are copied to the second (tospace) increment. Generational collectors use a belt per generation. Fixed-size nursery collectors limit the size of belt 0 increment (Figure 9.10b) whereas Appel-style collectors allow both increments to grow to consume all usable memory (Figure 9.10c). Aging semispaces can be modelled by increasing the number of increments on belt 0 (Figure 9.10d). However, unlike the aging semispace discussed in Section 9.6, this design trades increased space for reduced collection time: unreachable objects in the second increment are not reclaimed in this collection cycle. Renewal Older-First and Deferred Older-First can also be modelled. Figure 9.10e shows clearly how objects of different ages are mixed by Renewal Older-First collectors. Deferred Older-First collectors use two belts, whose roles are flipped when the collection window reaches the youngest end of the first belt. Blackburn et al also used the Beltway framework to introduce new copying collection algorithms. Beltway.X.X (Figure 9.10g) adds incrementality to an Appel-style collector: when belt 1 is full, it collects only the first increment. In this configuration X is the maximum size of the increment as a fraction of usable memory: thus, Beltway.100.100 corresponds to a standard Appel-style generational collector. If X < 100, Beltway.X.X is not guaranteed to be complete since garbage cycles may span belt 1 increments. Beltway.X.X.100 provides completeness by adding a third belt that contains only one increment, which is allowed to grow sufficiently large to hold the whole heap.

Assuming that every configuration collects only the oldest increments on the youngest belts implies that Beltway’s write barrier needs to remember references from older to younger belts, and from younger to older increments on the same belt. If we number belts upwards from 0 (youngest), and increments in each belt in the order in which they are created, an increment can be identified by the pair ⟨b, i⟩ where b is its belt number and i its creation order in belt b. In that numbering, a pointer from ⟨b, i⟩ to ⟨b′, i′⟩ is interesting if b′ < b ∨ (b′ = b ∧ i′ < i). However, the collector can associate a unique small number nᵢ with each increment i such that a pointer from i to j is interesting exactly when nⱼ < nᵢ. It may need to renumber occasionally, such as when fresh increments are added to belts. A typical implementation breaks up the address space using frames, assigning each increment a disjoint set of frames. In a large address space it may be possible to lay increments out such that direct address comparisons work rather than having to map to increment numbers first, similar to such layouts for older-first algorithms.
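The single-number test can be sketched directly. The numbering scheme below (sorting increments by belt then creation order) is one possible assignment consistent with collecting the oldest increment on the youngest belt first; it is an illustration, not Beltway's actual implementation.

```python
# Sketch of Beltway's interesting-pointer test using per-increment
# collection-order numbers: a pointer from increment i to increment j
# is interesting exactly when j will be collected before i.

def interesting(n, src_inc, dst_inc):
    return n[dst_inc] < n[src_inc]

# Increments identified as (belt, creation_order). Lower belts are
# younger and collected first; within a belt, older (lower-numbered)
# increments are collected first, so sorting gives a valid numbering.
increments = sorted([(0, 0), (0, 1), (1, 0), (1, 1)])
n = {inc: order for order, inc in enumerate(increments)}

a = interesting(n, (1, 0), (0, 1))   # older belt -> younger belt
b = interesting(n, (0, 0), (0, 1))   # to a younger increment, same belt
c = interesting(n, (0, 1), (0, 0))   # to an older increment, same belt
print(a, b, c)    # True False True
```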

Image

Figure 9.10: Beltway can be configured as any copying collector. Each figure shows the increment used for allocation, the increment to be collected and the increment to which survivors will be copied for each configuration.

Blackburn et al [2002], doi: 10.1145/512529.512548.
© 2002 Association for Computing Machinery, Inc. Reprinted by permission.

The performance of Beltway collectors is sensitive to their configuration. The layout of belts in the heap and the implementation of write barriers is crucially important, not only to determine whether pointers need to be remembered but also to decide whether objects need to be copied and if so, to where.

9.12    Analytic support for generational collection

Generational collectors handle short-lived objects well but do not manage longer lived ones optimally. There are two problems. First, collection of garbage in older generations is not prompt: unfortunately there are no solutions as yet that schedule old generation collection as soon as possible after the death of significant volumes of old objects. Second, long-lived objects must be copied from the young generation to the old generation. Collectors that are concerned about objects dying soon after they reach an older generation require objects to have been copied several times before they are promoted. This copying is wasted work: it would be better if long lived objects were directly allocated or pretenured into the generation that they will eventually reach.

Several researchers have tackled this problem by analysing the lifetime distributions of objects allocated by particular points in a program. Sometimes this can be done by the virtual machine implementer who may know that certain virtual machine data structures are permanent, or that certain libraries or code objects cannot or at least are unlikely to be unloaded. Pretenuring of these objects can be baked into the virtual machine.

Researchers have also used profiling to identify longevity. Cheng et al [1998] recorded which allocation sites consistently created objects that were promoted. Blackburn et al [2001; 2007] used lifetime metrics that compared the longevity of objects allocated at a particular program point with some fraction of the program’s largest heap footprint in order to discriminate between short lived, long lived and immortal objects. Both techniques necessitated the time-consuming gathering of off-line traces. This information was then used to optimise the code so that new objects were allocated in the most appropriate generation or the immortal space. Some pretenuring decisions may be specific to a single program, although Blackburn et al computed generic advice for allocation sites used by all programs (that is, those in the boot image or library code). The effectiveness of such generic advice makes the necessary profiling more reasonable.
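The shape of profile-driven pretenuring can be sketched as below. All thresholds and names here are invented for illustration; the actual metrics of Blackburn et al are more sophisticated.

```python
# A sketch (thresholds and names hypothetical) of allocation-site
# pretenuring: off-line profiling classifies each site, and the
# allocator consults that advice to choose a target space directly.

SHORT, LONG, IMMORTAL = "short", "long", "immortal"

def classify_site(bytes_surviving, bytes_allocated,
                  long_frac=0.5, immortal_frac=0.9):
    """A crude stand-in for the lifetime metrics described above:
    the fraction of a site's allocation observed to survive."""
    frac = bytes_surviving / bytes_allocated
    if frac >= immortal_frac:
        return IMMORTAL
    if frac >= long_frac:
        return LONG
    return SHORT

def place(site, advice, nursery, old_gen, immortal_space):
    """Allocate into the space the advice suggests (default: young)."""
    spaces = {SHORT: nursery, LONG: old_gen, IMMORTAL: immortal_space}
    target = spaces[advice.get(site, SHORT)]
    target.append(site)           # stand-in for a real allocation
    return target
```

An object allocated at a site classified as immortal is thus never copied by the nursery collector at all.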

In contrast, the approach used by Marion et al [2007] is generic, and provides true prediction rather than self-prediction: they obtain pretenuring advice by syntactic comparison of programs’ micro-patterns [Gil and Maman, 2005] against a pre-existing knowledge bank (derived by using machine learning techniques on a large set of program traces to predict lifetimes from micro-patterns). Harris [2000] and Jump et al [2004] obtain modest performance improvements by pretenuring through online sampling. All these approaches obtained most benefit from the identification of those program points which allocated objects that tended to be immortal rather than those that were simply long-lived. Gains for medium lived objects were modest.

Guyer and McKinley [2004] sought to co-locate connected objects, on the basis that they are likely to share similar lifetimes. They combined a compiler analysis, which identifies the object to which a new object might be connected, with a specialised allocator, which places the new object in the same space as the connectee. The analysis neither needed to be sound nor relied on a site tending to allocate objects with similar lifetimes. As well as reducing copying and obtaining significant reductions in collection time, co-location also reduced pressure on the write barrier.

Generational collectors for lazy functional languages require write barriers only on updates to suspended computations (or thunks) as all other stores must refer to younger objects. Thunks are updated at most once; all other objects are immutable. In a step-based generational collector, Marlow et al [2008] take advantage of this observation to promote an object eagerly to the same generation or step as an object referring to it: ideally this would be to the oldest from which the target is reachable. Even for mutable objects, no write to a newly created object can be interesting. Zee and Rinard [2002] used a static analysis for Java to eliminate write barriers on these objects, obtaining small improvements in overall execution time for some programs.

9.13    Issues to consider

Generational garbage collection has proved to be a highly effective organisation, offering significant performance improvements for a wide range of applications. By limiting the size of the youngest generation, and concentrating collection effort on that generation, expected pause times can be reduced to a point where they are usually unnoticeable in many environments. This tactic can also increase overall throughput in two ways. First, it reduces the frequency with which long lived data is processed, and thereby not only reduces processing effort but also gives older objects more time to die (so that they need not be traced at all). Second, generational collectors usually allocate young objects sequentially in a nursery area. Sequential allocation obtains cache locality benefits because the memory consumption pattern is predictable and, furthermore, with generational collectors most writes are made to the youngest objects.

Generational collection is not a universal panacea, however. Its effectiveness depends strongly on the lifetime demographics of the application program. The cost of more frequent collections of the nursery and of write barriers must be amortised by obtaining a much better than average pay-back from collecting young generations. If object mortality statistics are not heavily skewed in favour of the young generation — in other words, if the overwhelming majority of objects do not die young — then generational collection will not be an appropriate solution.

Furthermore, generational collection improves only expected pause times; eventually the collector must collect the full heap and generational collection on its own cannot solve the problem of the worst-case pause time, which may be excessive for large heaps. Consequently, generational collection cannot provide the guarantees required for hard real-time collection where deadlines must always be met.

It is simpler to implement generational collection if the collector can move objects in order to segregate young and old objects. Physical segregation not only offers the locality benefits noted above, but can also offer more efficient space tests, needed by the write barrier or while tracing a young generation. Nevertheless, objects can also be segregated virtually, maybe by the value of a bit in their header or in a bitmap.
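A virtual segregation scheme of the kind just mentioned can be sketched with a bitmap. The layout here is assumed, not taken from the text: one bit per object slot records its generation, giving a space test without physically separating young from old.

```python
# A sketch of "virtual" generational segregation: a side bitmap holds
# one bit per object slot, set when the object is promoted.

class GenerationBitmap:
    def __init__(self, slots):
        self.bits = bytearray((slots + 7) // 8)   # one bit per slot

    def promote(self, idx):
        self.bits[idx >> 3] |= 1 << (idx & 7)     # mark slot as old

    def is_old(self, idx):
        return bool(self.bits[idx >> 3] & (1 << (idx & 7)))
```

The space test is then a load, shift and mask, slightly more work than an address comparison against a generation boundary but requiring no object movement.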

Generational collectors raise many tuning questions, both for the implementer and for the end user. Not only are there a wide variety of design choices but also any given generational collector needs careful configuration to match a given application. Generational collectors offer many more tuning parameters than the simple choice of heap size.

The first implementation decision is likely to be whether to offer more than two generations. The choice depends largely upon the anticipated lifetime distributions of the applications that the collector is expected to support. If a significant fraction of objects are expected to survive the young generation but to die shortly after promotion to an older generation, then the addition of intermediate generations may be worthwhile. However, in our experience, most systems offer only two generations plus an immortal generation, at least as the default configuration. The problem that the use of multiple generations seeks to solve is that of premature promotion, and there are other ways to deal with this.

In the first place, promotion rate depends on the size of the young generation: larger nurseries allow objects more time to die. Some generational collectors may allow the user to set a fixed size for the youngest generation. Others allow the young generation to expand on demand until it fills all of the heap except that required by other spaces (including the old generation and any necessary reserve for copying). More sophisticated collectors may vary the young generation size in order to meet particular throughput or pause time goals, making resizing decisions based on profiling the collector’s behaviour.
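A pause-time-driven resizing policy of the kind just described might look like the following sketch; the doubling/halving rule and the size limits are invented for illustration.

```python
# A sketch of feedback-driven nursery resizing: shrink when minor
# pauses overshoot the goal, grow cautiously when there is headroom.
# The policy and constants are hypothetical.

def resize_nursery(size, last_pause_ms, goal_ms,
                   min_size=1 << 20, max_size=256 << 20):
    if last_pause_ms > goal_ms:
        size //= 2                # collect less per pause
    elif last_pause_ms < goal_ms / 2:
        size *= 2                 # headroom: trade pause for throughput
    return max(min_size, min(size, max_size))
```

A real collector would smooth its measurements over several collections rather than react to a single pause.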

Second, promotion can be limited by controlling the age at which objects are tenured. One approach is en masse promotion in which all survivors of the generation being collected are evacuated to an older generation. This is the simplest promotion policy to implement, since the remembered set for the young generation can be discarded after collection. Alternatively, a collector may require an object to survive more than one collection before being promoted. In this case, we need a mechanism to record object ages. Either some bits in the header of each object in the younger generations must be used to hold its age, or the generation must be divided into subspaces each of which holds objects of a particular age, or both. Common configurations include step-based schemes and eden plus survivor semispaces. In all cases, the subspaces of a generation are collected together.
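Recording ages in header bits can be sketched as follows. The bit width and tenuring threshold are illustrative, not prescribed by the text.

```python
# A sketch of age-based tenuring: a few header bits hold the object's
# age; an object is promoted once its age reaches a threshold.

AGE_BITS = 4                      # assumed header field width
AGE_MAX = (1 << AGE_BITS) - 1

def on_survival(age, tenure_threshold=2):
    """Advance an object's age after it survives a minor collection;
    return (new_age, should_promote)."""
    age = min(age + 1, AGE_MAX)   # saturate so the field cannot wrap
    return age, age >= tenure_threshold
```

Saturating the counter matters: a small age field that wrapped around would make an old object appear newborn.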

Finally, it is often possible to avoid having to promote certain objects. Many collectors reserve an immortal space for objects that will survive until the end of the program. Often the objects placed in an immortal area can be recognised either at the time the collector is built or by the compiler. Such objects might include the collector’s own data structures or objects representing the code being executed (assuming that it will not be necessary to unload code).

Promotion rates may also affect the cost of the write barrier and size of remembered sets. Higher rates of promotion may lead to more inter-generational pointers that must be recorded. Whether or not this affects the performance of the write barrier depends on its implementation, a subject considered in more detail in Section 11.8. Write barriers may record pointer writes unconditionally or they may filter out writes of no interest to the collector. The space requirements for card tables are independent of the number of writes recorded, in contrast to remembered sets implemented as sequential store buffers or hash tables.

The frequency with which write barriers are invoked also depends on whether generations can be collected independently. Independent collection requires all inter-generational pointers to be recorded. However, if we are prepared to give up this flexibility in favour of collecting all younger generations whenever an older one is collected, then the write barrier needs to record only old-young pointers, which we can expect to be far fewer. The number of pointers recorded also depends on whether we record the field or the object into which a pointer is written. For card tables, the choice is likely to be irrelevant. However, by noting in the object whether it has already been recorded as a possible source of an inter-generational pointer, we can reduce the size of the remembered set if we use object-remembering rather than field-remembering.
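A filtering, object-remembering barrier of the kind described can be sketched as below; the representation (dicts for objects, a set of ids for the remembered set) is assumed for illustration.

```python
# A sketch of a filtering write barrier: only a store that may create
# an old -> young pointer is recorded, and object-remembering logs the
# source object rather than the field.

def write_barrier(src, field, ref, is_old, remembered):
    """src and ref are objects represented as dicts of fields;
    `remembered` holds ids of old objects that may point young."""
    if ref is not None and is_old(src) and not is_old(ref):
        remembered.add(id(src))   # a set: duplicates cost nothing
    src[field] = ref
```

Because the set is keyed by the source object, repeated writes into the same old object add nothing to the remembered set, at the cost of the collector later rescanning the whole object.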

The different mechanisms used by the mutator to record the possible sources of inter-generational pointers affect the cost of collection. Although less precise recording mechanisms may reduce the cost of the write barrier, they are likely to increase the amount of work done by the collector. Field-recording with sequential store buffers may be the most precise mechanism, although the buffer may contain duplicate entries. Both object-recording and card tables require the collector to scan the object or card to find any inter-generational pointers.
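The card table alternative can be sketched as follows, assuming 512-byte cards (a common but not universal choice).

```python
# A sketch of a card table: the barrier unconditionally dirties the
# card holding the written field; at collection time the collector
# scans every object on each dirty card for inter-generational pointers.

CARD_SHIFT = 9                    # 2**9 = 512-byte cards (assumed)

class CardTable:
    def __init__(self, heap_bytes):
        self.cards = bytearray(heap_bytes >> CARD_SHIFT)

    def mark(self, address):
        self.cards[address >> CARD_SHIFT] = 1   # cheap, unconditional

    def dirty_cards(self):
        return [i for i, c in enumerate(self.cards) if c]
```

The table's size depends only on the heap size, never on the number of writes, which is the contrast with sequential store buffers drawn above.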

In conclusion, generations are but one way of partitioning the heap to improve garbage collection. In the next chapter, we look at other partitioning methods.

9.14    Abstract generational garbage collection

Finally, let us see how the abstract collection framework we introduced in Section 6.6 can be applied to generational collection. Recall that Bacon et al [2004] cast abstract tracing as a form of reference counting, incrementing the count of each object as it is marked. An abstract representation of a conventional, two generation, en masse promotion, nursery collection algorithm is shown in Algorithm 9.1.

By analogy with the previous abstract collection algorithms, this algorithm maintains a multiset I of ‘deferred’ reference count increments to nursery objects. Recall that a remembered set is a set of fields that together include all references from the older generation(s) to the nursery. The multiset I contains exactly the nursery references from those locations, which is why decNursery removes elements from the multiset: a (possibly) remembered slot is being overwritten. The result is that if a nursery object n appears in I then n will be retained by the generational collector. The number of times n appears in I is n’s reference count, not counting references from the nursery or roots. A tracing algorithm summarises in a single mark bit the truth of n ∈ I.

Algorithm 9.1: Abstract generational garbage collection: collector routines, mutator routines

 1 atomic collectNursery(I):
 2   rootsNursery(I)
 3   scanNursery(I)
 4   sweepNursery()
 5
 6 scanNursery(W):
 7   while not isEmpty(W)
 8     src ← remove(W)
 9     ρ(src) ← ρ(src) + 1                  /* shade src */
10     if ρ(src) = 1                        /* src was white, now grey */
11       for each fld in Pointers(src)
12         ref ← *fld
13         if ref in Nursery
14           W ← W + [ref]
15
16 sweepNursery():
17   while not isEmpty(Nursery)
18     node ← remove(Nursery)               /* en masse promotion */
19     if ρ(node) = 0                       /* node is white */
20       free(node)
21
22 rootsNursery(I):
23   for each fld ∈ Roots
24     ref ← *fld
25     if ref ≠ null and ref ∈ Nursery
26       I ← I + [ref]

27 New():
28   ref ← allocate()
29   if ref = null
30     collectNursery(I)
31     ref ← allocate()
32     if ref = null
33       collect()                          /* tracing, counting, or other full-heap GC */
34       ref ← allocate()
35       if ref = null
36         error "Out of memory"
37   ρ(ref) ← 0                             /* node is black */
38   Nursery ← Nursery ∪ {ref}              /* allocate in nursery */
39   return ref
40
41 incNursery(node):
42   if node in Nursery
43     I ← I + [node]
44
45 decNursery(node):
46   if node in Nursery
47     I ← I − [node]
48
49 Write(src, i, ref):
50   if src ≠ Roots and src ∉ Nursery
51     incNursery(ref)
52     decNursery(src[i])
53   src[i] ← ref

When collectNursery is invoked, multiset I is the set of non-zero reference counts, restricted to the nursery, counting only references from old objects. It is the complement of deferred reference counting’s zero count table. After adding references from roots to the nursery (rootsNursery), the nursery is traced from I (scanNursery) and is then swept, removing survivors from Nursery, which implicitly adds them to the older generation, and freeing unreachable nursery objects, that is, those whose abstract reference count is zero. Note that the statement in line 18 performs en masse promotion of all the live nursery objects: it would be straightforward to modify this to model other promotion policies.
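The behaviour just described can be exercised with a small executable model. Python stands in for the book's pseudocode here; dicts model objects, integer ids model references, and all class and method names are illustrative.

```python
# An executable sketch of Algorithm 9.1: deferred increments in a
# multiset I, tracing-as-counting over the nursery, en masse promotion.

class GenHeap:
    def __init__(self):
        self.objects = {}         # id -> dict of fields
        self.nursery = set()      # ids of nursery objects
        self.rc = {}              # abstract reference counts (rho)
        self.I = []               # multiset of deferred increments
        self.next_id = 0

    def new(self):
        ref = self.next_id
        self.next_id += 1
        self.objects[ref] = {}
        self.rc[ref] = 0          # node is white
        self.nursery.add(ref)     # allocate in nursery
        return ref

    def write(self, src, field, ref):
        if src not in self.nursery:       # only old sources interest us
            if ref in self.nursery:
                self.I.append(ref)        # incNursery
            overwritten = self.objects[src].get(field)
            if overwritten in self.nursery:
                self.I.remove(overwritten)  # decNursery: one occurrence
        self.objects[src][field] = ref

    def collect_nursery(self, roots):
        work = list(self.I)
        work += [r for r in roots if r in self.nursery]  # rootsNursery
        while work:                                      # scanNursery
            src = work.pop()
            self.rc[src] += 1             # shade src
            if self.rc[src] == 1:         # was white, now grey: trace
                for ref in self.objects[src].values():
                    if ref in self.nursery:
                        work.append(ref)
        for node in list(self.nursery):                  # sweepNursery
            self.nursery.remove(node)     # en masse promotion
            if self.rc[node] == 0:
                del self.objects[node]    # node was white: free it
        self.I = []                       # nursery is now empty
```

Collecting with a root that reaches one nursery object, as in the walkthrough above, promotes the reachable objects out of the nursery and frees the rest.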

1Later called the Sun Microsystems Laboratories Virtual Machine for Research; for details see White and Garthwaite [1998].

2It is interesting to observe the development of hardware and configurations. Ungar [1984] used an eden of just 140 kilobytes with 28 kilobyte survivor spaces, and a 940 kilobyte old generation. HotSpot’s default size for the young generation is 2228 kilobytes on the 32-bit Solaris operating system. We have even heard of a real configuration as extreme as a 3 gigabyte eden, 128 kilobyte survivor spaces and a 512 megabyte old generation.

3Our terminology differs from that of Jones [1996] who distinguished card table schemes from other remembered set implementations.
