Chapter 16

Concurrent mark-sweep

In the previous chapter we looked at the need for incremental or concurrent garbage collection, and identified the problems faced by all such collectors. In this chapter, we consider one family of these collectors: concurrent mark-sweep collectors. As we noted before, the most important issue facing concurrent collection is correctness. The mutator and collector must communicate with each other in order to ensure that they share a coherent view of the heap. This is necessary on the mutator’s part to prevent live objects from being hidden the collector. It is necessary for collectors that move objects to ensure that the mutator uses the correct addresses of moved objects.

The mark-sweep family are the simplest of the concurrent collectors. Because they do not change pointer fields, the mutator can freely read pointers from the heap without needing to be protected from the collector. Thus, there is no inherent need for a read barrier for non-moving collectors. Read barriers are otherwise generally considered too expensive for use in maintaining the strong invariant for a non-moving collector, since heap reads by the mutator are typically much more frequent than writes. For example, Zorn [1990] found that the static frequencies of pointer loads and stores in SPUR Lisp were 13% to 15% and 4%, respectively. He measured the run-time overhead of inlined write barriers as ranging from 2% to 6%, and up to 20% for read barriers. The exception to this general rule is when compiler optimisation techniques can be brought to bear on eliminating redundant barriers [Hosking et al, 1999; Zee and Rinard, 2002], and to folding some of the barrier work into existing overheads for null pointer checks [Bacon et al, 2003a]. For this reason, mark-sweep collectors usually adopt the Dijkstra et al [1976, 1978] incremental update or Steele [1976] insertion write barriers, or their coarser Boehm et al [1991] variant, or the snapshot-at-the-beginning Yuasa [1990] deletion write barrier.

16.1  Initialisation

Instead of allowing the mutator to run until memory is exhausted, concurrent collectors can run even as the mutator is still allocating. However, when to trigger the beginning of a new marking phase is a critical decision. If a collection is triggered too late, it can happen that there will insufficient memory to satisfy some allocation request, at which point the mutator will stall until the collection cycle can complete. Once the collection cycle begins, the collector’s steady-state work-rate must be sufficient to complete the cycle before the mutator exhausts memory, while minimising its impact on mutator throughput. How and when to trigger a garbage collection cycle, ensuring that sufficient memory is available for allocation to keep the mutator satisfied even as concurrent collection proceeds, and reaching termination of the collection cycle so that garbage can be reclaimed and recycled, all depend on scheduling collection work alongside the mutator.

Algorithm 16.1: Mostly-concurrent mark-sweep allocation

 1 New():

 2  collectEnough()

 3  ref ← allocate ()

/* must initialise black if mutator is black */

 4  if ref = null

 5    error "Out of memory"

 6  return ref

 7

 8 atomic collectEnough():

 9  while behind()

10   if not markSome()

11     return

Algorithm 16.1 illustrates the mutator allocation sequence for a concurrent mark-sweep garbage collector that schedules some amount of collector work incrementally at each allocation (piggy-backed on the mutator thread) in the collectEnough procedure. This work is synchronised with other concurrent mutator threads executing mutator barriers, or other collector threads, as indicated by the atomic modifier. The decision as to when and how much collector work to perform is captured by the utility routine behind, which makes sure that the mutator does not get so far ahead of the collector that the allocate routine cannot satisfy the request for new memory.

Algorithm 16.2 shows what happens when collection work is triggered. An empty work list forces initialisation of the collector by scanning the roots to prime the work list. Assuming that scanning the roots also means stopping and scanning all the mutator threads, then at this point no mutator thread holds a white reference. Thus, this example operates in mostly-concurrent mode, with a stop-the-world phase to initiate collection. The now-grey root objects represent the initial scanning wavefront from which tracing proceeds. Having scanned the roots, mutator threads can now continue either as black (since they hold no white references) or grey, depending on the mutator barriers being used.

Stopping the world may result in unacceptable pauses. With grey mutator barriers in place it is possible simply to enable the barriers and defer scanning all the roots until later, concurrently with the mutator. Section 16.5 describes techniques that relax the need to stop the mutator threads in order to sample their roots. Still, at least one root must be scanned to prime the work list and initiate tracing.

16.2  Termination

Termination of the collector cycle for a black mutator is a relatively straightforward procedure. When there are no grey objects in the work list remaining to be scanned then the collector terminates. At this point, even with the weak tricolour invariant the mutator can contain only black references, since there are no white objects reachable from grey objects still held by the mutator (since there are no grey objects). Because the mutator is black there is no need to rescan its roots.

Termination for a grey mutator is a little more complicated, since the mutator may acquire white pointers after its roots were scanned to initiate the collection. Thus, the grey mutator roots must be rescanned before the collector cycle can terminate. Provided that rescanning the mutator roots does not expose any fresh grey objects then the collection cycle is done. Thus, Algorithm 16.2 performs rescanning to ensure there are no more grey references before entering the sweep phase.

Algorithm 16.2: Mostly-concurrent marking

 1 shared worklist ← empty

 2

 3 markSome():

 4  if isEmpty(worklist)

/* initiate collection */

 5    scan(Roots)

/* Invariant: mutator holds no white references */

 6    if isEmpty(worklist)

/* Invariant: no more grey references */

 7      /* marking terminates */

 8      sweep()

/* eager or lazy sweep */

 9      return false

/* terminate marking */

10  /* collection continues */

11  ref ← remove(worklist)

12  scan(ref)

13  return true

/* continue marking, if still behind */

14

15 shade(ref):

16  if not isMarked(ref)

17    setMarked(ref)

18    add(worklist, ref)

19

20 scan(ref):

21  for each fld in Pointers(ref)

22    child ← *fld

23    if child ≠ null

24      shade(child)

25

26 revert(ref):

27  add(worklist, ref)

28

29 isWhite(ref):

30  return not isMarked(ref)

31

32 isGrey(ref):

33  return ref in worklist

34

35 isBlack(ref):

36  return isMarked(ref) && not isGrey(ref)

16.3  Allocation

Notice that the allocator must initialise the mark state (colour) of the new object according to the colour of the mutator. If the mutator is black then new objects must be allocated black (marked) under the strong invariant, unless (under the weak invariant) the new object is also made reachable from some grey object. This last guarantee is generally difficult to make, so black mutators usually allocate black even under the weak invariant [Abraham and Patel, 1987; Yuasa, 1990]. However, a grey mutator admits a number of alternatives that several implementations exploit.

Kung and Song [1977] simply allocate black during the marking phase and white otherwise. Their choice is guided by the observation that new objects are usually immediately linked to existing reachable objects, at which point their write barrier (unconditional Dijkstra-style incremental update) would simply shade the object anyway. Moreover, because the new object contains no references it is safe to allocate straight to black and avoid unnecessary work scanning it for non-existent children.

Steele [1976] chooses to vary the colour of allocation during marking, depending on the pointer values that are used to initialise the new object. Assuming that the initial values of a new object’s reference fields are known a priori at the time of allocation permits a bulk test of the colour of the targets of those references. If none of them are white then the new object can safely be allocated black. Furthermore, if none of them are white then it is a possible sign that the marker is close to terminating and that the new object will not be discarded. Conversely, if any of the initialising pointers is white then the new object is allocated white. The Steele collector marks mutator stacks last, and scans them from bottom (least volatile) to top (most volatile), so most cells will be allocated white to reduce floating garbage.

During sweeping Steele allocates white or black according to where the sweeper is in its pass through the heap. Allocation is white if the free cell being allocated is from space that has already been swept, and black otherwise (to prevent the sweeper from misinterpreting the newly allocated object as free).

One problem with allocating new objects white instead of black is that the new object may over time accumulate a long chain of white objects, which if it remains reachable will eventually need to be traversed before the collection cycle can terminate (consider what happens when a grey mutator allocates a large new data structure white). Allocating black avoids dragging out termination in this way, but at the cost of wasted space since it defers freeing any newly allocated but now unreachable data until the next collection cycle [Boehm et al, 1991; Printezis and Detlefs, 2000]. Vechev et al [2006] propose a compromise solution in which they colour newly allocated objects with a fourth colour: yellow objects are treated as if they are white for purposes of retention (they may yet die before the cycle completes), but black with respect to the tricolour invariant. That is, a yellow object will be shaded straight to black (reachable) without being scanned. Thus, terminating tracing with a grey mutator that holds yellow references means not needing to trace beyond the yellow object.

16.4  Concurrent marking and sweeping

So far we have considered running marking concurrently only with the mutator, with marking and sweeping proceeding in series. Lazy sweeping means that allocation requests by the mutator may also trigger concurrent sweeping from the previous marking phase, even as a new collection cycle is running the next marking phase. This can potentially lead to a confusion about the colours of objects. The trick is to distinguish true garbage white objects from the previous marking phase (needing to be swept) from (as-yet unmarked) white objects in the next marking phase (which may yet be marked). Lamport [1976] pipelines the execution of marking and sweeping phases by introducing a new colour, purple, to distinguish the former from the latter. At the completion of the marking phase, all (garbage) white objects are recoloured purple. Sweeping will collect purple objects, adding them to the free-list (recoloured black or white, depending on allocation colour). Lamport envisions several concurrent marking and sweeping threads, with a collection cycle proceeding as follows.

Image

Table 16.1: Lamport [1976] mark colours: ‘hue and shaded’ encoding of colours for concurrent marking and sweeping. The colour encoding is: white as hue=base/shaded=0, grey as hue=base/shaded=1, black as hue=base+1 and purple as hue=base+2. Note that a garbage (purple) object can never be shaded. When all markers and sweepers have finished there are no grey or purple nodes, so flipping from black to white/grey and white to purple is achieved simply by incrementing base modulo 3.

1.  Wait until all markers and sweepers have finished.

2.  Change all white nodes to purple and all black nodes to white (preferably white to avoid floating garbage) or grey (in the case the node has been shaded concurrently by the mutator write barrier).

3.  Shade all roots.

4.  Start the markers and sweepers.

Marking ignores all purple objects: the mutator can never acquire a reference to a purple object, so grey objects never point to purple and purple objects are never shaded. Of course, the difficulty with this approach is that the conversion of white to purple might require touching colour state associated with all of the garbage objects, which must be completed before sweeping can begin. Similarly, when starting the marking phase, all black objects (from the previous cycle) must be recoloured white.

Lamport describes an elegant solution to this problem in which the colours are reinterpreted at step 2 by rotating through a range of colour values. Each object is tagged with a two-bit basic hue (white, black, purple) plus a one-bit shaded flag. If the hue is white then setting the shaded flag shades the object grey (that is, a shaded white hue is grey). If the hue is black then setting the shaded flag has no effect (that is, black hue means black whether the shaded flag is set or not). If the hue is purple then the shaded flag will never be set since garbage objects will not be traced. The sense of the hue bits is determined by a global variable base encoding the value of white (=base), black (=base + 1) and purple (=base+2). At step 2 there are no grey or purple nodes because marking and sweeping have finished, so flipping from black to white and white to purple is achieved simply by incrementing base modulo 3. Table 16.1 shows the three possible values of base encoded in binary (00, 01, 10) and the two possible values of the shaded flag (0, 1), which together make up the possible colours, along with examples for the three possible values of base. The entries in the ‘value’ columns are determined using arithmetic modulo 3. Note that the combination hue=base + 2/shaded=1 is impossible because purple (garbage) objects are never shaded grey. Subsequent increments cycle the hue interpretation accordingly.

To make sure that step 2 does not leave a node grey from one cycle to the next unless it was recently shaded by a mutator, whenever a marker thread makes a grey node black it must also clear the shaded flag. Otherwise, the grey node will be retained as floating garbage. Also, to speed up the identification of garbage, markers and sweepers can take the opportunity to clear the grey flag whenever they encounter a black object.

Queinnec et al [1989] propose an alternative solution to this problem, using separate colour information for odd and even collection cycles. Thus, marking in one cycle can proceed independently of sweeping from the previous cycle because they operate on independent colour state.

16.5  On-the-fly marking

So far, we have assumed that the mutator threads are all stopped at once so that their roots can be scanned, whether to initiate or terminate marking. Thus, after the initial root scan, the mutator holds no white references. At this point, the mutator threads can be left to run as black (so long as a black mutator barrier is employed), or grey (with a grey mutator barrier) with the proviso that to terminate marking the collector must eventually stop and rescan grey mutators until no more work can be found. These stop-the-world actions reduce concurrency. An alternative is to sample the roots of each mutator thread separately, and concurrently with other mutator threads. This approach introduces complexity because of the need to cope with some threads operating grey and some operating black, all at the same time, and how it affects termination.

On-the-fly collection never stops the mutator threads all at once. Rather, the collector engages each of the mutators in a series of soft handshakes: these do not require a single global hard synchronisation at the command of the collector. Instead, the collector merely prompts each mutator thread asynchronously, one-by-one, to halt gracefully at some convenient point. The collector can then sample (and perhaps modify) each thread’s state (stacks and registers) before releasing it on its way. While one mutator thread is stopped others can continue to run. Furthermore, if stack barriers are used, as described in Section 11.5, the collector can restrict its examination of the stopped thread to just the top active stack frame (all other frames can be captured synchronously with a stack barrier) so the handshake can be very quick, minimising mutator interruption.

Write barriers for on-the-fly collection

Synchronisation operations for on-the-fly collectors need some care. A common approach for mostly-concurrent collectors, which stop all threads together to scan their stacks, is to use a deletion barrier with a black mutator. Furthermore, new objects are allocated black. This approach simplifies the termination of marking: black stacks do not need to be rescanned and allocation does not lead to more work for the collector. However, this approach is not sufficient for an on-the-fly collector, as Figure 16.1 illustrates. Because stacks are scanned on the fly, some may be white. The heap is allowed to contain black objects before all threads have been scanned and before tracing has started because we allocate new objects black. The deletion barrier is not triggered on stack operations and there is no insertion barrier, so neither X nor Y is shaded grey. In summary, correct mutator-collector synchronisation for on-the-fly marking is a subtle issue that requires substantial care on the part of the algorithm designer.

Image

Figure 16.1: On-the-fly collectors that allocate black need more than a deletion barrier to prevent the scenario of a white object reachable only from a black object

Doligez-Leroy-Gonthier

Using soft handshakes to initiate marking was first used in a mark-sweep garbage collector tailored for the ML programming language. Dubbed Doligez-Leroy-Gonthier, after the names of its authors [Doligez and Leroy, 1993; Doligez and Gonthier, 1994], this collector uses private thread-local heaps to allow separate garbage collection of data allocated solely on behalf of a single thread, and not shared with other threads. A global heap allows sharing of objects among threads, with the proviso that the global shared objects never contain pointers into private heaps. A dynamic escape detection mechanism copies private objects into the shared heap whenever their reference is stored outside the private heap. Only immutable objects (the vast majority in ML) can be allocated privately, so making a copy of one in the shared heap does not require updating all the sources of its pointers (though it does require copying the transitive closure of reachable objects). But mutation is rare in ML so this happens infrequently. These rules permit a private heap to be collected independently, stopping only the mutator that owns the heap.

Doligez-Leroy-Gonthier uses concurrent mark-sweep collection in the shared heap, to avoid having to update references from each of the threads. The steady-state concurrent mark-sweep collector operates in the usual black mutator snapshot mode, employing a Yuasa-style snapshot deletion barrier. Initiating steady-state collection proceeds using a series of soft handshakes to transition mutator threads from grey to black, as follows.

The collector and mutator threads each track their own view of the state of the collection with a private status variable. To initiate the collection cycle, the collector sets its status to Sync1. The mutator threads are then made to acknowledge, and update their own status, via soft handshakes. Once all have acknowledged the Sync1 handshake, the collector is said to be in phase Sync1. Mutator threads ignore handshakes while storing to a pointer field or allocating, to ensure that these operations first complete, making them atomic with respect to phase changes. Having acknowledged this handshake, each mutator thread now runs with the write barrier in Algorithm 16.3a, which shades both the old and new values of modified pointer fields, combining both the black mutator Yuasa-style snapshot deletion barrier and the grey mutator Dijkstra-style incremental update insertion barrier. Shading by the mutator does not directly place the shaded object into the collector’s work list for scanning (like Kung and Song [1977]), but rather simply colours a white object explicitly grey and resets a global dirty variable to force the collector to scan for the newly grey object (in the style of Dijkstra et al [1978]). This avoids the need to synchronise explicitly between the mutator and the collector (other than for soft handshakes, where atomicity is accomplished simply by delaying acknowledging the handshake), but does mean that worst-case termination requires rescanning the heap for grey objects. Because mutation is rare in ML, this is not a significant impediment. At this point, the grey mutator threads are still allocating white, as they were before the collection cycle was initiated.

Algorithm 16.3: Doligez-Leroy-Gonthier write barriers. Both ignore the handshake.

(a) The Sync barrier

 1 Writesync(src, i, new):

 2  old ← src[i]

 3  shade(old)

 4  shade(new)

 5  src[i] ← new

(b) The Async barrier

 1 WriteAsync(src, i, new):

 2  old ← src[i]

 3  if not isBlack(old)

 4    shade(old)

 5    if old ≤ scanned

 6        dirty ← true

 7  src[i] ← new

Once all of the mutators have acknowledged the Sync1 handshake the collector moves to phase Sync2 with another round of handshakes. Because the write barrier is atomic only with respect to handshakes, it does not impose mutator-mutator synchronisation. This leaves the possibility that a mutator from before the Sync1 handshake, which is not running the write barrier, could insert some other pointer X into the src[i] field right after the load old←src[i]. Thus, shade(old) pointer will not shade the pointer X that actually gets overwritten by the store src[i]←new. The transition to phase Sync2 avoids such problems by ensuring that all mutator threads have completed any unmonitored atomic allocation or write in Async before transitioning to Sync1. At that point, all mutators will be running the write barrier (with insertion protection), so even if the mutators interleave their write barrier operations there will not be a problem. The collector can then safely move into the steady-state snapshot marking phase, Async. Each mutator thread acknowledges the Async handshake by scanning (shading from) its roots for the collector (making itself black), starting to allocate black, and reverting to the standard snapshot barrier augmented with resetting the global dirty flag (similarly to Dijkstra et al [1978]) to force the collector to rescan if the shaded object is behind the scanning wavefront, shown in Algorithm 16.3b. Table 16.2 shows the phases as observed by collector and mutators.

Once marking finishes, sweeping commences in series. Like Steele [1975], the location of the collector’s sweep pointer determines the mutator allocation colour to minimise floating garbage: white if allocating from memory already swept (so already noted free in this cycle), black if not yet swept (to avoid sweeping it to the free-list), and grey if at the point where the collector is currently sweeping (to avoid the race with the sweeper at the boundary).

Doligez-Leroy-Gonthier for Java

Domani et al [2000] consider Doligez-Leroy-Gonthier-style collection in the context of Java, where they offer several improvements to cope with much higher mutation rates, and language features such as weak references and finalisation. Because Java lacks general support for immutable objects they do not consider independently-collected thread-local heaps, but simply a global shared heap collected on the fly. They also support correct execution on multiprocessors that have a more relaxed memory model than sequential consistency, which was assumed for the original Doligez-Leroy-Gonthier implementation. To avoid rescanning for fresh mutator-shaded grey objects (which are more common in a mutation-oriented language like Java), Domani et al dedicate an output-restricted double-ended queue to each mutator thread, to which it can enqueue grey objects at one end, while the collector is able to poll for work from the other end. This minimises synchronisation between the mutator and collector in the write barriers.

Phase

Collector

Mutators

Meaning

Async

Async

Async

No mutators are executing barriers

Sync1

Async, Sync1

Some mutators may be running the Sync barrier

Sync1

Sync2

Sync1, Sync2

Some mutators may be black and running the Async barrier; others are running the Sync barrier

Sync2

Sync2

Sync2

All mutators are running the Async barrier

Async

Sync2, Async

Some mutators are black, all are running the Async barrier

Async

Async

Async

All mutators are black; the collector can complete marking, scanning and sweeping

Table 16.2: Phases in the Doligez and Gonthier collector

Sliding views

Azatchi et al [2003] offer further improvements to on-the-fly marking by exploiting the sliding views approach to sampling mutator roots without stopping the world [Levanoni and Petrank, 1999]. In place of the deque used by Domani et al [2000], the sliding views approach implements the snapshot deletion barrier by logging to a thread-local buffer the state of all the fields of an object before it is modified (dirtied) for the first time while the collector is marking. The buffers are drained via soft handshakes, with marking terminated once all the buffers are empty. Like Doligez-Leroy-Gonthier, after the initial handshake, and before the deletion barrier can be enabled for each mutator, the mutators also execute a Dijkstra-style incremental update insertion barrier to avoid propagating pointers unnoticed before the mutator snapshot can be gathered. These snooped stores also become mutator roots. The snooped stores are disabled once all threads are known to be logging the snapshot. Further details of this approach are discussed in Section 18.5.

16.6  Abstract concurrent collection

Concurrent collectors have many common design features and mechanisms, while differing in small but important details. To highlight these similarities and differences we can adopt a common abstract framework for concurrent garbage collection [Vechev et al, 2005, 2006; Vechev, 2007]. As discussed previously, the correctness of a concurrent collector depends on cooperation between the collector and the mutator in the presence of concurrency. Thus, the abstract concurrent collector logs events of mutual interest to the collector and mutator by appending to the shared list log. These events are tagged as follows:

•  Tsrc, fld, old, new〉 records that the collector has Traced pointer field fld of source object src, and that the field initially contained reference old which the collector has replaced by reference new. That is, the collector has traced an edge in the object graph srcold and replaced it with an edge srcnew.

•  Nref〉 records that the mutator has allocated a New object ref.

•  Rsrc, fld, old〉 records that the mutator has performed a Read from the heap by loading the value old from field fld of source object src.

•  Wsrc, fld, old, new〉 records that the mutator has performed a Write to the heap by storing the value new into field fld of source object src which previously contained value old. If fld is a pointer field then the mutator has replaced an edge srcold with an edge srcnew.

Each of src, fld, old, new are the addresses of the source object and source field, and old and new target object addresses, respectively. Collector event T captures the fields that have already been scanned by the mutator. For a non-moving collector tracing does not modify the references in the heap so old=new for T events. Mutator event N captures allocations by the mutator. Mutator events R, and W capture the fields that the mutator has accessed or modified.

An abstract concurrent mark-sweep collector is illustrated by Algorithm 16.4, which takes the abstract tracing collector of Algorithm 6.1 and augments it to handle the fact that the collector executes concurrently with the mutator. The algorithm proceeds in the usual way, scanning reachable objects by tracing from the roots, before sweeping to reclaim unreachable objects. Units of scanning work performed by scanTracingInc occur atomically; except to note that sweeping must also be properly synchronised, we omit details of sweepTracing.

Initialisation of the collector atomically samples the mutator roots using the routine rootsTracing and clears the log. This is performed atomically (stop-the-world) to avoid the complication of concurrent mutation of the roots by the mutator threads. On-the-fly collectors can sample the roots without stopping the world.

The collector then proceeds concurrently with the mutator, repeatedly both scanning objects and adding origins to be considered by the collector due to concurrent writes performed by the mutator.

At some point, the loop terminates as a result of some non-deterministic choice (denoted by Image), when the collector moves to a termination phase in which the remaining origins and objects to be scanned are processed atomically (that is, preventing the mutator from writing). This is performed atomically to prevent concurrent writes during termination, which may be needed to guarantee that the collector will complete its cycle. For some practical algorithms this atomic termination phase can be eliminated.

The scanTracingInc procedure implements the usual collector traversal of the heap, but incrementally, interleaved with the mutator. It differs from the original procedure scanTracing of Algorithm 6.1 only in that it atomically records to the log each traced field and the reference it contains.

The addOrigins procedure reveals that the abstract concurrent collector is parametrised by an as-yet-undefined function expose which takes a log prefix and returns a set of objects that should be considered as additional origins for live references. Different implementations for this function yield different abstract concurrent collector algorithms corresponding to concrete algorithms in the literature, as discussed further below when we describe how to instantiate specific collectors. It is the log that permits dealing with concurrent mutations that cause reachable objects to be hidden from the scan routine, which otherwise would remain unmarked.

Algorithm 16.4: Mostly-concurrent incremental tracing garbage collection

 1 shared log ← ()

 2

 3 collectTracingInc():

 4  atomic

 5    rootsTracing(W)

 6    log ← ()

 7  repeat

 8    scanTracingInc(W)

 9    addOrigins(W)

10  until Image

11  atomic

12    addOrigins(W)

13    scanTracingInc(W)

14  sweepTracing()

15

16 scanTracingInc(W):

17  while not isEmpty(W)

18    src ← remove(W)

19    if ρ(src) = 0

/* reference count is zero */

20      for each fld in Pointers(src)

21        atomic

22          ref ← *fld

23          log ← log · T⟨src, fld, ref, ref⟩

24        if ref ≠ null

25          WW + [ref]

26    ρ(src) ← ρ(src)+1

/* increment reference count */

27

28 addOrigins(W):

29  atomic

30    origins ← expose(log)

31  for each src in origins

32    WW + [src]

33

34 New():

35  ref ← allocate ()

36  atomic

37    ρ(ref) ← 0

38    log ← log · N〈ref〉

39  return ref 40

40

41 atomic Write(src, i, new):

42  if src ≠ roots

43    old ← src[i]

44    log ← log · W〈src, &src[i], old, new〉

45    src[i] ← new

The collector wavefront

Cooperation between the collector and the mutator guarantees correctness in the presence of concurrency. The log records the tracing progress of the collector through the heap — the wavefront — in the form of T events. Key to cooperation is how interleaved mutator events (N, R, and W) are treated, depending on whether they occur to the portion of the heap already scanned by the collector (behind the wavefront) or not yet scanned (ahead of the wavefront). The wavefront itself comprises the set of pending fields still to be scanned (specifically not the values of the pointers in those fields). Practical collectors may approximate the wavefront more or less precisely, from field granularity up through granularity at the level of objects to pages or other physical or logical units.

Adding origins

The addOrigins procedure uses the log to select a set of additional objects to be considered live, even if the collector has not yet encountered those objects in its trace, since it is possible that some number of reachable pointers were hidden by the mutator behind the wavefront. The precise choice of the set of origins is returned by the expose function.

Mutator barriers

The procedures New and Write represent the usual barriers performed by the mutator (here they are suitably atomic), which in the abstract algorithm coordinate with the collector by appending their actions to the log. Logging New objects allows subsequent mutator events to distinguish loading/storing fields of new objects, and loading/storing references to new objects. A freshly allocated object always has a unique reference until that reference has been stored to more than one field in the heap. Moreover, it does not contain any outgoing references (so long as its fields have not been modified, since they are initialised to null). This event allows concrete collectors to vary in how they decide liveness of objects that are allocated during the collection cycle (some collectors treat all such objects as live regardless of their reachability, leaving those that are unreachable to be reclaimed at the next collection cycle). Others will retain only those new objects whose references are stored to live objects.

As usual, the mutator Write operation assigns src[i] → new (with newnull) so the pointer to destination object new is inserted in field src[i] of source object src. Similarly, the old pointer old previously in field src[i] of source object src is deleted. When the source field is behind the collector wavefront then the pointers new/old are inserted/deleted behind the wavefront. Otherwise, the pointers are inserted/deleted ahead of the wavefront. Logging Write events captures both the inserted and deleted pointers.

Recall also that the wavefront can be expressed using the tricolour abstraction, where those objects/fields ahead of the wavefront are white, those at the wavefront are grey, and those behind the wavefront are black.

Precision

The abstract concurrent collector of Algorithm 16.4 preserves a fixed level of atomicity (as specified by the atomic blocks) while instantiating the function expose in different ways to vary precision. Varying this parameter of the abstract concurrent collector is sufficient to capture a representative subset of concrete concurrent collectors that occur in the literature, but there are other real collectors that cannot be instantiated directly from Algorithm 16.4 since they vary also in what they treat atomically. For example, Algorithm 16.4 assumes that roots can be obtained atomically from the mutator threads, which implies that they must be sampled simultaneously perhaps by stopping them all briefly (that is, Algorithm 16.4 is mostly-concurrent).

Instantiating collectors

Instantiating specific concurrent collectors within this framework requires defining a corresponding expose function. For example, consider a Steele-style concurrent collector that rescans all objects modified up to and including the wavefront. The wavefront at an object and field granularity is captured by (last) Trace operations in the log for each object/field. The objects modified are captured by the src component of all the Write records in the log, and the modified fields by the fld component. The Steele-style expose function atomically rescans modified fields that have already been traced. The traditional implementation tracks the wavefront at the object granularity (src component of Trace records) using per-object mark bits, but the abstract framework highlights that the wavefront might also operate at the field (fld) granularity given a mechanism for marking distinct fields. Thus, one need only rescan modified fields that have already been traced as opposed to whole modified objects that have already been traced. Moreover, Steele assumes that mutator thread stacks are highly volatile so expose must rescan them right to the end. Termination requires that every Trace record have no matching (at the field or object level) Write record occurring after it in the log.

A classical Dijkstra-style collector that unconditionally shades the target of any reference stored to the heap will expose the new component of all the Write records up to and including matching Trace records at the wavefront. Note that these new references are extracted directly from the log without rescanning. Termination is similar to Steele [1976].

Conversely, a Yuasa-style snapshot collector exposes the old component of all the Write records that have no matching Trace record after them in log. Tracing that stays ahead of the mutator will successfully append Trace records to the log before the mutator can modify the fields they record, offering speedier termination than for incremental update.

16.7  Issues to consider

Many of the issues facing concurrent mark-sweep garbage collection are common to all concurrent collectors. Concurrent collectors are without doubt more complex to design, implement and debug than stop-the-world collectors. Do the demands made of the collector warrant this additional complexity? Or would a simpler solution such as a generational collector suffice?

Generational collectors can offer expected pause times for most applications of only a few milliseconds. However, their worst case — a full heap collection — may pause an application for very much longer, depending on the size of the heap, the volume of live objects and so on. Such delays may not be acceptable. Concurrent collectors, on the other hand, offer shorter and more predictable pause times. As we shall see in Chapter 19, properly specified real-time collectors can guarantee sub-millisecond pause times, but this typically comes at the cost of significant overhead on both the mutator and the collector. To bound pause times, the collector must not only be concurrent but also on-the-fly: it must stop mutators only one at a time in order to process their roots.

Other questions for concurrent mark-sweep collection are the same as those for its stop-the-world counterpart. Non-moving memory managers are vulnerable to fragmentation. As well as defragmenting the heap, copying and compacting collectors permit bump pointer allocation which may be faster than free-list allocation and may also provide better locality for the mutator(s). On the other hand, mark-sweep collectors make better utilisation of space than copying collectors since they do not require a copy reserve. However, non-moving concurrent collectors have a further advantage over other concurrent collectors: a simpler heap coherency model. All concurrent collectors require mutators to inform the collector of changes to the topology of the heap in order to prevent a mutator from hiding objects from a collector. In addition, collectors that move objects must ensure both that only one collector thread moves an evacuated object and that it appears to mutators that all references to a moved object are updated atomically.

Concurrent mark-sweep collection also presents a number of tactical choices to the implementer. As with other concurrent collectors, objects may be allocated black, grey or white. Black mutators require that all objects be allocated black. Grey mutators allow further possibilities. New objects may be allocated black, grey or white, or the decision may be varied depending on the phase of the collector, the initial values of the new object’s fields, or the progress of the sweeper.

In the remaining chapters, we examine concurrent copying and compacting collectors and conclude with collectors that can provide pause time guarantees sufficient for hard real-time systems, that is, those that must meet every deadline.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.189.170.134