Concurrent garbage collection involves several operations such as read/write barrier code, stack scanning, and object copy. These must be atomic, i.e. non-preemptible, otherwise they will lead to inconsistency of data due to race conditions. The latter two involve considerable blocking times. The garbage collector must ensure a consistent root set. In incremental GC, this requires the use of barriers, to enforce the consistency of marking.
The start point of tracing is the root set, which is known to be directly accessible. It consists of references to global variables (or static in Java), references that are local to a thread, references in thread runtime stack, and thread-local CPU registers.
To make root scanning preemptible (useful in real-time systems), the premise is that atomicity during root scanning is only necessary with respect to the thread whose stack is currently being scanned, because a thread can modify only its own stack. So the system can delegate local root scans to mutator threads. This results in more efficient code, reducing the disruptiveness of garbage collection.
If the collector scans a thread’s stack, the thread mustn’t execute and atomicity has to be enforced, while other mutator threads can preempt the stack-scanning thread. However if the thread scans its own stack, there is no need to prevent preemption. The stack remains in the same state. This avoids preemption latencies due to root scanning and minimises the overhead. Mutator threads check an appropriate flag from time to time, then scan their local root set if the flag is set.
When the stack scan is moved to the mutator thread, each thread scans its own stack at the end of its period. Mutual exclusion is trivially enforced. There is no need to enforce atomicity. The collector initializes the GC period and sets a flag to signal all threads to scan stacks at the end of their period. When the scan has been acknowledged by all the threads, the collector scans static variables and traces the object graph.
The overhead for stack scan is low. At the end of a period, the stack is small if not empty. This also simplifies exact stack scanning, which is done at a few predefined instants. We can compute stack layout for end of period, not every point at which the thread was preempted. The required amount of data is low, and there is also a low memory overhead.
- Delegating scanning of thread-local root sets to the respective mutator threads does not void the correctness of the garbage collector.
- No reference can remain undetected. Even if a reference migrates from a not-yet-scanned local variable to a local variable already scanned, the proposed collection mechanism can compute the root set correctly.
- Threads can exchange data only through static variables and object fields. An appropriate write barrier can therefore make migrating references visible to the garbage collector.