Preemptible Root Scanning in GC

Concurrent garbage collection involves several operations such as read/write barrier code, stack scanning, and object copy. These must be atomic, i.e. non-preemptible, otherwise they will lead to inconsistency of data due to race conditions. The latter two involve considerable blocking times. The garbage collector must ensure a consistent root set. In incremental GC, this requires the use of barriers, to enforce the consistency of marking.

The start point of tracing is the root set, which is known to be directly accessible. It consists of references to global variables (or static in Java), references that are local to a thread, references in thread runtime stack, and thread-local CPU registers.

To make root scanning preemptible (useful in real-time systems), the premise is that atomicity during root scanning is only necessary with respect to the thread whose stack is currently being scanned, because a thread can modify only its own stack. So the system can delegate local root scans to mutator threads. This results in more efficient code, reducing the disruptiveness of garbage collection.

If the collector scans a thread’s stack, the thread mustn’t execute and atomicity has to be enforced, while other mutator threads can preempt the stack-scanning thread. However if the thread scans its own stack, there is no need to prevent preemption. The stack remains in the same state. This avoids preemption latencies due to root scanning and minimises the overhead. Mutator threads check an appropriate flag from time to time, then scan their local root set if the flag is set.

When the stack scan is moved to the mutator thread, each thread scans its own stack at the end of its period. Mutual exclusion is trivially enforced. There is no need to enforce atomicity. The collector initializes the GC period and sets a flag to signal all threads to scan stacks at the end of their period. When the scan has been acknowledged by all the threads, the collector scans static variables and traces the object graph.

The overhead for stack scan is low. At the end of a period, the stack is small if not empty. This also simplifies exact stack scanning, which is done at a few predefined instants. We can compute stack layout for end of period, not every point at which the thread was preempted. The required amount of data is low, and there is also a low memory overhead.

Consequences:

  • Delegating scanning of thread-local root sets to the respective mutator threads does not void the correctness of the garbage collector.
  • No reference can remain undetected. Even if a reference migrates from a not-yet-scanned local variable to a local variable already scanned, the proposed collection mechanism can compute the root set correctly.
  • Threads can exchange data only through static variables and object fields. An appropriate write barrier can therefore make migrating references visible to the garbage collector.

5 thoughts on “Preemptible Root Scanning in GC

  1. dronrathore 11 April, 2015 / 10:49 AM

    Which approach do you think is better, Pause–Clean–Play or in during the execution of a program, GC keeps a Map of all objects which has been GC flag marked and then clean them amidst execution?

    Like

    • Kajal 11 April, 2015 / 12:58 PM

      That would depend on the type of system. Personally I think pause-clean-play (“stop-the-world”) is better because of its simplicity, even though it blocks the running process for a while. Of course, in a real-time system, if an upper bound on the garbage collection time cannot be guaranteed, then the other approach of cleaning amidst execution is better.

      Like

      • dronrathore 13 April, 2015 / 10:39 AM

        Yeah true that, in a general computing system, it might fit till an end.

        In a real–time system, as often the system will be in rush of competing the time bounds and with too many context switches, there will be a lot of Objects and rush over all and giving time slot dedicatedly to GC might lead to lagging time–intervals of complete system, even if you go with a periodical one then which will be the best suited Time Scheduling algorithm in your thoughts? Should the CPU cycles be adjusted accordingly?

        Like

        • Kajal 14 April, 2015 / 9:07 AM

          I wouldn’t know about adjusting CPU cycles; that would certainly not help in an SMP system.

          A periodical GC, if implemented well, would not cause the problems of time lag that you are referring to. For example, the Metronome collector ( https://backendtech.wordpress.com/2015/01/01/garbage-collection-in-rtoses/) limits interruptions to one millisecond and spaces them evenly throughout the application’s execution.

          Context switch would be required in any case on a uniprocessor system.
          For a multiprocessor system, there is also an option of running the GC concurrently. However that can cause many other problems like maintaining the consistency of the marked set during mark-and-sweep (applications continuously modify their heaps) and it is often easier and better to use an incremental, time-bound, periodic garbage collector.

          Like

          • dronrathore 24 April, 2015 / 11:00 PM

            Nicely explained bits! Thanks 🙂

            Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s