An Overview of EDF and SRP

EDF (Earliest Deadline First) is a scheduling policy used in real-time systems and SRP (Stack Resource Policy) is a resource allocation policy in real-time systems. The two work well together to create a deterministic process and memory management system. The following is a brief overview of the policies. Exact implementation details of the policies and current research in these areas can be found online in the ACM Digital Library.

Introduction to EDF

Earliest deadline first (EDF) or “least time to go” is a dynamic scheduling algorithm used in real-­time operating systems to place processes in a priority queue. In this scheme, the scheduler is invoked at specific ‘decision points’. The decision points or scheduling events occur whenever a job finishes, or a new job is released and added to the ready queue. At these points, the scheduler is invoked and the ready queue will be searched for the process closest to its deadline. This process is then selected for execution and dispatched.

Intuitively, this policy of dispatching jobs that are closest to the deadline means that jobs with earlier deadlines have a higher priority than jobs with later deadlines. This can be explained in a simple, easy-to-remember phrase: “tasks that are due sooner are done sooner”.

EDF is an optimal scheduling algorithm on preemptive uniprocessors. An optimal scheduler means that if a feasible schedule exists for a certain task set, then that scheduler will be able to execute it without any deadline misses. EDF can guarantee that all deadlines are met provided that the total CPU utilization is not more than 100%. However, when the system is overloaded, some or all jobs of tasks in the task set will miss their deadlines. This set of jobs that will miss deadlines is mostly unpredictable, and it depends on the conditions of overload i.e. the exact deadlines and the time of overload.

So, a schedulability test must be done on the task set to determine if it is schedulable or not. If there is a possibility of a job missing its deadline, then the task set is reported as not 100% schedulable (depending on the conditions of overload), and EDF cannot give any guarantees about jobs adhering to their deadlines. In this case, in a hard real-time system, some of the tasks will have to be excluded from the task set in order to guarantee schedulability for the remaining tasks. In systems whose failure can cause catastrophes, it is essential to ensure that every possible task set that can execute on the system will be 100% schedulable with no deadline misses. Some degree of error can be tolerated in soft real-time systems where adhering to the deadline is a measure of performance, but missing the deadline is not particularly catastrophic.

Continue reading

Preemptible Root Scanning in GC

Concurrent garbage collection involves several operations such as read/write barrier code, stack scanning, and object copy. These must be atomic, i.e. non-preemptible, otherwise they will lead to inconsistency of data due to race conditions. The latter two involve considerable blocking times. The garbage collector must ensure a consistent root set. In incremental GC, this requires the use of barriers, to enforce the consistency of marking.

The start point of tracing is the root set, which is known to be directly accessible. It consists of references to global variables (or static in Java), references that are local to a thread, references in thread runtime stack, and thread-local CPU registers.

To make root scanning preemptible (useful in real-time systems), the premise is that atomicity during root scanning is only necessary with respect to the thread whose stack is currently being scanned, because a thread can modify only its own stack. So the system can delegate local root scans to mutator threads. This results in more efficient code, reducing the disruptiveness of garbage collection.

Continue reading

Caches and Scratchpad Memory in RTOS

Cache Replacement Policies

Predictability calculated by RELACS tool

Policy

Assoc.

2

3

4

5

6

7

8

LRU

1

1

1

1

1

1

1

FIFO

1/2

1/3

1/4

1/5

1/6

1/7

1/8

PLRU

1

0

0

RANDOM

0

0

0

0

0

0

0

LRU (Least Recently Used)

Minimal life span (mls) – minimal no. of pairwise different memory blocks needed to evict a just-visited mem block out of cache. Replacement policy with larger mls preferred.

Upper bound of mls = L. mls of LRU = L = no. of cache lines.

Non-LRU policies are considered to be much less predictable than LRU, and it would be very difficult to develop precise and efficient analyses for them. It is recommended to only use LRU caches when timing predictability is a major concern in the system design.

Continue reading

Garbage Collection in RTOSes

The requirements for a garbage collection to be real-time are-

  • Ensure that programs with bounded allocation rates do not run out of memory

  • Provide short blocking times – Specify and maintain a maximum pause time which is short, predictable, eg. 50 ms threshold. There can be many short, frequent GC pauses. Duration of each is kept to a minimum, improving performance.

Issues with incremental GC: The two sources of blocking are root scanning and heap compaction.

The Metronome Collector

The Metronome garbage collector, which is a commonly used RT-GC, limits interruptions to one millisecond and spaces them evenly throughout the application’s execution.

Continue reading

Stack Management in RTOSes

Proper size of the stack is important. It must be small enough to prevent memory wastage and large enough to accommodate real-time application requirements.

The ways to determine the size to allocate are-

  • Test the system under conditions that produce worst-case stack behavior – hard to predict/provoke.

  • Calculate the theoretical maximum stack requirement (using software tools).

Browsing for stack sizes through the Dalvik source history:

API 3 (Android 1.5) = 8KB

API 4-10 (Android 1.6 – Android 2.3.7) = 12KB

API 14-17 (Android 4.0 – Android 4.2.2) = 16KB

Garbage collection on the stack – The life span of variables on the stack is limited to the duration of the function. As soon as the function returns, the used stack memory will be freed.

Multi-Task Stack Sharing (MTSS) scheme

MTSS is a multi-task stack sharing technique, that grows the stack of a particular task into other tasks in the system after it has overflown its bounds. A possible configuration to use MTSS effectively is to deliberately reduce the memory provided to each task to below what it needs, and the deficit is recovered from stacks of other tasks. It helps to reduce the physical memory needed for an embedded system without reducing its reliability.

Continue reading

Heap Management in RTOSes

Real-time performance of the heap is not deterministic. Memory allocation time depends on such factors as previous use and the requested data space size. So, minimizing heap usage is a must in small embedded systems.

Private-Shared-Anonymous Paging (PSAP) Algorithm

The PSAP algorithm separates the physical memory space into several blocks. It is used for allocating memory pages in interactive real-time systems.

Each interactive real-time task can create a private-reserve block where pages are never stolen by other tasks. There is also a shared-reserve block where pages are shared by all interactive real-time tasks but are never stolen by best-effort tasks.

The rest of memory space, which is categorized into neither a private- nor a shared-reserve block, is called an anonymous memory block where any tasks are allowed to get pages.

Fixed-Size Blocks Allocation/Memory Pool Allocation

Uses free list of fixed-size blocks of memory, usually all same size. It works well for simple embedded systems where no large objects need to be allocated. It has a very low overhead, which results in improved performance for frequent allocation/deallocation. However, it can suffer from fragmentation.

Continue reading

Memory Allocation in Real-Time Systems

Android does not use virtual memory. Virtual memory hardware leads to increased energy usage, area cost, and design complexity. Also, the possibility of Translation Lookaside Buffer (TLB) misses is detrimental to real-time guarantees. 

Lacking virtual memory, the system will encounter a fatal error if its memory footprint exceeds the physical memory by even one byte.

Memory Requirement

Global – Size of the global segment is fixed at compile time.

Stack – Grows at run time. Estimation of maximum size s done by the compiler by measuring the stack usage of the longest path in the call graph from main() to any leaf procedure.

Stack size estimation from call graph at compile time fails in case of –

Continue reading

Shared Memory Management in Android

Shared memory

Each app process is forked from an existing process called Zygote. The Zygote process starts when the system boots and loads common framework code and resources (such as activity themes).

To start a new app process, the system forks the Zygote process then loads and runs the app’s code in the new process. This allows most of the RAM pages allocated for framework code and resources to be shared across all app processes.

Most static data is mmapped into a process. This not only allows that same data to be shared between processes but also allows it to be paged out when needed.
eg. Dalvik code (pre-linked .odex file), app resources, traditional project elements – native code in .so files.

In many places, Android shares the same dynamic RAM across processes using explicitly allocated shared memory regions (either with ashmem or gralloc).

Continue reading

Stack Memory Management in Android

Stack management

Stack architecture for threads

Dalvik had separate stacks for native and Java code, with a default Java stack size of 32KB and a default native stack size of 1MB. ART has a unified stack for better locality. Ordinarily, the ART Thread stack size should be approximately the same as for Dalvik.

A Thread is a concurrent unit of execution. It has its own call stack for methods being invoked, their arguments and local variables. Each application has at least one thread running when it is started, the main thread, in the main ThreadGroup. The runtime keeps its own threads in the system thread group.

Dalvik uses registers as primary units of data storage instead of the stack. Google is hoping to accomplish 30 percent fewer instructions as a result.

Stack size for threads

The OS allocates the stack for each system-level thread when the thread is created. When the thread exits the stack is reclaimed. The size of the stack is set when a thread is created.

Android’s default stack size is 8KB. This gets you 60-100 stack frames, depending on how complex your methods are.

Continue reading

More on Garbage Collection in Android

Garbage Collection

There appears to be a separate thread (called the HeapWorker) in each VM process that performs the garbage collection actions.

GC triggers:

  • When the Dalvik heap hits a soft occupancy limit. VM Parameter – InitiatingHeapOccupancyPercent. Default value is 45%.
  • When memory allocation for an object fails.
  • Just before triggering OutOfMemoryError – full, synchronous garbage collection is done.
  • Explicit triggering by calling System.gc()

Heap fragmentation

Android does not defragment the heap to close up space. Android can only shrink the logical heap size when there is unused space at the end of the heap. But this doesn’t mean the physical memory used by the heap can’t shrink. After garbage collection, Dalvik walks the heap and finds unused pages, then returns those pages to the kernel using madvise. (Reclaiming memory from small allocations can be much less efficient because the page used for a small allocation may still be shared with something else that has not yet been freed).

Continue reading