Analysis of I/O Behaviour of Apple Desktop Applications

This is a short summary or overview I wrote after reading a conference paper from the ACM Digital Library. The original paper can be found here: Click here

Analysis of I/O Behaviour of Apple Desktop Applications

Experimental Setup

The iBench task suite was created consisting of 6 applications running a total of 34 different tasks. The applications were grouped as iWork (Pages, Numbers, Keynote) and iLife (iPhoto, iTunes, iMovie). The I/O behaviour of these applications was analyzed.

The tasks in the iBench suite were:

  • iLife iPhoto: start, import, duplicate, edit, view, delete. The application worked on 400 2.5 MB photos imported as 12 MP pictures from a 1 GB flash card.

  • iLife iTunes: start, import and play mp3 album containing 10 songs, import and play a 3 minute long MPEG-4 movie.

  • iLife iMovie: start, import, add clip to project, export to 3 minute MPEG-4 movie.

  • iWork Pages: start, create, save, open, export 15 page documents with and without images in different formats.

  • iWork Numbers: start, generate and save, open, export a 5 page spreadsheet.

  • iWork Keynote: start, create slides with and without images, open, play, export 20 slides.

For the purpose of a particular case study, the Pages application, a word processor, was used to create a blank document, insert 15 JPEG images each of size 2.5 MB, and save as .doc.

Continue reading

Preemptible Root Scanning in GC

Concurrent garbage collection involves several operations such as read/write barrier code, stack scanning, and object copy. These must be atomic, i.e. non-preemptible, otherwise they will lead to inconsistency of data due to race conditions. The latter two involve considerable blocking times. The garbage collector must ensure a consistent root set. In incremental GC, this requires the use of barriers, to enforce the consistency of marking.

The start point of tracing is the root set, which is known to be directly accessible. It consists of references to global variables (or static in Java), references that are local to a thread, references in thread runtime stack, and thread-local CPU registers.

To make root scanning preemptible (useful in real-time systems), the premise is that atomicity during root scanning is only necessary with respect to the thread whose stack is currently being scanned, because a thread can modify only its own stack. So the system can delegate local root scans to mutator threads. This results in more efficient code, reducing the disruptiveness of garbage collection.

Continue reading

RAM Cleaners in Android

Using RAM management apps in Android which clear the RAM of all applications is, in fact, detrimental to the system and may degrade the performance and battery life instead of conserving battery power.

Conventional belief-
If an app is stored in the RAM, it consumes battery power.
The truth is- When an app is stored in RAM, the bits in the RAM are held as 1s or 0s accordingly, and the memory is cleared whenever the app is removed. Holding a bit as 0 or 1 takes the same power. There is no effect on the power usage if a bit is held as 1 instead of 0, as this involves only data storage and the kind of bit being stored in that memory location does not matter.

It is the CPU usage that affects the battery life, and not the RAM usage. If an app is present in memory and is consuming a lot of battery power, it indicates a high usage of the processor by the app, possibly as a result of weak code management by the developer of the app. Killing or uninstalling such an app may help to increase battery life, but the cause for this is not RAM usage, rather it is the excessive usage of the CPU.

Continue reading

Caches and Scratchpad Memory in RTOS

Cache Replacement Policies

Predictability calculated by RELACS tool

Policy

Assoc.

2

3

4

5

6

7

8

LRU

1

1

1

1

1

1

1

FIFO

1/2

1/3

1/4

1/5

1/6

1/7

1/8

PLRU

1

0

0

RANDOM

0

0

0

0

0

0

0

LRU (Least Recently Used)

Minimal life span (mls) – minimal no. of pairwise different memory blocks needed to evict a just-visited mem block out of cache. Replacement policy with larger mls preferred.

Upper bound of mls = L. mls of LRU = L = no. of cache lines.

Non-LRU policies are considered to be much less predictable than LRU, and it would be very difficult to develop precise and efficient analyses for them. It is recommended to only use LRU caches when timing predictability is a major concern in the system design.

Continue reading

Garbage Collection in RTOSes

The requirements for a garbage collection to be real-time are-

  • Ensure that programs with bounded allocation rates do not run out of memory

  • Provide short blocking times – Specify and maintain a maximum pause time which is short, predictable, eg. 50 ms threshold. There can be many short, frequent GC pauses. Duration of each is kept to a minimum, improving performance.

Issues with incremental GC: The two sources of blocking are root scanning and heap compaction.

The Metronome Collector

The Metronome garbage collector, which is a commonly used RT-GC, limits interruptions to one millisecond and spaces them evenly throughout the application’s execution.

Continue reading

Stack Management in RTOSes

Proper size of the stack is important. It must be small enough to prevent memory wastage and large enough to accommodate real-time application requirements.

The ways to determine the size to allocate are-

  • Test the system under conditions that produce worst-case stack behavior – hard to predict/provoke.

  • Calculate the theoretical maximum stack requirement (using software tools).

Browsing for stack sizes through the Dalvik source history:

API 3 (Android 1.5) = 8KB

API 4-10 (Android 1.6 – Android 2.3.7) = 12KB

API 14-17 (Android 4.0 – Android 4.2.2) = 16KB

Garbage collection on the stack – The life span of variables on the stack is limited to the duration of the function. As soon as the function returns, the used stack memory will be freed.

Multi-Task Stack Sharing (MTSS) scheme

MTSS is a multi-task stack sharing technique, that grows the stack of a particular task into other tasks in the system after it has overflown its bounds. A possible configuration to use MTSS effectively is to deliberately reduce the memory provided to each task to below what it needs, and the deficit is recovered from stacks of other tasks. It helps to reduce the physical memory needed for an embedded system without reducing its reliability.

Continue reading

Heap Management in RTOSes

Real-time performance of the heap is not deterministic. Memory allocation time depends on such factors as previous use and the requested data space size. So, minimizing heap usage is a must in small embedded systems.

Private-Shared-Anonymous Paging (PSAP) Algorithm

The PSAP algorithm separates the physical memory space into several blocks. It is used for allocating memory pages in interactive real-time systems.

Each interactive real-time task can create a private-reserve block where pages are never stolen by other tasks. There is also a shared-reserve block where pages are shared by all interactive real-time tasks but are never stolen by best-effort tasks.

The rest of memory space, which is categorized into neither a private- nor a shared-reserve block, is called an anonymous memory block where any tasks are allowed to get pages.

Fixed-Size Blocks Allocation/Memory Pool Allocation

Uses free list of fixed-size blocks of memory, usually all same size. It works well for simple embedded systems where no large objects need to be allocated. It has a very low overhead, which results in improved performance for frequent allocation/deallocation. However, it can suffer from fragmentation.

Continue reading