Cache eviction strategies and pre-fetching

Of course, modern computer systems do not simply use a least recently used (LRU) strategy to evict pages from their caches; they also try to predict the access pattern in order to retain pages that are old but have a high probability of being requested again. In addition, modern CPUs try to predict future memory requests and pre-fetch the corresponding data. Nevertheless, random memory access patterns should be avoided: the more sequential the access pattern, the faster it usually executes.
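To make this effect tangible, here is a minimal, self-contained Java sketch (not from the book) that traverses the same array once sequentially and once in a shuffled order. On most machines the random traversal is several times slower because it defeats the hardware prefetcher; the System.nanoTime timings are only rough, and a proper measurement would use a benchmark harness such as JMH.

```java
import java.util.Random;

public class AccessPatternDemo {

    public static void main(String[] args) {
        final int n = 1 << 24;            // 16M ints (~64 MB), far larger than the CPU caches
        int[] data = new int[n];
        int[] randomIndex = new int[n];
        for (int i = 0; i < n; i++) {
            data[i] = i;
            randomIndex[i] = i;
        }
        // Fisher-Yates shuffle to create a random visiting order
        Random rnd = new Random(42);
        for (int i = n - 1; i > 0; i--) {
            int j = rnd.nextInt(i + 1);
            int tmp = randomIndex[i];
            randomIndex[i] = randomIndex[j];
            randomIndex[j] = tmp;
        }

        long sum = 0;
        long t0 = System.nanoTime();
        for (int i = 0; i < n; i++) {     // sequential: prefetcher-friendly
            sum += data[i];
        }
        long sequentialMs = (System.nanoTime() - t0) / 1_000_000;

        long t1 = System.nanoTime();
        for (int i = 0; i < n; i++) {     // random: frequent cache misses, prefetching is useless
            sum += data[randomIndex[i]];
        }
        long randomMs = (System.nanoTime() - t1) / 1_000_000;

        System.out.println("sequential: " + sequentialMs + " ms, random: "
                + randomMs + " ms, checksum: " + sum);
    }
}
```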

So how can we avoid random memory access patterns? Let's have a look at java.util.HashMap again. As the name suggests, the hash codes of the key objects are used to group the contained entries into buckets. A side effect of hashing is that even nearby key values, such as consecutive integers, produce different hash codes and therefore end up in different buckets. Each bucket can be seen as a pointer to a linked list of key-value pairs stored in the map, and these pointers reference arbitrary memory regions, so sequential scans are impossible. The following figure illustrates this: the pointers refer to objects located at random regions of main memory (the Java heap, to be precise):
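The following sketch shows how consecutive integer keys are scattered across buckets. The bucketOf method mirrors the hash spreading and bucket masking used internally by recent OpenJDK versions of java.util.HashMap; those constants are an implementation detail and are reproduced here only for illustration. The key point remains that each bucket chains separately allocated node objects, which live at unrelated addresses on the Java heap.

```java
import java.util.HashMap;

public class HashBucketDemo {

    // Mirrors the hash spreading and bucket masking done inside OpenJDK's
    // java.util.HashMap (an implementation detail, shown only for illustration).
    static int bucketOf(Object key, int tableSize) {
        int h = key.hashCode();
        h = h ^ (h >>> 16);               // spread the higher bits into the lower ones
        return h & (tableSize - 1);       // the table size is always a power of two
    }

    public static void main(String[] args) {
        int tableSize = 16;               // default initial capacity of HashMap
        for (int key = 100; key < 105; key++) {
            System.out.println("key " + key
                    + " -> hashCode " + Integer.valueOf(key).hashCode()
                    + " -> bucket " + bucketOf(key, tableSize));
        }

        // Each bucket chains separately allocated node objects, so even entries in
        // neighbouring buckets end up at unrelated addresses on the Java heap.
        HashMap<Integer, String> map = new HashMap<>();
        for (int key = 100; key < 105; key++) {
            map.put(key, "value-" + key);
        }
        System.out.println(map);
    }
}
```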

In order to improve sequential scans, Tungsten uses the following trick: the pointer stores not only the memory address of the value, but also the key.

We have already encountered this concept: an 8-byte memory region is used to store two integer values, in this case the key and the pointer to the value. This way, a sorting algorithm such as quick-sort can run with sequential memory access patterns. The following figure illustrates this layout:

This way, when sorting, only the key and pointer combinations have to be moved around, while the memory regions where the values are stored stay untouched. The values may still be randomly spread across memory, but the key and pointer combinations are laid out sequentially, as the following figure illustrates:
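The following simplified Java sketch shows the same idea; it is not Tungsten's actual code. It uses an array index in place of a real memory address and a plain int key in place of Tungsten's key prefix, but it demonstrates the layout: each record is one packed 8-byte entry, sorting touches only the contiguous array of packed entries, and the values themselves never move.

```java
import java.util.Arrays;

public class PackedSortDemo {

    // Pack a 4-byte int key and a 4-byte "pointer" (here: an index into the
    // values array, standing in for a real memory address) into one long.
    static long pack(int key, int valueIndex) {
        return ((long) key << 32) | (valueIndex & 0xFFFFFFFFL);
    }

    static int key(long packed)        { return (int) (packed >>> 32); }
    static int valueIndex(long packed) { return (int) packed; }

    public static void main(String[] args) {
        String[] values = {"delta", "alpha", "charlie", "bravo"}; // these never move
        int[] keys      = {4, 1, 3, 2};

        // Build the compact key+pointer array: one 8-byte entry per record.
        long[] entries = new long[values.length];
        for (int i = 0; i < values.length; i++) {
            entries[i] = pack(keys[i], i);
        }

        // Sorting touches only this contiguous long[] (a dual-pivot quicksort),
        // which gives sequential, cache-friendly memory access.
        Arrays.sort(entries);

        for (long entry : entries) {
            System.out.println("key " + key(entry) + " -> " + values[valueIndex(entry)]);
        }
    }
}
```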

This optimization was introduced in Apache Spark 1.5; it was requested in the SPARK-7082 feature request (https://issues.apache.org/jira/browse/SPARK-7082).
