Accessing Memory

As we saw earlier, manipulating larger types can be more costly because of the higher number of instructions involved. Intuitively, more instructions often result in lower performance simply because of the extra work the CPU has to perform to execute them. In addition to that, code and data both reside in memory, and accessing memory itself has a cost.

Because accessing memory is a costly operation, a CPU caches the memory that was recently accessed, whether it was memory that was read from or memory that was written to. In fact, a CPU typically uses two caches organized in a hierarchy:

  • Level 1 cache (L1)
  • Level 2 cache (L2)

The L1 cache is the faster but also the smaller of the two. For example, the L1 cache could be 64 kilobytes (32 kilobytes for data cache, and 32 kilobytes for instruction cache) whereas an L2 cache could be 512 kilobytes.

NOTE: Some processors may also have a Level 3 cache, typically several megabytes in size, but you won't find that on embedded devices yet.

When data or instructions cannot be found in a cache, a cache miss occurs. This is when data or instructions need to be fetched from main memory. There are several kinds of cache misses:

  • Read miss in instruction cache
  • Read miss in data cache
  • Write miss

The first type of cache miss is the most critical, as the CPU has to wait until the instruction is read from memory before it can be executed. The second type can be as critical as the first, although the CPU may still be able to execute other instructions that do not depend on the data being fetched; this effectively results in out-of-order execution of the instructions. The last type is much less critical, as the CPU can typically continue executing instructions. You have little control over write misses, and you should not worry much about them; your focus should be on the first two types, which are the kinds of cache misses you want to avoid.

The Cache's Line Size

Besides its total size, another important property of a cache is its line size. Each entry in the cache is a line, which contains several bytes. For example, a cache line in the Cortex-A8 L1 cache is 64 bytes (16 words). The idea behind the cache and cache line is the principle of locality: if your application reads from or writes to a certain address, it is likely to read from or write to the same address, or a nearby address, in the near future. For example, this behavior was obvious in the implementation of the findMin() and addAll() methods in Listing 4–9.
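As an illustration of spatial locality, consider a simple linear scan similar in spirit to findMin() (this is a generic sketch, not the book's Listing 4–9):

  // A linear scan touches memory sequentially. With 64-byte cache lines,
  // reading array[0] also pulls array[1..15] into the L1 data cache
  // (16 ints of 4 bytes each), so the next 15 reads are likely cache hits.
  static int findMinSequential(int[] array) {
      int min = Integer.MAX_VALUE;
      for (int i = 0; i < array.length; i++) {
          if (array[i] < min) {
              min = array[i];
          }
      }
      return min;
  }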

There is no easy way for your application to know the size of a cache or the size of a cache line. However, knowing that caches exist and having some knowledge about how they work can help you write better code and achieve better performance. The following tips can help you take advantage of the cache without having to resort to low-level optimizations, such as the PLD and PLI assembly instructions shown in Chapter 3. To reduce the number of cache read misses from the instruction cache:

  • Compile your native libraries in Thumb mode (see the Android.mk sketch after this list). There is no guarantee this will make your code faster, though, as Thumb code can be slower than ARM code (more instructions may have to be executed). Refer to Chapter 2 for more information on how to compile libraries in Thumb mode.
  • Keep your code relatively dense. While there is no guarantee that dense Java code will result in dense native code, in practice it quite often does.
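For reference, here is a minimal Android.mk fragment showing where the Thumb/ARM choice is made when building with the NDK (the module and source file names are placeholders); Thumb is the default mode for the armeabi ABI:

  LOCAL_PATH := $(call my-dir)

  include $(CLEAR_VARS)
  # Placeholder module and source file names
  LOCAL_MODULE := mylib
  LOCAL_SRC_FILES := mylib.c
  # Thumb is the default; uncomment the next line to force ARM mode instead
  # LOCAL_ARM_MODE := arm
  include $(BUILD_SHARED_LIBRARY)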

To reduce the number of cache read misses from the data cache:

  • Again, use the smallest type possible when storing a large amount of data in arrays.
  • Choose sequential access over random access, as sketched below. This maximizes the reuse of data already in the cache, and can prevent data from being evicted from the cache only to be loaded again later.
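As a sketch of the difference, the two loops below compute the same sum over a (rectangular) two-dimensional array, but the first walks memory in the order it is laid out while the second keeps jumping between rows, which tends to cause far more data cache misses on large arrays:

  // Row-major traversal: consecutive iterations read adjacent elements,
  // so most accesses are served from the data cache.
  static long sumRowMajor(int[][] matrix) {
      long sum = 0;
      for (int row = 0; row < matrix.length; row++) {
          for (int col = 0; col < matrix[row].length; col++) {
              sum += matrix[row][col];
          }
      }
      return sum;
  }

  // Column-major traversal: each iteration jumps to a different row,
  // defeating the locality the cache relies on.
  static long sumColumnMajor(int[][] matrix) {
      long sum = 0;
      for (int col = 0; col < matrix[0].length; col++) {
          for (int row = 0; row < matrix.length; row++) {
              sum += matrix[row][col];
          }
      }
      return sum;
  }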

NOTE: Modern CPUs are capable of prefetching memory automatically to avoid, or at least limit, cache misses.

As usual, apply these tips on performance-critical sections of your application, which usually is only a small part of your code. On the one hand, compiling in Thumb mode is an easy optimization that does not really increase your maintenance effort. On the other hand, writing dense code may make things more complicated in the long run. There is no one-size-fits-all optimization, and you will have the responsibility of balancing the multiple options you have.

While you don't necessarily have control over what goes into the cache, how you structure and use your data can have an impact on what ends up being in the cache, and therefore can impact performance. In some cases, you may be able to arrange your data in a specific manner to maximize cache hits, albeit possibly creating greater complexity and maintenance cost.
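For example, one possible arrangement (sketched here with hypothetical Particle and ParticleSystem types) is to store fields that are scanned together in parallel arrays rather than in an array of objects, so that a pass over one field reads contiguous memory:

  // Array of objects: each Particle is a separate heap object, so a scan
  // over all x values may jump around in memory.
  class Particle {
      float x, y, z;
  }

  // Parallel arrays: all x values are contiguous, so a loop over xs makes
  // good use of every cache line it loads.
  class ParticleSystem {
      float[] xs;
      float[] ys;
      float[] zs;

      float maxX() {
          float max = Float.NEGATIVE_INFINITY;
          for (int i = 0; i < xs.length; i++) {
              if (xs[i] > max) {
                  max = xs[i];
              }
          }
          return max;
      }
  }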
