8.7. The Diamond 212GP Controller’s Cache Interfaces

Most RISC processors, including the Diamond 212GP core, use cache memories (fast, small RAM arrays) to buffer the processor from the slower, larger main memories generally located external to the processor core or to the SOC. The Diamond 212GP core's caches store the data and instructions that a program is immediately using, while the bulk of the program's code and data resides in slower main memory (RAM or ROM). In general, the Diamond 212GP core accesses its instruction and data caches simultaneously, which maximizes processor bandwidth and efficiency.

The Diamond 212GP controller incorporates a pre-configured version of the Xtensa cache controller that operates separate 2-way set-associative, 8-Kbyte instruction and data caches. The data cache employs a write-back (as opposed to write-through) write policy by default, although this policy can be changed under program control through the RPU access modes discussed in Section 8.5. The Diamond 212GP processor's cache memories accelerate every program the processor executes by storing local copies of recently executed code sequences and recently used data in fast cache memory. The caches provide faster access to instructions and data than memories attached to the processor's PIF can.
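
To make this geometry concrete, the following C sketch splits a 32-bit address into its tag, set-index, and line-offset fields for an 8-Kbyte, 2-way set-associative cache. With two ways of 4 Kbytes each and an assumed 32-byte line size (the line size here is an illustrative assumption, not a documented parameter), each way holds 128 sets, giving 5 offset bits, 7 index bits, and 20 tag bits.

    #include <stdint.h>
    #include <stdio.h>

    /* Geometry of one 8-Kbyte, 2-way set-associative cache.
     * The 32-byte line size is assumed for illustration. */
    #define CACHE_BYTES 8192u
    #define NUM_WAYS    2u
    #define LINE_BYTES  32u                                     /* assumed */
    #define NUM_SETS    (CACHE_BYTES / (NUM_WAYS * LINE_BYTES)) /* 128 */

    #define OFFSET_BITS 5u   /* log2(LINE_BYTES) */
    #define INDEX_BITS  7u   /* log2(NUM_SETS)   */

    static uint32_t line_offset(uint32_t addr) { return addr & (LINE_BYTES - 1u); }
    static uint32_t set_index(uint32_t addr)   { return (addr >> OFFSET_BITS) & (NUM_SETS - 1u); }
    static uint32_t tag_bits(uint32_t addr)    { return addr >> (OFFSET_BITS + INDEX_BITS); }

    int main(void)
    {
        uint32_t addr = 0x600010A4u;
        printf("tag=0x%05x set=%u offset=%u\n",
               (unsigned)tag_bits(addr), (unsigned)set_index(addr),
               (unsigned)line_offset(addr));
        return 0;
    }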

The cache-control logic for the Diamond 212GP's instruction and data caches is woven (integrated) into the processor's core, so the Diamond 212GP processor has direct interface ports for the eight RAM arrays needed to implement the two 2-way set-associative, 8-Kbyte instruction and data caches. Figure 8.5 shows how the eight RAM arrays attach to the processor core's cache interface ports.

Figure 8.5. The Diamond 212GP controller core incorporates a cache controller that operates separate, 2-way set-associative, 8-Kbyte instruction and data caches. The processor has direct interfaces for the eight RAM arrays (used for storing cached data and cache tags) needed to implement the two caches.


Four of the cache-RAM arrays store data and the other four store the cache tags, which hold the address and state information that identifies the instructions and data stored in each cache line. Each way of each cache requires separate RAM arrays for cache data and cache tags, so eight RAM arrays in all (two caches × two ways × two arrays per way) are needed to complete the Diamond 212GP controller core's cache memory. Although the cache controller is integrated into the logic of the Diamond 212GP core, the cache's RAM arrays are external to the core.
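
As a schematic model, the C structures below mirror that arrangement, continuing the geometry from the previous sketch: one data array and one tag array per way, two ways per cache, two caches. The tag-entry layout shown (address tag plus valid and dirty bits) is illustrative and is not the documented bit layout.

    /* Continues the previous sketch (NUM_SETS, LINE_BYTES, NUM_WAYS). */
    typedef struct {
        uint32_t tag   : 20; /* upper address bits of the cached line       */
        uint32_t valid : 1;  /* entry holds live data                       */
        uint32_t dirty : 1;  /* data cache only: modified line (write-back) */
    } tag_entry_t;

    typedef struct {
        uint8_t     data[NUM_SETS][LINE_BYTES]; /* one data RAM array */
        tag_entry_t tags[NUM_SETS];             /* one tag RAM array  */
    } cache_way_t;

    /* Two ways per cache and two caches: four data arrays plus four
     * tag arrays, the eight RAM arrays shown in Figure 8.5. */
    typedef struct {
        cache_way_t icache[NUM_WAYS];
        cache_way_t dcache[NUM_WAYS];
    } cache_rams_t;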

The Diamond 212GP processor accesses information directly from the cache memories when an instruction fetch or a load hits in the cache. A hit occurs when there is a match between the cache's address tags (stored in the cache's tag RAM arrays) and the target address of the transaction (instruction fetch, load, or store). When there is a cache hit, the processor fetches the required instruction or data directly from the associated instruction or data cache, and the operation continues. Such a memory transaction is as fast as fetching instructions from local instruction memory or loading data from local data memory.
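
Continuing the sketch, a hit test for a 2-way cache amounts to comparing the transaction's tag field against the valid tag entries of both ways in the indexed set:

    /* Returns 1 and the hitting way on a cache hit, 0 on a miss. */
    static int cache_hit(const cache_way_t ways[NUM_WAYS], uint32_t addr,
                         unsigned *hit_way)
    {
        uint32_t set = set_index(addr);
        uint32_t tag = tag_bits(addr);
        for (unsigned w = 0; w < NUM_WAYS; w++) {
            if (ways[w].tags[set].valid && ways[w].tags[set].tag == tag) {
                *hit_way = w; /* data comes straight from this way's data RAM */
                return 1;
            }
        }
        return 0;             /* miss: the line must be fetched over the PIF */
    }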

A cache miss occurs when the cache's address tags do not match the address of the transaction. In the case of a cache miss, the desired instruction or data is not present in the cache, and the integral cache controller retrieves a cache line's worth of instructions or data from non-local memory (directly attached to the PIF or located off chip) to load the missing information into the cache. The requested information must be retrieved from the non-local memory before processor operation can continue.
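
A sketch of the refill path, again continuing the model: choose_victim(), pif_read_line(), and pif_write_line() are hypothetical stand-ins for the hardware's replacement policy and PIF transactions, not real APIs. Note the write-back of a dirty victim line before the refill, matching the data cache's default write-back policy described above.

    /* Hypothetical stand-ins for hardware behavior. */
    extern unsigned choose_victim(uint32_t set);                       /* replacement policy */
    extern void     pif_read_line(uint32_t addr, uint8_t *buf);        /* PIF line fetch     */
    extern void     pif_write_line(uint32_t addr, const uint8_t *buf); /* PIF write-back     */

    /* Rebuild a line's memory address from its tag and set index. */
    static uint32_t line_addr(uint32_t tag, uint32_t set)
    {
        return (tag << (OFFSET_BITS + INDEX_BITS)) | (set << OFFSET_BITS);
    }

    static void cache_refill(cache_way_t ways[NUM_WAYS], uint32_t addr)
    {
        uint32_t     set = set_index(addr);
        unsigned     w   = choose_victim(set);
        tag_entry_t *t   = &ways[w].tags[set];

        if (t->valid && t->dirty)  /* write back the dirty victim first */
            pif_write_line(line_addr((uint32_t)t->tag, set), ways[w].data[set]);

        pif_read_line(addr & ~(uint32_t)(LINE_BYTES - 1u), ways[w].data[set]);
        t->tag   = tag_bits(addr);
        t->valid = 1;
        t->dirty = 0;
    }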

The Diamond 212GP processor's cache interface is separate from, but intimately related to, the PIF. The first access to a given instruction or data word will cause a cache miss (because the information is not yet stored in the cache) and will therefore generate a read cycle on the PIF directed at main memory. When the desired instruction or data arrives, however, the cache controller saves a copy of the retrieved information in the fast cache memory so that the cache can supply the instructions or data on subsequent accesses to the same location. This mechanism speeds system operation and produces better performance than if the program had to run from main memory alone.
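
A tiny driver, continuing the sketch, makes this warming effect visible: the first call for a given address misses and triggers a refill over the PIF; subsequent calls hit and are served directly from the cache's data RAM.

    #include <string.h> /* memcpy */

    /* Load a 32-bit word through the data-cache model. */
    static uint32_t load_word(cache_way_t ways[NUM_WAYS], uint32_t addr)
    {
        unsigned w;
        if (!cache_hit(ways, addr, &w)) {  /* first access: miss     */
            cache_refill(ways, addr);      /* line fill over the PIF */
            (void)cache_hit(ways, addr, &w);
        }
        /* Hit path: copy the word out of the hitting way's data RAM. */
        const uint8_t *line = ways[w].data[set_index(addr)];
        uint32_t word;
        memcpy(&word, line + (line_offset(addr) & ~3u), sizeof word);
        return word;
    }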

Because loads from and stores to the data cache compete for cache access, store transactions pass through a store buffer within the processor so that loads and stores can overlap. The processor’s store buffer is a shared queue of write operations targeted at the data-cache, data-RAM, and XLMI ports. Store operations complete when the cache is not otherwise occupied with load operations.
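
The following self-contained C sketch models such a store buffer as a small FIFO. The eight-entry depth, the entry format, and the drain interface are illustrative assumptions; the text above specifies only that the buffer is a shared queue of writes and that loads take priority for cache access.

    #include <stdint.h>

    enum { STORE_BUF_DEPTH = 8 }; /* depth is an assumption */

    typedef struct {
        uint32_t addr; /* target address (data cache, data RAM, or XLMI) */
        uint32_t data; /* value to be written                            */
    } store_entry_t;

    typedef struct {
        store_entry_t q[STORE_BUF_DEPTH];
        unsigned      head, tail, count;
    } store_buffer_t;

    /* Queue a store so the pipeline can continue; 0 means the buffer
     * is full and the store (and the pipeline) must wait. */
    static int store_enqueue(store_buffer_t *sb, uint32_t addr, uint32_t data)
    {
        if (sb->count == STORE_BUF_DEPTH)
            return 0;
        sb->q[sb->tail] = (store_entry_t){ addr, data };
        sb->tail = (sb->tail + 1u) % STORE_BUF_DEPTH;
        sb->count++;
        return 1;
    }

    /* Retire one queued store, but only when the target port is not
     * busy serving loads (loads have priority). */
    static int store_drain_one(store_buffer_t *sb, int target_busy_with_loads)
    {
        if (sb->count == 0 || target_busy_with_loads)
            return 0;
        /* ...commit sb->q[sb->head] to the cache/RAM/XLMI target... */
        sb->head = (sb->head + 1u) % STORE_BUF_DEPTH;
        sb->count--;
        return 1;
    }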
