9.6. INTERACTIONS WITH GLOBALLY SHARED MEMORY

Traditional systems have two notions of system or subsystem cache coherence. The first, non-coherent, means that I/O accesses to memory have no effect on the caches in the system. The memory controller reads and writes memory directly, so any cached copies of that data become incoherent with memory. This behavior requires that all cache coherence with I/O be managed by software mechanisms, as illustrated in Figure 9.8. In this example the processors, and any cached copies of local memory contents, are unaware of the request and response transactions targeting local memory. Software mechanisms must be used to signal changes to local memory so that the caches can be updated appropriately.
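The sketch below illustrates what such software management typically looks like from a device driver's point of view: dirty cache lines are written back before the device reads a buffer, and stale lines are discarded before the processor reads data the device has written. The cache and DMA routines are hypothetical placeholders for platform-specific operations, not part of any RapidIO or PCI definition.

    /* Sketch of software-managed coherence around non-coherent I/O.
     * The cache and DMA hooks below are assumed platform primitives. */
    #include <stddef.h>

    void cache_flush_range(void *addr, size_t len);       /* write back dirty lines */
    void cache_invalidate_range(void *addr, size_t len);  /* discard stale lines    */
    void start_dma_to_device(const void *src, size_t len);
    void wait_for_dma_from_device(void *dst, size_t len);

    void send_buffer(void *buf, size_t len)
    {
        /* Push any dirty cached copies to memory so the device, whose
         * accesses bypass the caches entirely, reads current data. */
        cache_flush_range(buf, len);
        start_dma_to_device(buf, len);
    }

    void receive_buffer(void *buf, size_t len)
    {
        wait_for_dma_from_device(buf, len);
        /* Discard stale cached copies so subsequent loads see the data
         * the device just wrote directly to memory. */
        cache_invalidate_range(buf, len);
    }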

The second notion of system cache coherence is global coherence. In this scenario, an I/O access to memory causes a snoop cycle to be issued on the processor bus, keeping all of the system caches coherent with memory, as illustrated in Figure 9.9. Because the snoop transaction runs on the local interconnect, the caches gain visibility into the change in memory contents and can either update their copy with the new data or invalidate it. Either approach results in correct, coherent memory operation.
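A minimal sketch of the cache-side choice follows. The MESI-style state names and the handler shape are generic illustrations of the update-versus-invalidate decision, not taken from any particular processor's snoop protocol.

    /* Sketch: a cache's response to a snooped I/O write of one 64-byte block. */
    #include <stdint.h>

    enum line_state { INVALID, SHARED, EXCLUSIVE, MODIFIED };

    struct cache_line {
        uint64_t        tag;
        enum line_state state;
        uint8_t         data[64];
    };

    /* Called when a snoop reports that I/O wrote 'new_data' to the block
     * held in 'line'.  Updating or invalidating both preserve coherence. */
    void snoop_io_write(struct cache_line *line, const uint8_t new_data[64])
    {
        if (line->state == INVALID)
            return;                        /* nothing cached, nothing to do */
    #ifdef UPDATE_ON_SNOOP
        for (int i = 0; i < 64; i++)       /* update policy: absorb new contents */
            line->data[i] = new_data[i];
        line->state = SHARED;
    #else
        (void)new_data;
        line->state = INVALID;             /* invalidate policy: drop the copy */
    #endif
    }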

The example in Figure 9.9 works for systems with traditional bus structures. In RapidIO-based systems there is no common bus on which the snoop transaction can be issued. In this type of system, global coherence requires special hardware support that goes beyond simply snooping a bus. This leads to a third notion of cache coherence, termed local coherence. With local coherence, a snoop transaction on a processor bus local to the targeted memory controller can be used to keep those caches coherent with that part of memory, but it does not affect caches associated with other memory controllers, as illustrated in Figure 9.10. What was once regarded as a 'coherent access' is no longer globally coherent, but only locally coherent. The decision to snoop or not snoop the local processor caches is typically determined either by design or system architecture policy (always snoop or never snoop) or by an attribute associated with the physical address being accessed. In PCI-X, this attribute is the No Snoop (NS) bit described in Section 2.5 of the PCI-X 1.0 specification.
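One way to picture the address-attribute approach is a per-region snoop flag consulted by the local memory controller when an inbound I/O access arrives. The region table, field names, and default policy below are illustrative assumptions, not a format defined by RapidIO or PCI-X.

    /* Sketch: per-region snoop attribute for a locally coherent controller. */
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    struct mem_region {
        uint64_t base;
        uint64_t size;
        bool     snoop;   /* similar in spirit to the inverse of PCI-X's NS bit */
    };

    static const struct mem_region regions[] = {
        { 0x00000000u, 0x40000000u, true  },  /* coherent DRAM: snoop local caches */
        { 0x80000000u, 0x10000000u, false },  /* buffer pool managed by software   */
    };

    /* Decide whether an inbound I/O access must snoop the local processor bus. */
    bool must_snoop(uint64_t addr)
    {
        for (size_t i = 0; i < sizeof(regions) / sizeof(regions[0]); i++) {
            if (addr >= regions[i].base && addr < regions[i].base + regions[i].size)
                return regions[i].snoop;
        }
        return true;   /* assumed default policy: snoop when the address is unknown */
    }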

Figure 9.8. Traditional non-coherent I/O access example

Figure 9.9. Traditional globally coherent I/O access example

Figure 9.10. RapidIO locally coherent I/O access example

In order to preserve the concept of global cache coherence for this type of system, the RapidIO Globally Shared Memory Logical Specification defines operations that allow a RapidIO to PCI bridge processing element to access data in the globally shared space without having to implement the full cache coherence protocol. These are the I/O Read and Data Cache Flush operations. For PCI-X bridging, these operations can also be used to encode the No Snoop attribute for locally as well as globally coherent transactions; the targeted memory controller can be designed to understand the required behavior of such a transaction. These encodings are also useful for tunneling PCI-X transactions between PCI-X bridge devices.

The data payload for an I/O Read operation is defined as the size of the coherence granule for the targeted globally shared memory domain. However, the Data Cache Flush operation allows coherence granule, sub-coherence granule, and sub-double-word writes to be performed.
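A sketch of how a bridge might apply these two rules when forwarding inbound PCI accesses into globally shared memory is shown below. The 64-byte granule, the request-issuing helpers, and the assumption that a read does not cross a granule boundary are all illustrative; they are not taken from the specification's packet formats.

    /* Sketch: mapping inbound PCI accesses to GSM space onto I/O Read and
     * Data Cache Flush requests.  Helper names are assumptions. */
    #include <stdint.h>
    #include <string.h>

    #define GRANULE 64u   /* assumed coherence granule of the target domain */

    void issue_io_read_home(uint64_t granule_addr, uint8_t out[GRANULE]);
    void issue_flush(uint64_t addr, const uint8_t *data, uint32_t len);

    /* Inbound PCI read: fetch the whole coherence granule (the defined I/O Read
     * payload), then return only the bytes requested.  Assumes the read does
     * not cross a granule boundary. */
    void bridge_pci_read(uint64_t addr, uint8_t *out, uint32_t len)
    {
        uint8_t  granule[GRANULE];
        uint64_t base = addr & ~(uint64_t)(GRANULE - 1);

        issue_io_read_home(base, granule);
        memcpy(out, &granule[addr - base], len);
    }

    /* Inbound PCI write: FLUSH permits full-granule, sub-granule, and
     * sub-double-word payloads, so the write can be forwarded as received. */
    void bridge_pci_write(uint64_t addr, const uint8_t *data, uint32_t len)
    {
        issue_flush(addr, data, len);
    }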

The IO_READ_HOME transaction is used to indicate to the GSM memory controller that the memory access is globally coherent, so the memory controller finds the latest copy of the requested data within the coherence domain (the requesting RapidIO to PCI bridge processing element is, by definition, not in the coherence domain) without changing the state of the participant caches. Therefore, the I/O Read operation allows the RapidIO to PCI bridge to cleanly extract data from a coherent portion of the system with minimal disruption and without having to be a full participant in the coherence domain.
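The following sketch shows one way a directory-based GSM memory controller might service such a request: supply the most recent copy of the coherence granule, whether it resides in memory or in an owning cache, while leaving every participant's cache state untouched. The directory layout and helper routines are assumptions for illustration only.

    /* Sketch: servicing IO_READ_HOME without disturbing participant caches. */
    #include <stdbool.h>
    #include <stdint.h>

    #define GRANULE 64u

    struct dir_entry {
        bool owned;      /* true if some cache holds a modified copy */
        int  owner_id;   /* which participant owns it, if owned      */
    };

    /* Assumed platform hooks. */
    struct dir_entry *lookup_directory(uint64_t addr);
    void read_memory(uint64_t addr, uint8_t out[GRANULE]);
    void read_owner_cache(int owner_id, uint64_t addr, uint8_t out[GRANULE]);

    void service_io_read_home(uint64_t addr, uint8_t response[GRANULE])
    {
        struct dir_entry *entry = lookup_directory(addr);

        if (entry->owned) {
            /* Fetch the latest data from the owning cache, but leave its
             * coherence state unchanged: the bridge is outside the domain. */
            read_owner_cache(entry->owner_id, addr, response);
        } else {
            read_memory(addr, response);  /* memory already holds the latest copy */
        }
    }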

The Data Cache Flush operation has several uses in a coherent part of a system. One such use is to allow a RapidIO to PCI bridge processing element to write to globally shared portions of the system memory. Analogous to the IO_READ_HOME transaction, the FLUSH transaction indicates to the GSM memory controller that the access is globally coherent. The memory controller forces all of the caches in the coherence domain to invalidate the coherence granule if they hold a shared copy (or to return the data to memory if a cache owned it), and then writes memory with the data supplied in the FLUSH request. This behavior allows the I/O device to cleanly write data to the globally shared address space without having to be a full participant in the coherence domain.
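A companion sketch of that sequence at the memory controller is given below: any owned copy is pulled back and invalidated, shared copies are invalidated, and the FLUSH payload is then committed to memory. As before, the directory structure and helper names are assumptions for illustration.

    /* Sketch: servicing a FLUSH request at the GSM memory controller. */
    #include <stdint.h>

    #define GRANULE 64u

    struct dir_entry {
        int num_sharers;
        int sharer_ids[8];
        int owner_id;        /* -1 when no cache owns a modified copy */
    };

    /* Assumed platform hooks. */
    struct dir_entry *lookup_directory(uint64_t addr);
    void invalidate_in_cache(int cache_id, uint64_t addr);
    void writeback_from_owner(int owner_id, uint64_t addr);
    void write_memory(uint64_t addr, const uint8_t *data, uint32_t len);

    void service_flush(uint64_t addr, const uint8_t *payload, uint32_t len)
    {
        struct dir_entry *entry = lookup_directory(addr);

        if (entry->owner_id >= 0) {
            /* An owner holds modified data: return it to memory first so a
             * sub-granule FLUSH payload merges with up-to-date contents. */
            writeback_from_owner(entry->owner_id, addr);
            invalidate_in_cache(entry->owner_id, addr);
        }
        for (int i = 0; i < entry->num_sharers; i++)
            invalidate_in_cache(entry->sharer_ids[i], addr);

        /* Finally commit the data that arrived with the FLUSH request. */
        write_memory(addr, payload, len);
    }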

Since the RapidIO to PCI bridge processing element is not part of the coherence domain, it is never the target of a coherent operation.
