On a read from Processor 1, a BusRd is posted, the main memory responds with the block, and
Processor 1 picks up the block and stores it in its cache in the exclusive state. On a write from Processor 1,
in contrast to MSI, the cache state transitions to modified without incurring a bus transaction, since
we know that in the exclusive state, no other cached copies exist.
When Processor 3 makes a read request, a BusRd is posted on the bus. Processor 1’s snooper
picks it up, checks its cache tag, and finds that it has the block in a modified state. This means
that Processor 1 has the latest (and the only valid) copy, and flushes it in response to the snooped
request. The block state then transitions to shared. In the meantime, the memory controller
also attempts to fetch the block from the main memory because it does not know if eventually a
cache will supply the data or not. Processor 3 snoops the flush, and by matching the address of the
block being flushed with its outstanding read transaction, it knows that the flushed block should be
treated as the reply to its read request. The block is therefore picked up and stored in its cache in the shared state. The memory controller, which has been trying to fetch the block from the main memory, also snoops the flushed block, picks it up, cancels its memory fetch, and overwrites the stale copy of the block in memory.
Next, Processor 3 has a write request. It posts a BusUpgr on the bus to invalidate copies in other
caches. Processor 1’s coherence controller responds by invalidating its copy. Processor 3’s cache
block state transitions to modified.
When Processor 1 attempts to read the block, it suffers a cache miss as a result of the earlier
invalidation that it received. Processor 1 posts a BusRd and Processor 3 responds by flushing its
cache block and transitioning its state to shared. The flushed block also updates the main memory
copy, so as a result the cache blocks are now clean.
When Processor 3 reads the block, it finds it in state shared in its cache. Since it has a valid
copy, it has a cache hit and does not generate a bus request.
Finally, Processor 2 attempts to read the block and posts a BusRd on the bus. Unlike in MSI, which does not employ cache-to-cache transfers, in MESI both Processor 1's and Processor 3's cache controllers attempt to supply the block by flushing it on the bus through FlushOpt. One of them wins and supplies the block, which Processor 2 picks up. The memory controller cancels its fetch upon snooping the FlushOpt block.
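To make the example concrete, the following C++ sketch (illustrative only; names such as read, write, and copiesExist are not from the protocol specification) replays the same access sequence and prints the bus transactions and state transitions that MESI produces. The copiesExist function stands in for the COPIES-EXIST bus line used to choose between the exclusive and shared states.

#include <cstdio>

// MESI states for the single block tracked in this example.
enum State { I, S, E, M };
const char* name(State s) { static const char* n[] = {"I", "S", "E", "M"}; return n[s]; }

State st[3] = {I, I, I};   // block state in the caches of Processors 1, 2, and 3

// Models the COPIES-EXIST ("C") bus line: true if another cache holds a valid copy.
bool copiesExist(int self) {
    for (int p = 0; p < 3; ++p)
        if (p != self && st[p] != I) return true;
    return false;
}

void read(int p) {
    if (st[p] != I) { printf("P%d read hit (%s), no bus transaction\n", p + 1, name(st[p])); return; }
    printf("P%d read miss: BusRd", p + 1);
    bool c = copiesExist(p);
    for (int q = 0; q < 3; ++q) {                         // the other snoopers react
        if (q == p) continue;
        if (st[q] == M)      { printf(", P%d Flush (memory updated)", q + 1); st[q] = S; }
        else if (st[q] != I) { printf(", P%d attempts FlushOpt", q + 1); st[q] = S; }  // one winner supplies
    }
    st[p] = c ? S : E;                                    // C asserted -> shared, otherwise exclusive
    printf(" -> P%d in %s\n", p + 1, name(st[p]));
}

void write(int p) {
    if (st[p] == M) { printf("P%d write hit (M), no bus transaction\n", p + 1); return; }
    if (st[p] == E) { st[p] = M; printf("P%d write hit: E -> M, no bus transaction\n", p + 1); return; }
    printf("P%d write: %s", p + 1, st[p] == S ? "BusUpgr" : "BusRdX");
    for (int q = 0; q < 3; ++q) {
        if (q == p || st[q] == I) continue;
        if (st[q] == M) printf(", P%d Flush", q + 1);     // a dirty copy is supplied before invalidation
        printf(", P%d invalidates", q + 1);
        st[q] = I;
    }
    st[p] = M;
    printf(" -> P%d in M\n", p + 1);
}

int main() {
    read(0);  write(0);   // Processor 1: exclusive, then a silent E -> M transition
    read(2);  write(2);   // Processor 3: Flush from P1, then BusUpgr
    read(0);              // Processor 1: Flush from P3, both copies now clean
    read(2);              // Processor 3: cache hit, no bus request
    read(1);              // Processor 2: cache-to-cache transfer via FlushOpt
    return 0;
}

The printed trace follows the narrative above, ending with Processor 2's read being satisfied by a FlushOpt from one of the two sharers.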
While the MESI protocol has improved various performance aspects of the MSI protocol, it still
suffers from a remaining problem that is potentially quite serious. When a cache block is read and
written successively by multiple processors, each read incurs an intervention that requires the
owner to flush the cache block. While the flushed block must be picked up by the requestor as a
means to ensure write propagation, the flushed block updating the copy in the main memory is not
a correctness requirement for write propagation. Unfortunately, the definition of the shared state is
that the cache block is clean, i.e., the value of the block is the same as that in the main memory. Therefore, in order to preserve the meaning of the shared state, the main memory has no choice but to update its copy. This is referred to as clean sharing: when a block is shared by multiple caches, it has to be clean. Note also that clean sharing implies that evicting a shared cache block can
be performed silently, i.e., the block is simply discarded. Unfortunately, by keeping clean sharing,
the main memory is updated too many times. In some systems, the bandwidth to the main memory
is already restricted, so updating the main memory on each cache flush uses an excessive amount of
bandwidth. For example, if multiple cores in a multicore architecture maintain coherence at the L2
cache level, the L2 caches can communicate with each other using an on-chip interconnection, but
updating the main memory must be performed by going off-chip. Off-chip bandwidth is severely
restricted in a multicore architecture because of the limited availability of pins and slow off-chip
interconnection. Thus, it would be desirable if a cache flush did not need to update the main memory, which can be achieved by allowing a dirty block to be shared by multiple caches. Dirty sharing is supported through an additional state in the MOESI protocol, which is described in the next section.
7.2.4 MOESI Protocol with Write Back Caches
As mentioned earlier, the bandwidth to the main memory can be reduced by allowing dirty sharing.
The MOESI protocol allows dirty sharing. The MESI protocol is used by Intel processors such as
the Xeon processor while the MOESI protocol is used by processors such as the AMD Opteron [4].
In the MOESI protocol, as in the MSI protocol, the processor requests to the cache include:
1. PrRd: processor-side request to read from a cache block
2. PrWr: processor-side request to write to a cache block
Bus-side requests include:
1. BusRd: snooped request that indicates there is a read request to a cache block made by
another processor.
2. BusRdX: snooped request that indicates there is a read exclusive (write) request to a cache
block made by another processor which does not already have the block.
3. BusUpgr: snooped request that indicates that there is a write request to a cache block that
another processor already has in its cache.
4. Flush: snooped request that indicates that an entire cache block is placed on the bus by a
processor to facilitate a transfer to another processor’s cache.
5. FlushOpt: snooped request that indicates that an entire cache block is posted on the bus
in order to supply it to another processor. We refer to it as FlushOpt because unlike Flush
which is needed for write propagation correctness, FlushOpt is implemented as a performance
enhancing feature that can be removed without impacting correctness.
6. FlushWB: snooped request that indicates that an entire cache block is written back to the main
memory by another processor, and it is not meant as a transfer from one cache to another.
Each cache block has an associated state which can have one of the following values:
1. Modified (M): the cache block is valid in only one cache, and the value is (likely) different
than the one in the main memory. This state has the same meaning as the dirty state in a write
back cache for a single system, except that now it also implies exclusive ownership.
2. Exclusive (E): the cache block is valid, clean, and only resides in one cache.
3. Owned (O): the cache block is valid, possibly dirty, and may reside in multiple caches. How-
ever, when there are multiple cached copies, there can only be one cache that has the block in the owned state; the other caches must have the block in the shared state.
4. Shared (S): the cache block is valid, possibly dirty, and may reside in multiple caches.
5. Invalid (I): the cache block is invalid.
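As a compact reference, the request types and states just listed can be written down as a small C++ sketch. The helper predicates (mayBeDirty, othersMayCache, and mustSupplyOnSnoop are illustrative names, not protocol terminology) encode the properties the text attaches to each state; in particular, unlike in MESI, the shared state no longer implies a clean block.

#include <cstdio>

// Bus-side request types of the MOESI protocol, as listed above.
enum BusRequest { BusRd, BusRdX, BusUpgr, Flush, FlushOpt, FlushWB };

// Cache block states and the properties the text associates with each of them.
enum BlockState { Modified, Owned, Exclusive, Shared, Invalid };

bool isValid(BlockState s)        { return s != Invalid; }
bool mayBeDirty(BlockState s)     { return s == Modified || s == Owned || s == Shared; }
bool othersMayCache(BlockState s) { return s == Owned || s == Shared; }
// Flushing on a snooped read is a correctness requirement only from M and O;
// a block in E may optionally be supplied via FlushOpt as a performance enhancement.
bool mustSupplyOnSnoop(BlockState s) { return s == Modified || s == Owned; }

int main() {
    static const char* n[] = {"M", "O", "E", "S", "I"};
    printf("state  valid  may-be-dirty  others-may-cache  must-supply-on-snoop\n");
    for (int s = Modified; s <= Invalid; ++s)
        printf("  %s      %d          %d              %d                  %d\n",
               n[s], isValid(BlockState(s)), mayBeDirty(BlockState(s)),
               othersMayCache(BlockState(s)), mustSupplyOnSnoop(BlockState(s)));
    return 0;
}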
The idea behind the owned state is that when a cache block is shared across caches, its value
is allowed to differ from that in the main memory. One cache is assigned as the owner and caches
the block in state “O” or owned, while others cache it in the shared state. The existence of the
owner simplifies how data is supplied in a cache-to-cache transfer. For example, when a BusRd is
snooped, we can let the owner supply the data by flushing the block, while the other controllers take no
action. The main memory does not need to pick up a Flush or FlushOpt to update the block in main
memory. In addition, we can also assign the owner to be responsible for writing back the block
to the main memory when the block is evicted. Hence, when a cache block in the shared state is
evicted, regardless of whether it is clean or dirty, it can be discarded. Only when the evicted cache block is in the owned state is it written back to update the main memory. To indicate that
a block in the owned state is evicted and needs to update the main memory, a different bus request
type is needed, which we refer to as FlushWB.
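The resulting division of labor on eviction can be summarized in a short sketch (again with illustrative names; the modified case is included under the usual write back assumption for a dirty sole copy):

#include <cstdio>

enum BlockState { Modified, Owned, Exclusive, Shared, Invalid };

// Bus request (if any) that a cache must issue when it evicts a block.
const char* onEvict(BlockState s) {
    switch (s) {
    case Owned:                            // the owner is responsible for updating memory
    case Modified:  return "FlushWB";      // the memory copy may be stale, so write it back
    case Exclusive:
    case Shared:    return "silent drop";  // the owner, if any, still holds the dirty data
    default:        return "nothing (block not cached)";
    }
}

int main() {
    printf("evict S: %s\n", onEvict(Shared));   // clean or dirty-shared copy, simply discarded
    printf("evict O: %s\n", onEvict(Owned));    // converts dirty sharing back to clean sharing
    return 0;
}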
Who should be the owner of a cache block? To answer this question, consider that when there
is dirty sharing, a block in the shared state can be replaced silently, but a block in the owned state
must be written back to the main memory. Bus bandwidth can be conserved if the frequency of
write backs is minimized. To reduce the frequency of write backs, the cache that will hold the
block the longest should be selected as the owner. Although predicting which cache will hold a
particular shared block the longest is difficult, good heuristics can often help. Since applications
tend to exhibit temporal locality, a good heuristic for predicting such a cache is selecting the cache
that last wrote to or read from the block as the owner. However, reads to valid blocks do not
incur any bus transactions, so it is inconvenient (and expensive) to change the ownership when a
processor reads from a shared block in the cache. Thus, one heuristic that can be used (implemented
in AMD Opteron systems) is to select the last cache that wrote to the block as the owner. More
specifically, the cache that has the block in the modified state, when it receives an intervention
request, downgrades the block state to owned – in effect becoming the owner of the block.
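A small sketch of this last-writer heuristic: the cache holding the block in the modified state downgrades it to owned when it snoops another processor's read, and thereby becomes the block's supplier. The function name is illustrative.

#include <cassert>

enum BlockState { Modified, Owned, Exclusive, Shared, Invalid };

// State of a snooping cache after it observes a BusRd for a block it holds.
BlockState afterSnoopedBusRd(BlockState s) {
    switch (s) {
    case Modified:  return Owned;    // the last writer keeps ownership of the dirty block
    case Exclusive: return Shared;   // the copy is clean, so no ownership is needed
    default:        return s;        // O, S, and I are unchanged by a snooped BusRd
    }
}

int main() {
    assert(afterSnoopedBusRd(Modified)  == Owned);   // the downgrade described in the text
    assert(afterSnoopedBusRd(Exclusive) == Shared);
    assert(afterSnoopedBusRd(Shared)    == Shared);
    return 0;
}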
We assume that the caches use write allocate and write invalidate cache coherence policies. The
finite state machine corresponding to the MOESI coherence protocol for write back caches is shown
in Figure 7.7. In the figure, the response to processor-side requests is shown on the top part, while
the response to the snooper-side requests is shown on the bottom part.
As before, the I state represents two cases: a case in which the block is not cached, or when
the block is cached but its state is invalid. Let us consider the top part of the figure that shows a
reaction to a processor read or write request. First, consider when the cache block is in “I” (invalid)
state. When there is a processor read request, it suffers a cache miss. To load the data into the
cache, a BusRd is posted on the bus, and the memory controller responds to the BusRd by fetching
the block from the main memory. Other snoopers will snoop the request and check their caches to
determine if any of them has a copy. If a copy is found, the cache asserts the COPIES-EXIST bus
line (indicated as “C” in the figure). In that case, the fetched block is placed in the requestor’s cache
in the shared state. If, on the other hand, the COPIES-EXIST bus line is not asserted (indicated as
the “!C” in the figure), the fetched block is placed in the requestor’s cache in the exclusive state.
When there is a processor write request, the cache must allocate a valid copy of the cache block,
and to do that, it posts a BusRdX request on the bus. Other caches will respond by invalidating their
Figure 7.7: State transition diagram for the MOESI coherence protocol.
cached copies, while the memory responds by supplying the requested block. When the requestor
gets the block, it is placed in the cache in the “M” or modified state.
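This miss handling from the invalid state can be sketched as follows (onReadMiss and onWriteMiss are illustrative names; the boolean argument stands for the COPIES-EXIST line):

#include <cstdio>

enum BlockState { Modified, Owned, Exclusive, Shared, Invalid };

// Processor read from a block in the I state: post a BusRd and choose the new
// state based on whether the COPIES-EXIST ("C") line was asserted.
BlockState onReadMiss(bool copiesExist) {
    printf("post BusRd -> %s\n", copiesExist ? "S (C asserted)" : "E (!C)");
    return copiesExist ? Shared : Exclusive;
}

// Processor write to a block in the I state: post a BusRdX, which makes other
// caches invalidate their copies, and install the block in the M state.
BlockState onWriteMiss() {
    printf("post BusRdX -> M\n");
    return Modified;
}

int main() {
    BlockState a = onReadMiss(true);    // some other cache held a copy
    BlockState b = onReadMiss(false);   // no other cached copy
    BlockState c = onWriteMiss();
    return (a == Shared && b == Exclusive && c == Modified) ? 0 : 1;
}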
Suppose now the cache already has the block in the exclusive state. Any read to the block is a
cache hit and proceeds without generating a bus transaction. A write to the block, as in the MESI
protocol, does not generate a bus transaction because a block in the exclusive state implies that it is
the only cached copy in the system. So the write can proceed after the state transitions to modified.
Suppose now the cache already has the block in the shared state. On a processor read, the block
is found in the cache and data is returned to the processor. This does not incur a bus transaction
since it is a cache hit, and the state remains unchanged. On the other hand, on a processor write,
there may be other cached copies that need to be invalidated, so a BusUpgr is posted on the bus, and
the state transitions to modified.
If the cache block is present in the cache in the modified state, reads or writes by the processor
do not change the state, and no bus transaction is generated since the cache can be sure that no other cached
copies exist: an earlier invalidation has ensured that only one modified copy can exist in the system.
If the cache block is present in the cache in the owned state, this means that it is dirty and may also reside in other caches in the shared state. A processor read can simply obtain the value from the cache. A
processor write must invalidate other cached copies by posting a BusUpgr transaction.
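The hit cases discussed in the last few paragraphs can be collected into one processor-side transition function. The sketch below uses illustrative names; it returns the new state and prints the bus request, if any, that the cache posts.

#include <cstdio>

enum BlockState { Modified, Owned, Exclusive, Shared, Invalid };
const char* name(BlockState s) { static const char* n[] = {"M", "O", "E", "S", "I"}; return n[s]; }

// Processor-side transition for a block that is already valid in the cache;
// isWrite selects between PrRd and PrWr.
BlockState onProcessorHit(BlockState s, bool isWrite) {
    if (!isWrite) return s;                   // read hits never change the state or use the bus
    switch (s) {
    case Exclusive: return Modified;          // silent upgrade, no bus transaction (as in MESI)
    case Shared:
    case Owned:     printf("post BusUpgr\n"); // other cached copies must be invalidated
                    return Modified;
    case Modified:  return Modified;          // already the only valid copy
    default:        return s;                 // I is handled by the miss path, not here
    }
}

int main() {
    printf("E + PrWr -> %s\n", name(onProcessorHit(Exclusive, true)));
    printf("O + PrWr -> %s\n", name(onProcessorHit(Owned, true)));
    printf("S + PrRd -> %s\n", name(onProcessorHit(Shared, false)));
    return 0;
}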
Now let us look at how the finite state machine reacts to snooped bus transactions. If the cache
does not have the block, or has it in the invalid state, a snooped BusRd, BusRdX, or BusUpgr does not affect it, so the request is ignored.
If the cache block is in the exclusive state, when a BusRd request is snooped, the block is supplied on the bus using FlushOpt, and the state transitions to shared. When a BusRdX request is snooped,
the block is flushed using FlushOpt and the state transitions to invalid.
If the cache block is in a shared state, when a BusRd transaction is snooped, that means another
processor suffered a read miss and is trying to fetch the block. Therefore, the state remains shared,
and since only the owner is responsible for flushing the block, the local cache does not flush the
block. Note that there may be an owner (in case of dirty sharing) or there may not be an owner
(clean sharing). In the case that there is no owner (clean sharing), the main memory can supply
the block, although using a MESI-like FlushOpt is also possible here. If a BusRdX or BusUpgr is
snooped, the block’s state transitions to invalid. Again, the owner (if there is one) is responsible for
flushing the block, so, in contrast to MESI, the non-owner caches do not need to flush their block
copies.
If the cache block is in the modified state, the copy in the cache is the only valid copy in the
entire system (no other cached copies exist and the value in the main memory is stale). Therefore,
when a BusRd transaction is snooped, the block must be flushed to ensure write propagation, and
the state transitions to owned. The reason for transitioning to the owned state is based on a heuristic
that has been discussed earlier. Note that by transitioning to the owned state, the local cache has
become the supplier of the block and is responsible for flushing it when required.
If the cache block is in the owned state, it indicates that there is dirty sharing, with the local
cache being responsible as the supplier of the block. Hence, when a BusRd is snooped, it flushes
the block and remains in the owned state (i.e., remains the owner). If a BusRdX is snooped, it
supplies the block by flushing it, and transitions into the invalid state. If a BusUpgr is snooped, it
transitions into the invalid state without flushing the block. Note that because of dirty sharing, the flushes from this state are a correctness requirement rather than a performance enhancement, because potentially nobody else in the system has a valid copy of the block (some sharers that obtained the
block from the owner may have a valid copy, but there is no guarantee that there are sharers).
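The snooper-side reactions described above can likewise be collected into one transition function. The sketch below (illustrative names again) prints the block transfer, Flush or FlushOpt, that the snooping cache places on the bus, if any, and returns the new state.

#include <cstdio>

enum BlockState { Modified, Owned, Exclusive, Shared, Invalid };
enum BusRequest { BusRd, BusRdX, BusUpgr };
const char* name(BlockState s) { static const char* n[] = {"M", "O", "E", "S", "I"}; return n[s]; }

// Reaction of a snooping cache to a bus request for a given block.
BlockState onSnoop(BlockState s, BusRequest r) {
    switch (s) {
    case Invalid:
        return Invalid;                           // not cached: the request is ignored
    case Exclusive:
        printf("FlushOpt\n");                     // optional cache-to-cache supply of a clean block
        return (r == BusRd) ? Shared : Invalid;   // (BusUpgr cannot occur for a sole copy)
    case Shared:
        return (r == BusRd) ? Shared : Invalid;   // no flush: the owner (or memory) supplies the block
    case Modified:
        printf("Flush\n");                        // required for write propagation
        return (r == BusRd) ? Owned : Invalid;    // downgrade to owned on a snooped read
    case Owned:
        if (r == BusUpgr) return Invalid;         // the requestor already has the data
        printf("Flush\n");                        // the owner must supply the dirty block
        return (r == BusRd) ? Owned : Invalid;
    }
    return s;
}

int main() {
    printf("M + BusRd   -> %s\n", name(onSnoop(Modified, BusRd)));
    printf("O + BusRdX  -> %s\n", name(onSnoop(Owned, BusRdX)));
    printf("S + BusUpgr -> %s\n", name(onSnoop(Shared, BusUpgr)));
    return 0;
}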
When a block in the owned state is evicted from the cache, the ownership disappears, since the remaining caches hold the copy only in the shared state. Hence, at this point dirty sharing must be converted
into clean sharing. The owner is responsible for flushing the block to the memory so that the memory
can update its copy. This is achieved by posting a FlushWB request on the bus. In contrast to Flush
or FlushOpt requests that are ignored by the memory controller, a FlushWB request is picked up by
the memory controller to update the value in the main memory.
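The memory controller's side of this arrangement amounts to a simple filter on the snooped transfer types, sketched below with illustrative names: only FlushWB causes the main memory copy to be updated.

#include <cstdio>

enum SnoopedTransfer { Flush, FlushOpt, FlushWB };

// Whether the memory controller picks up a snooped block transfer to update
// the main memory copy (in MOESI, dirty sharing keeps memory out of the loop).
bool memoryPicksUp(SnoopedTransfer t) {
    switch (t) {
    case FlushWB:  return true;     // owner eviction: memory must be brought up to date
    case Flush:                     // cache-to-cache transfer; the owner retains responsibility
    case FlushOpt: return false;    // optional supply of a block to another cache
    }
    return false;
}

int main() {
    printf("Flush    -> memory update: %d\n", memoryPicksUp(Flush));
    printf("FlushOpt -> memory update: %d\n", memoryPicksUp(FlushOpt));
    printf("FlushWB  -> memory update: %d\n", memoryPicksUp(FlushWB));
    return 0;
}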
Two mechanisms ensure write propagation. First, by invalidating other copies on a write to
a cache block, other caches are forced to reload the block through cache misses. Secondly, with
dirty sharing, a cache acts as an owner and flushes the block when it snoops a BusRd or BusRdX,
ensuring the correct block value is passed on. With clean sharing, the memory supplies the block.