Chapter 10 - Advanced Cache Coherence Issues (3/8)

Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

10.3. BASIC DIRECTORY CACHE COHERENCE PROTOCOL 341

to all current sharers. The data reply message sent to the requestor also includes the number of

invalidation acknowledgments the requestor can expect to receive. The directory state transitions to

EM. If it is an upgrade request, however, the directory sends invalidations to sharers, and a reply

without data to the requestor.

Finally, when the directory state is EM, if a read request is received, then the new state will be

shared, and an intervention request is sent to the owner of the block. The intervention request also

contains the ID of the requestor so that the owner knows where to send the block. If a read exclusive

request is received, then a new processor wants to write to the block, so the directory needs to send

an invalidation to the current owner. The invalidation request also contains the ID of the requestor

so that the owner knows where to send the block. Note that the block may be in either exclusive

or modiﬁed state at the owner; the directory does not know which is the case. In either case, upon

receiving intervention or invalidation, the owner must respond appropriately.

To illustrate the operation of the protocol, consider an example of a stream of memory accesses

to a single memory block shown in Table 10.1, assuming a three-node multiprocessor in which each

node has a processor, a cache, and a memory with a directory of all local memory blocks. We

assume the directory format consists of the directory state and a full-bit sharing vector. The table

shows requests made by various processors in the following format: Rx or Wx, where R stands for

a read request, W stands for a write request, and x stands for the ID of the processor making the

request. Messages generated as a result of the processor request are also shown, in the following

format: Msgtype(Src → Dest).

Note that in this illustration, we assume that all messages that correspond to each request occur

atomically, i.e., messages from different requests are not overlapped. Overlapping of messages from

different requests can introduce potential correctness problems and must be handled correctly. They

will be discussed in later sections. For now, we assume that requests are processed one at a time.

The example does not make any assumption on where the home of the block resides. Each time

a message is sent from a processor to another processor or the home node, it is counted as one

hop and is shown in one line. A message that cannot be sent before a prior message is received

indicates message dependence, and they must be sent in two different hops. Messages that do not

have dependence can be sent in parallel, denoted with the symbol “//” in the table.

Initially, the block is uncached, and hence the directory content indicates the uncached (U) state,

and the full bit sharing vector has 0 in all bits. When processor P1 wants to read the block, it sends

a read request to the home node (ﬁrst hop). The home node checks the directory and ﬁnds that

the block is uncached, hence it has the latest value of the block. It then sends a ReplyD message

containing the data value of the block to P1. It also changes the state of the block to EM, with the

sharing vector bit for P1 set to “1” (the sharing full-bit vector becomes “100”).

Next, P1 wants to write to the block that it has just brought into its cache. It checks the state of

the block, and since the block is in the exclusive state, it changes the state to modiﬁed and writes to

the block silently, i.e., without sending any messages. The directory is unaware of this action, but it

does not need to, because the directory state of the block is already EM.

When P3 wants to read from the block, it suffers a cache miss, and this causes a Read message

to be sent to the home node. The home node checks its directory and ﬁnds that the state is EM and

the block is exclusively cached by P1. Since it does not know whether the block is clean or dirty

in P1, it assumes that the block is dirty and sends a intervention message to it. In the message, it

includes the ID of P3 so that later P1 can send the data directly to P3. Home then includes P3 as a

342 CHAPTER 10. ADVANCED CACHE COHERENCE ISSUES

Table 10.1: Illustration of directory-based coherence protocol operations.

Request P1 P2 P3 Directory Messages #Hops

Initially – – – U, 000 –

R1 E – – EM, 100 Read(P1→H) 2

ReplyD(H→P1)

W1 M – – EM, 100 – 0

R3 S – S S, 101 Read(P3→H) 3

Int(H→P1)

Flush(P1→H // P1→P3)

W3 I – M EM, 001 Upgr(P3→H) 3

Reply(H→P3) // Inv(H→P1)

InvAck(P1→P3)

R1 S – S S, 101 Read(P1→H) 3

Int(H→P3)

Flush(P3→P1 // P3→H)

R3 S – S S, 101 – 0

R2 S S S S, 111 Read(P2→H) 2

ReplyD(H→P2)

sharer by updating its sharing vector to “101”. When P1 receives the intervention request, it sends

its data block to P3 using the ﬂush message. It downgrades its block state to shared. In MESI, the

shared state implies a clean block, thus it also needs to send a separate ﬂush message to the home

node containing the data value of the block. The home node uses the ﬂushed block to update the

block in the local memory. This transaction proceeds in three hops: from P3 to home, home to P1,

and P1 to home and P3.

After reading the block, P3 wants to write to it. It ﬁnds the block in the shared state in its cache,

so it does not have permission to write to the block. Hence, it sends an upgrade request using the

Upgr message to the home node. The home node ﬁnds that currently P1 also caches the block and

must be invalidated. Thus, it sends an Inv message to P1. In parallel, it sends a reply to P3 to notify

it to expect an invalidation acknowledgment from P1. Home also changes the directory state to

EM, and removes P1 from its sharing vector, so the new sharing bit vector value is “001”. When

P1 receives the invalidation message from home, it invalidates its block and sends an invalidation

acknowledgment message (InvAck) to P3. When P3 receives the InvAck from P1, it knows that it

can now change its block state to modiﬁed and write to the block.

Later, P1 wants to read from the block that was just invalidated in the previous event. P1 suffers

a cache miss as it ﬁnds the block in state invalid in its cache, so it sends out a Read message to

home. Home ﬁnds the block is in state EM and the sharing vector indicates that P3 has the block.

So it sends an intervention message to P3, and P3 responds by downgrading its cache state to shared

and ﬂushes the block to both P1 and home. Home changes the directory state to shared, updates the

sharing vector to “101”, and uses the ﬂushed block to update the block value in the local memory.

When P3 wants to read from the block, it ﬁnds it in its cache in a shared state. This results in a

cache hit, and no messages need to be sent.

10.4. IMPLEMENTATION CORRECTNESS AND PERFORMANCE 343

Finally, P2 wants to read the block. It sends a read request to home. The directory state at home

is shared, indicating that the block is clean. Hence, the home sends a ReplyD message containing

the block that it fetches from its local memory.

Let us now compare the basic directory coherence protocol versus a broadcast coherence pro-

tocol. First, it is clearly more complicated, with more message types than in a bus-based multipro-

cessor, the maintenance of a directory at home nodes, and possible protocol races when multiple

messages from different requests are overlapped. In addition, in a bus-based multiprocessor, all

transactions only incur at most two hops (the request and subsequent reply on the bus), while direc-

tory coherence sometimes requires three-hop transactions. In addition, cache-to-cache transfer that

can be employed in a bus-based multiprocessor cannot be applied with directory-based coherence.

Since all read or write misses need to go to home, the fastest way to supply data is when the home

has them in the local memory. In this case, no cache to cache transfer can be exploited.

10.4 Implementation Correctness and Performance

For our simple protocol we discussed in the previous section, we have assumed that (1) the directory

state and its sharing vector reﬂects the most up to date state in which the block is cached, and (2)

messages due to a request are processed atomically (without being overlapped with one another). In

real systems, both assumptions do not necessarily apply. This results in various protocol races that

need to be handled properly. Some of them can be handled relatively simply by the directory alone.

But in other cases, the directory alone cannot handle them correctly. The behavior of the cache

coherence controllers at each node must also be modiﬁed to aid the directory. We will now discuss

how various races can be handled. Note that it is not the purpose of this chapter to exhaustively

discuss all possible races. We will mostly focus on major races to give readers a ﬂavor of the

complexity in handling them.

10.4.1 Handling Races Due to Out-of-Sync Directory State

The reason why the directory state and cache states can get out of sync is because they are not

updated in lock step, and some events in the caches are never seen by the directory state. For

example, a block may be replaced from a cache, and the corresponding node is no longer a sharer.

However, the directory does not see this event and still thinks that the node is a sharer of the block.

Several odd situations can occur due to the directory having an inconsistent view of the cache states.

Consider when the directory thinks that a node is a sharer of a block, caching the block in a

clean shared state. Let us call the node A. However, the block is replaced by the node A from its

cache silently. When the directory receives a read exclusive request to the block from another node,

it sends invalidation messages to all nodes that the directory thinks as sharers of the block, including

node A. This results in a situation in which a node receives an invalidation message to a block that

it no longer has in its cache. In this case, the correct response is to acknowledge the receipt of the

invalidation by sending an invalidation acknowledgment message to the requester. No harm is done

here since the intended ﬁnal state for the block is invalid, and the block is already uncached (which

is similar to being invalidated).

Another situation is when the directory thinks a node is already a sharer of a block, but the

directory receives a read request from the node. This situation occurs because the node that kept the

block in its cache has silently evicted the block, while the directory still thinks the node is a sharer.

344 CHAPTER 10. ADVANCED CACHE COHERENCE ISSUES

When the node wants to read from the block, it suffers a read miss, and as a result, a read request

is sent to the directory. Handling this case is also simple. The directory can reply with data to the

requestor and keep the sharing bit vector unchanged, since the bit vector already reﬂects the node

as a sharer.

When the directory indicates that a block is exclusively cached (the directory state is EM),

similar odd situations can occur. For example, a Read or ReadX request may arrive at the directory

from a node that the directory thinks as caching the block exclusively (in exclusive or modiﬁed

state). In this case, apparently the node that owns the block has evicted the block. If the block was

clean, no write back (or ﬂush) had occurred, whereas if the block was dirty, the ﬂush message has

yet to reach the directory. The directory cannot just reply with data since it may not have the latest

data. In contrast to the case in which the directory state is shared, with an EM state, the directory

cannot reply with data just yet. However, it cannot just wait for the ﬂushed block to arrive either

because it may never come (the block might be clean and was evicted silently by the requestor while

it was in the exclusive state). Therefore, this is not a problem that can be solved just by the protocol

at the directory alone. The coherence controller at each processor node must also participate. In

particular, when the processor evicts a dirty block and ﬂushes the block to the main memory at

the home node, it must keep track of whether the write back has been completed or not, with a

structure called the outstanding transaction buffer. The home node must send an acknowledgment

upon the receipt of a ﬂush message. In addition, the coherence controller at the processor side must

also delay Read or ReadX requests to a block that is being ﬂushed, until it gets an acknowledgment

from the directory that the ﬂushed block has been received. This way, the directory will never

see a Read/ReadX request to a block from a node that is still ﬂushing it. Thus, when it receives

a Read/ReadX request, the directory knows that it must be the case that the block was clean and

evicted silently. Therefore, it can safely reply to the requestor with the data value of the block.

The ﬁnite state machine corresponding to the coherence protocol employed at the directory is

shown in Figure 10.4. The left hand side of the slash symbol (“/”) is the request received from

the processor by the directory, and the sender of the message (the requesting node) is shown in the

brackets. The right hand side of the slash is the messages sent by the directory as a response to

the request received from the processor. The destination of the message is shown in the brackets.

Only state transitions that are new compared to Figure 10.3 are labeled. The coherence action in the

directory that manipulates the sharing bit vector is not shown in the ﬁgure.

Note that because the directory may be out-of-sync with the cache states, there may be requests

arriving to the directory from the owner (i.e., the node that the directory thinks as caching a block

in exclusive or modiﬁed state). In order to distinguish requests from the current owner versus from

other processors, the EM state is split into two: EM

which indicates that the directory thinks that

the current owner is node A, and EM

which indicates that the directory thinks that the current

owner is node B. Nodes A and B can be any nodes as long as they are distinct from one another.

When the current state is EM

, how the protocol reacts depends on whether the request comes

from the current owner (A) or from other nodes (B). Likewise, when the current state is EM

how the protocol reacts depends on whether the request comes from the current owner (B) or from

other nodes (A). Since the EM state is now split into two, there are now two additional edges that

represent transitions between the two EM states.

Since the processing of requests from a current sharer or from non-sharer nodes is the same, the

shared state is not split into two.

10.4. IMPLEMENTATION CORRECTNESS AND PERFORMANCE 345

EMB

EMA

ReadX[A]/Reply[A],Inv[B]

ReadX[A]/ReplyD[A]

ReadX[B]/Reply[B],Inv[A]

Read[A]/ReplyD[A]

ReadX[B]/ReplyD[B]

Read[B]/ReplyD[B]

Read/ReplyD

Figure 10.4: Simpliﬁed directory state ﬁnite state coherence protocol diagram, with modiﬁ-

cations to handle out-of-sync view of cache states at the directory.

In the shared (S) state, now it is possible to receive a read request from the node that the directory

thinks a sharer. Since the directory already recognizes the node as a sharer, it is correct to respond

with the data block through a ReplyD message.

Suppose the directory state is EM

(identical responses are produced when the state is EM

If a read request coming from a different node (B) is received, then the new state will be shared,

and an intervention request is sent to the owner of the block (A). The intervention request also

contains the ID of the requestor so that the owner knows where to send the block. If a read exclusive

request from another node (B) is received, then the directory sends an invalidation message to the

current owner (A), sends a reply message to B telling it to expect to receive a reply from A, and

ownership transfer is indicated by transitioning from state EM

to EM

. The invalidation request

also contains the ID of the requestor so that the owner knows where to send the block to. Note that

the block may be in state either exclusive or modiﬁed at the owner; the directory does not know. In

either case, upon receiving intervention or invalidation, the owner must respond appropriately.

Let us now discuss the new case that occurs due to the directory having an out-of-sync view of

the cache states. When the state is EM

, with an out-of-sync view of cache states, the directory

may receive a read or read exclusive request from node A. This is because node A suffers a read or

write miss after the clean block was silently evicted from its cache (if it had been dirty, the request

would have been stalled at the requestor and instead the directory would receive a ﬂush message

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for Chapter 10 - Advanced Cache Coherence Issues (3/8)

Create new playlist

Sign In

Sign Up

Table of Contents for
Chapter 10 - Advanced Cache Coherence Issues (3/8)