i
i
i
i
i
i
i
i
10.3. BASIC DIRECTORY CACHE COHERENCE PROTOCOL 341
to all current sharers. The data reply message sent to the requestor also includes the number of
invalidation acknowledgments the requestor can expect to receive. The directory state transitions to
EM. If it is an upgrade request, however, the directory sends invalidations to sharers, and a reply
without data to the requestor.
Finally, when the directory state is EM, if a read request is received, then the new state will be
shared, and an intervention request is sent to the owner of the block. The intervention request also
contains the ID of the requestor so that the owner knows where to send the block. If a read exclusive
request is received, then a new processor wants to write to the block, so the directory needs to send
an invalidation to the current owner. The invalidation request also contains the ID of the requestor
so that the owner knows where to send the block. Note that the block may be in either exclusive
or modified state at the owner; the directory does not know which is the case. In either case, upon
receiving intervention or invalidation, the owner must respond appropriately.
To illustrate the operation of the protocol, consider an example of a stream of memory accesses
to a single memory block shown in Table 10.1, assuming a three-node multiprocessor in which each
node has a processor, a cache, and a memory with a directory of all local memory blocks. We
assume the directory format consists of the directory state and a full-bit sharing vector. The table
shows requests made by various processors in the following format: Rx or Wx, where R stands for
a read request, W stands for a write request, and x stands for the ID of the processor making the
request. Messages generated as a result of the processor request are also shown, in the following
format: Msgtype(Src → Dest).
Note that in this illustration, we assume that all messages that correspond to each request occur
atomically, i.e., messages from different requests are not overlapped. Overlapping of messages from
different requests can introduce potential correctness problems and must be handled correctly. They
will be discussed in later sections. For now, we assume that requests are processed one at a time.
The example does not make any assumption on where the home of the block resides. Each time
a message is sent from a processor to another processor or the home node, it is counted as one
hop and is shown in one line. A message that cannot be sent before a prior message is received
indicates message dependence, and they must be sent in two different hops. Messages that do not
have dependence can be sent in parallel, denoted with the symbol “//” in the table.
Initially, the block is uncached, and hence the directory content indicates the uncached (U) state,
and the full bit sharing vector has 0 in all bits. When processor P1 wants to read the block, it sends
a read request to the home node (first hop). The home node checks the directory and finds that
the block is uncached, hence it has the latest value of the block. It then sends a ReplyD message
containing the data value of the block to P1. It also changes the state of the block to EM, with the
sharing vector bit for P1 set to “1” (the sharing full-bit vector becomes “100”).
Next, P1 wants to write to the block that it has just brought into its cache. It checks the state of
the block, and since the block is in the exclusive state, it changes the state to modified and writes to
the block silently, i.e., without sending any messages. The directory is unaware of this action, but it
does not need to, because the directory state of the block is already EM.
When P3 wants to read from the block, it suffers a cache miss, and this causes a Read message
to be sent to the home node. The home node checks its directory and finds that the state is EM and
the block is exclusively cached by P1. Since it does not know whether the block is clean or dirty
in P1, it assumes that the block is dirty and sends a intervention message to it. In the message, it
includes the ID of P3 so that later P1 can send the data directly to P3. Home then includes P3 as a