i
i
i
i
i
i
i
i
260 CHAPTER 7. BASIC CACHE COHERENCE ISSUES
Write2 I M I BusRdX/Flush P1.C 90
Write3 I I M BusRdX/Flush P2.C 90
TOTAL 544
Stream 3 (MESI):
Processor P1 P2 P3 Bus Action Data From Cycles
Read1 E - - BusRd(!C) Mem 90
Read2 S S - BusRd(C)/FlushOpt P1.C 90
Read3 S S S BusRd(C)/FlushOpt P1/2.C 90
Read3 S S S - - 1
Write1 M I I BusUpgr - 60
Write1 M I I - - 1
Write1 M I I - - 1
Write1 M I I - - 1
Write2 I M I BusRdX/Flush P1.C 90
Write3 I I M BusRdX/Flush P2.C 90
TOTAL 514
Stream 3 (MOESI):
Processor P1 P2 P3 Bus Action Data From Cycles
Read1 E - - BusRd(!C) Mem 90
Read2 S S - BusRd(C)/FlushOpt P1.C 90
Read3 S S S BusRd(C) Mem 90
Read3 S S S - - 1
Write1 M I I BusUpgr - 60
Write1 M I I - - 1
Write1 M I I - - 1
Write1 M I I - - 1
Write2 I M I BusRdX/Flush P1.C 90
Write3 I I M BusRdX/Flush P2.C 90
TOTAL 514
2. Dragon Protocol
For each of the memory reference streams given in the following, compare the cost of ex-
ecuting it on a bus-based machine that supports Dragon protocol. All the references (read-
/write) in the streams are to the same location, and the digit refers to the processor issuing
the reference. Assume that all caches are initially empty, and use the following cost model: a
read/write cache hit takes 1 cycle; a read/write cache miss which does not involve transferring
a whole block on the bus takes 60 cycles to complete; and a read/write miss which involves
BusUpd. Assume all caches are write back and write allocated.
i
i
i
i
i
i
i
i
7.6. EXERCISES 261
The first stream is: R1 (read by processor 1), W1 (write by processor 1), R1, W1, R2, W2,
R2, W2, R3, W3, R3, W3.
The second stream is: R1, R2, R3, W1, W2, W3, R1, R2, R3, W3, W1.
The third stream is: R1, R2, R3, R3, W1, W1, W1, W1, W2, W3.
Answer:
Stream 1 (Dragon):
Processor P1 P2 P3 Bus Action Data From Cycles
Read1 E - - BusRd(!C) Mem 90
Write1 M - - - - 1
Read1 M - - - - 1
Write1 M - - - - 1
Read2 Sm Sc - BusRd(C)/Flush P1.C 90
Write2 Sc Sm - BusUpd(C) - 60
Read2 Sc Sm - - - 1
Write2 Sc Sm - BusUpd(C) - 60
Read3 Sc Sm Sc BusRd(C)/Flush P2.C 90
Write3 Sc Sc Sm BusUpd(C) - 60
Read3 Sc Sc Sm - - 1
Write3 Sc Sc Sm BusUpd(C) - 60
TOTAL 515
Stream 2 (Dragon):
Processor P1 P2 P3 Bus Action Data From Cycles
Read1 E - - BusRd(!C) Mem 90
Read2 Sc Sc - BusRd(C) Mem 90
Read3 Sc Sc Sc BusRd(C) Mem 90
Write1 Sm Sc Sc BusUpd(C) - 60
Write2 Sc Sm Sc BusUpd(C) - 60
Write3 Sc Sc Sm BusUpd(C) - 60
Read1 Sc Sc Sm - - 1
Read2 Sc Sc Sm - - 1
Read3 Sc Sc Sm - - 1
Write3 Sc Sc Sm BusUpd(C) - 60
Write1 Sm Sc Sc BusUpd(C) - 60
TOTAL 573
Stream 3 (Dragon):
Processor P1 P2 P3 Bus Action Data From Cycles
Read1 E - - BusRd(!C) Mem 90
Read2 Sc Sc - BusRd(C) Mem 90
i
i
i
i
i
i
i
i
262 CHAPTER 7. BASIC CACHE COHERENCE ISSUES
Read3 Sc Sc Sc BusRd(C) Mem 90
Read3 Sc Sc Sc - - 1
Write1 Sm Sc Sc BusUpd(C) - 60
Write1 Sm Sc Sc BusUpd(C) - 60
Write1 Sm Sc Sc BusUpd(C) - 60
Write1 Sm Sc Sc BusUpd(C) - 60
Write2 Sc Sm Sc BusUpd(C) - 60
Write3 Sc Sc Sm BusUpd(C) - 60
TOTAL 631
Homework Problems
1. Suppose a 4-processor system uses a bus-based broadcast/snoopy coherence protocol. As-
sume that the cost of processing a protocol transaction depends on where data is supplied
and whether it involves a bus transaction. A cache hit not involving a bus transaction costs 1
cycle, a cache hit involving a bus transaction costs 10 cycles. A cache miss with data supplied
by another cache costs 50 cycles. A cache miss with data supplied by the main memory costs
80 cycles. The caches uses MSI coherence protocol. Show what happens to cache states after
each memory reference. The memory reference stream is: R1 (read by processor 1), R2, R3,
R2, W3 (write by processor 3), E3 (eviction by processor 3), W4, R2, R3, W1.
2. Repeat homework problem (1) for MESI protocol.
3. Repeat homework problem (1) for MOESI protocol.
4. Repeat homework problem (1) for Dragon protocol. Additionally, assume that any memory
reference that involves a BusUpd transaction requires 50 clock cycles to complete.
5. Some systems have implemented MOSI protocol to allow dirty sharing, which consists of
four states: modified, owned, shared, and invalid. The meaning of the states are:
Modified: data is dirty and can only be kept in one cache
Owned: data is dirty and may be kept in multiple caches, but only one of them has it in
the owned state
Shared: data may be clean/dirty, and may be kept in multiple caches
Invalid: data is uncached or invalid
Assume a bus-based multiprocessor with write back caches. The following bus transactions
are available: BusRd, BusRdX, BusUpgr, and Flush. FlushOpt is not implemented. Show
the state transition diagram for processor-initiated requests, and state transition diagram for
snooped bus requests.
6. Compare the latencies of MSI and MESI for the following cases: (a) read latency for data
that was earlier read by one other processor, (b) write latency for data that is read then written
by only one processor, and (3) read latency for data that is written by one processor and then
read by another processor.
i
i
i
i
i
i
i
i
7.6. EXERCISES 263
7. Compare the latencies of MESI and MOESI for the following cases: (a) read latency for data
that was earlier read by one other processor, (b) write latency for data that is read then written
by only one processor, (c) read latency for data that is written by one processor then read by
another processor, (d) read latency for clean data kept in other caches, and (e) bandwidth to
the main memory for a read to data recently written by another processor.
8. Non-Silent Replacement Suppose that the MESI protocol is modified so that the replacement
of a clean block is no longer silent, i.e., it is broadcasted on the bus. This way a cache can
detect if it becomes the only cache that holds a copy of the block, and can transition the state
of the block from shared (S) to exclusive (E). Explain the advantages and drawbacks of this
modification. Draw the cache coherence protocol diagram for the modified MESI protocol.
9. MOESI Problem In the MOESI protocol, if a block in the owned (O) is evicted from the
cache, the block is written back to the main memory, even though there may be other caches
which hold the block in a shared (S) state. This write back incurs an unnecessary off-chip
bandwidth consumption. Propose a solution to avoid having to write back the block, or at
least reduce the frequency of such write backs. Draw the cache coherence protocol diagram
for your solution.
i
i
i
i
i
i
i
i
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.219.113.131