i
i
i
i
i
i
i
i
8.4. EXERCISES 295
Bus Trans. Action P1 P2 P3 P4 Comment
Initial State S S S S Initially, P1 holds the lock
1 st1 M I I I P1 releases lock
2 ld2 S S I I P2 read misses after invalidation
3 t&s2 I M I I P2 executes t&s and issues BusRdX
st2 I M I I P2 releases lock
4 ld3 I S S I P3 read misses after invalidation
5 t&s3 I I M I P3 executes t&s and issues BusRdX
st3 I I M I P3 releases lock
6 ld4 I I S S P4 read misses after invalidation
7 t&s4 I I I M P4 executes t&s and issues BusRdX
st4 I I I M P4 releases lock
Best case for LL/SC lock: 7 bus transactions
Bus Trans. Action P1 P2 P3 P4 Comment
Initial State S S S S Initially, P1 holds the lock
1 st1 M I I I P1 releases lock
2 ll2 S S I I P2 read misses after invalidation
3 sc2 I M I I P2 executes a successful sc
st2 I M I I P2 releases lock
4 ll3 I S S I P3 read misses after invalidation
5 sc3 I I M I P3 executes a successful sc
st3 I I M I P3 releases lock
6 ll4 I I S S P4 read misses after invalidation
7 sc4 I I I M P4 executes a successful sc
st4 I I I M P4 releases lock
Worst case for test-and-t&s lock: 15 bus transactions
Bus Trans. Action P1 P2 P3 P4 Comment
Initial State S S S S Initially, P1 holds the lock
1 st1 M I I I P1 releases lock
2 ld2 S S I I P2 read misses after invalidation
3 ld3 S S S I P3 read misses after invalidation
4 ld4 S S S S P4 read misses after invalidation
5 t&s2 I M I I P2 executes t&s and issues BusRdX
6 t&s3 I I M I P3 executes t&s and issues BusRdX
7 t&s4 I I I M P4 executes t&s and issues BusRdX
8 st2 I M I I P2 releases lock
9 ld3 I S S I P3 read misses after invalidation
10 ld4 I S S S P4 read misses after invalidation
i
i
i
i
i
i
i
i
296 CHAPTER 8. HARDWARE SUPPORT FOR SYNCHRONIZATION
11 t&s3 I I M I P3 executes t&s and issues BusRdX
12 t&s4 I I I M P4 executes t&s and issues BusRdX
13 st3 I I M I P3 releases lock
14 ld4 I I S S P4 read misses after invalidation
15 t&s4 I I I M P4 executes t&s and issues BusRdX
st4 I I I M P4 releases lock
Worst case for LL/SC lock: 10 bus transactions
Bus Trans. Action P1 P2 P3 P4 Comment
Initial State S S S S Initially, P1 holds the lock
1 st1 M I I I P1 releases lock
2 ll2 S S I I P2 read misses after invalidation
3 ll3 S S S I P3 read misses after invalidation
4 ll4 S S S S P4 read misses after invalidation
5 sc2 I M I I P2 executes a successful sc
sc3 I M I I P3’s sc fails, no bus transaction generated
sc4 I M I I P4’s sc fails, no bus transaction generated
st2 I M I I P2 releases lock
6 ll3 I S S I P3 read misses after invalidation
7 ll4 I S S S P4 read misses after invalidation
8 sc3 I I M I P3 executes a successful sc
sc4 I I M I P4’s sc fails, no bus transaction generated
st3 I I M I P3 releases lock
9 ll4 I I S S P4 read misses after invalidation
10 sc4 I I I M P4 executes a successful sc
st4 I I I M P4 releases lock
2. Using LL/SC. Use LL/SC instructions to construct other atomic operations listed, and show
the resulting assembly code segments.
(a) fetch&no-op L performs an atomic sequence of reading the value in the location L
and storing it back into the same location L.
(b) fetch&inc L performs an atomic sequence of reading the value in the location L,
increment the value by one, and write the new value to L.
(c) atomic exch R, L performs an atomic sequence of swapping or exchanging the
value held in register R and location L.
Answer:
(a) fetch-noop: LL R, L // R = mem[L]
SC L, R // mem[L] = R
i
i
i
i
i
i
i
i
8.4. EXERCISES 297
(b) fetch-inc: LL R, L // R = mem[L]
add R, R, #1 // R = R + 1
SC L, R // mem[L] = R
bscfail R, fetch-inc // loop back if SC fails
(c) atomic-exch: ll R2, L // R2 = mem[L]
sc L, R // mem[L] = R
bscfail R, atomic-exch // loop back if SC fails
mov R, R2 // R = R2
Note that the mov instruction is purposedly placed after the sc, which is important to
make sure that the sc is more likely to succeed. It is safe to do that because the value
to be assigned to R is in the register R2, and hence can no longer be affected by an
intervention or invalidation of another processor.
3. Implementing locks. Implement lock() and unlock() directly using the atomic ex-
change instruction. The instruction atomic exch R, L performs an atomic sequence of
swapping/exchanging the value held in register R and location L. Use the following conven-
tion: a value of “1” indicates that a lock is currently held by a process, and a value of “0”
indicates that a lock is free. Your implementation should not repeatedly generate bus traffic
when a lock is continuously held by another process.
Answer:
lock: mov R, #1 // R = 1
loop: ld R2, L // R2 = mem[L]
bnz R2, loop // Lock not free, loop back
atomic_exch R, L // exchange R with mem[L]
bnz R, loop // lock attempt fails, loop back
ret // lock successfully acquired, return
unlock: st L, #0 // release the lock
ret
4. Barrier implementation. A proposed solution for implementing the barrier is the following:
BARRIER (Var B: BarVariable, N: integer)
{
if (fetch&add(B,1) == N-1)
B = 0;
else
while (B != 0) {}; // spin until B is zero
}
Describe a correctness problem with the barrier implementation. Then, rewrite the code for
BARRIER() in a way that avoids the correctness problem.
Answer:
The correctness problem occurs when all but last thread have arrived at the barrier spinning on
the while loop. Then the last thread arrives and sets B to 0. The last thread may then continue
to the next barrier with the same name, and immediately increments B. Other threads that are
i
i
i
i
i
i
i
i
298 CHAPTER 8. HARDWARE SUPPORT FOR SYNCHRONIZATION
about to come out of the barrier have their cached copies of B invalidated, and reload them to
find out that the value of B is no longer 0, and stay in the barrier.
The implementation can be corrected by alternating between spinning for 0 and for N-1, i.e.
BARRIER (Var B: BarVariable, N: integer)
{
static turn = 0;
if (turn == 0) {
if (fetch&add(B,1) == N-1)
B = 0;
else
while (B != 0) {}; // spin until B is zero
turn = 1;
}
else {
if (fetch&add(B,-1) == 1)
B = N;
else
while (B != N) {}; // spin until B is zero
turn = 0;
}
}
Homework Problems
1. Lock performance. Consider a three-processor bus-based multiprocessor using MESI pro-
tocol. Each processor executes a test-and-test&set or an LL/SC lock to gain access to a null
critical section. Consider the following sequence of events:
Initially: P1 holds the lock
P2 and P3 read the lock
P1 releases the lock
P2 and P3 read the lock
P2 acquires the lock successfully
P3 attempts to acquire the lock but is unsuccessful
P2 releases the lock
P3 reads the lock
P3 acquires the lock successfully
P1 and P2 read the lock
P3 releases the lock
Considering only the bus transactions related to lock-unlock operations:
i
i
i
i
i
i
i
i
8.4. EXERCISES 299
(a) Show the states of each cache for the sequence assuming the test-and-test&set lock
implementation. Use the following template. Bus transactions corresponding to the first
three steps are shown.
Bus Trans. Action P1 P2 P3 Comment
- Initially M I I Initially, P1 holds the lock
1 (BusRd) ld2 S S I P2 read misses on the lock
2 (BusRd) ld3 S S S P3 read misses on the lock
3 (BusUpgr) st1 M I I P1 releases the lock
4 and so on ...
(b) Show the states of each cache for the sequence assuming the LL/SC lock implementa-
tion. Use the following template. Bus transactions corresponding to the first three steps
are shown.
Bus Trans. Action P1 P2 P3 Comment
- Initially M I I Initially, P1 holds the lock
1 (BusRd) ll2 S S I P2 read misses on the lock
2 (BusRd) ll3 S S S P3 read misses on the lock
3 (BusUpgr) st1 M I I P1 releases the lock
4 and so on ...
2. Lock performance. Consider a four-processor bus-based multiprocessor TTSL lock imple-
mentations shown in the book (discussed in lectures). Suppose the cache coherence protocol
is MOESI, and we have the following events:
Initially, P1 holds the lock (and lock variable is cached in modified state)
P2, P3, and P4 reads the lock variable in that order
P1 releases the lock
P2, P3, and P4 reads the lock variable in that order
P4 acquires the lock successfully
P2 and P3 attempt but fail to acquire the lock in that order
P3 reads the lock variable
P4 releases the lock
P2 reads the lock variable
P2 acquires the lock successfully
P2 releases the lock
Show for each event what bus transaction is generated (ignore Flush and FlushOpt for brevity),
what instruction it corresponds to, and the resulting states in the cache.
3. Lock performance. Repeat homework problem (2) with LL/SC lock implementation.
4. Lock performance. Repeat homework problem (2) with Dragon protocol.
5. Lock performance. Repeat homework problem (2) with LL/SC lock implementation on
MESI protocol.
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.117.145.173