Chapter 8 - Hardware Support for Synchronization (7/8)

Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

8.4. EXERCISES 295

Bus Trans. Action P1 P2 P3 P4 Comment

Initial State S S S S Initially, P1 holds the lock

1 st1 M I I I P1 releases lock

2 ld2 S S I I P2 read misses after invalidation

3 t&s2 I M I I P2 executes t&s and issues BusRdX

st2 I M I I P2 releases lock

4 ld3 I S S I P3 read misses after invalidation

5 t&s3 I I M I P3 executes t&s and issues BusRdX

st3 I I M I P3 releases lock

6 ld4 I I S S P4 read misses after invalidation

7 t&s4 I I I M P4 executes t&s and issues BusRdX

st4 I I I M P4 releases lock

Best case for LL/SC lock: 7 bus transactions

Bus Trans. Action P1 P2 P3 P4 Comment

Initial State S S S S Initially, P1 holds the lock

1 st1 M I I I P1 releases lock

2 ll2 S S I I P2 read misses after invalidation

3 sc2 I M I I P2 executes a successful sc

st2 I M I I P2 releases lock

4 ll3 I S S I P3 read misses after invalidation

5 sc3 I I M I P3 executes a successful sc

st3 I I M I P3 releases lock

6 ll4 I I S S P4 read misses after invalidation

7 sc4 I I I M P4 executes a successful sc

st4 I I I M P4 releases lock

Worst case for test-and-t&s lock: 15 bus transactions

Bus Trans. Action P1 P2 P3 P4 Comment

Initial State S S S S Initially, P1 holds the lock

1 st1 M I I I P1 releases lock

2 ld2 S S I I P2 read misses after invalidation

3 ld3 S S S I P3 read misses after invalidation

4 ld4 S S S S P4 read misses after invalidation

5 t&s2 I M I I P2 executes t&s and issues BusRdX

6 t&s3 I I M I P3 executes t&s and issues BusRdX

7 t&s4 I I I M P4 executes t&s and issues BusRdX

8 st2 I M I I P2 releases lock

9 ld3 I S S I P3 read misses after invalidation

10 ld4 I S S S P4 read misses after invalidation

296 CHAPTER 8. HARDWARE SUPPORT FOR SYNCHRONIZATION

11 t&s3 I I M I P3 executes t&s and issues BusRdX

12 t&s4 I I I M P4 executes t&s and issues BusRdX

13 st3 I I M I P3 releases lock

14 ld4 I I S S P4 read misses after invalidation

15 t&s4 I I I M P4 executes t&s and issues BusRdX

st4 I I I M P4 releases lock

Worst case for LL/SC lock: 10 bus transactions

Bus Trans. Action P1 P2 P3 P4 Comment

Initial State S S S S Initially, P1 holds the lock

1 st1 M I I I P1 releases lock

2 ll2 S S I I P2 read misses after invalidation

3 ll3 S S S I P3 read misses after invalidation

4 ll4 S S S S P4 read misses after invalidation

5 sc2 I M I I P2 executes a successful sc

sc3 I M I I P3’s sc fails, no bus transaction generated

sc4 I M I I P4’s sc fails, no bus transaction generated

st2 I M I I P2 releases lock

6 ll3 I S S I P3 read misses after invalidation

7 ll4 I S S S P4 read misses after invalidation

8 sc3 I I M I P3 executes a successful sc

sc4 I I M I P4’s sc fails, no bus transaction generated

st3 I I M I P3 releases lock

9 ll4 I I S S P4 read misses after invalidation

10 sc4 I I I M P4 executes a successful sc

st4 I I I M P4 releases lock

2. Using LL/SC. Use LL/SC instructions to construct other atomic operations listed, and show

the resulting assembly code segments.

(a) fetch&no-op L performs an atomic sequence of reading the value in the location L

and storing it back into the same location L.

(b) fetch&inc L performs an atomic sequence of reading the value in the location L,

increment the value by one, and write the new value to L.

value held in register R and location L.

Answer:

(a) fetch-noop: LL R, L // R = mem[L]

SC L, R // mem[L] = R

8.4. EXERCISES 297

(b) fetch-inc: LL R, L // R = mem[L]

add R, R, #1 // R = R + 1

SC L, R // mem[L] = R

bscfail R, fetch-inc // loop back if SC fails

sc L, R // mem[L] = R

bscfail R, atomic-exch // loop back if SC fails

mov R, R2 // R = R2

Note that the mov instruction is purposedly placed after the sc, which is important to

make sure that the sc is more likely to succeed. It is safe to do that because the value

to be assigned to R is in the register R2, and hence can no longer be affected by an

intervention or invalidation of another processor.

3. Implementing locks. Implement lock() and unlock() directly using the atomic ex-

change instruction. The instruction atomic exch R, L performs an atomic sequence of

swapping/exchanging the value held in register R and location L. Use the following conven-

tion: a value of “1” indicates that a lock is currently held by a process, and a value of “0”

indicates that a lock is free. Your implementation should not repeatedly generate bus trafﬁc

when a lock is continuously held by another process.

Answer:

lock: mov R, #1 // R = 1

loop: ld R2, L // R2 = mem[L]

bnz R2, loop // Lock not free, loop back

atomic_exch R, L // exchange R with mem[L]

bnz R, loop // lock attempt fails, loop back

ret // lock successfully acquired, return

unlock: st L, #0 // release the lock

ret

4. Barrier implementation. A proposed solution for implementing the barrier is the following:

BARRIER (Var B: BarVariable, N: integer)

{

if (fetch&add(B,1) == N-1)

B = 0;

else

while (B != 0) {}; // spin until B is zero

}

Describe a correctness problem with the barrier implementation. Then, rewrite the code for

BARRIER() in a way that avoids the correctness problem.

Answer:

The correctness problem occurs when all but last thread have arrived at the barrier spinning on

the while loop. Then the last thread arrives and sets B to 0. The last thread may then continue

to the next barrier with the same name, and immediately increments B. Other threads that are

298 CHAPTER 8. HARDWARE SUPPORT FOR SYNCHRONIZATION

about to come out of the barrier have their cached copies of B invalidated, and reload them to

ﬁnd out that the value of B is no longer 0, and stay in the barrier.

The implementation can be corrected by alternating between spinning for 0 and for N-1, i.e.

BARRIER (Var B: BarVariable, N: integer)

{

static turn = 0;

if (turn == 0) {

if (fetch&add(B,1) == N-1)

B = 0;

else

while (B != 0) {}; // spin until B is zero

turn = 1;

}

else {

if (fetch&add(B,-1) == 1)

B = N;

else

while (B != N) {}; // spin until B is zero

turn = 0;

}

Homework Problems

1. Lock performance. Consider a three-processor bus-based multiprocessor using MESI pro-

tocol. Each processor executes a test-and-test&set or an LL/SC lock to gain access to a null

critical section. Consider the following sequence of events:

• Initially: P1 holds the lock

• P2 and P3 read the lock

• P1 releases the lock

• P2 and P3 read the lock

• P2 acquires the lock successfully

• P3 attempts to acquire the lock but is unsuccessful

• P2 releases the lock

• P3 reads the lock

• P3 acquires the lock successfully

• P1 and P2 read the lock

• P3 releases the lock

Considering only the bus transactions related to lock-unlock operations:

8.4. EXERCISES 299

(a) Show the states of each cache for the sequence assuming the test-and-test&set lock

implementation. Use the following template. Bus transactions corresponding to the ﬁrst

three steps are shown.

Bus Trans. Action P1 P2 P3 Comment

- Initially M I I Initially, P1 holds the lock

1 (BusRd) ld2 S S I P2 read misses on the lock

2 (BusRd) ld3 S S S P3 read misses on the lock

3 (BusUpgr) st1 M I I P1 releases the lock

4 and so on ...

(b) Show the states of each cache for the sequence assuming the LL/SC lock implementa-

tion. Use the following template. Bus transactions corresponding to the ﬁrst three steps

are shown.

Bus Trans. Action P1 P2 P3 Comment

- Initially M I I Initially, P1 holds the lock

1 (BusRd) ll2 S S I P2 read misses on the lock

2 (BusRd) ll3 S S S P3 read misses on the lock

3 (BusUpgr) st1 M I I P1 releases the lock

4 and so on ...

2. Lock performance. Consider a four-processor bus-based multiprocessor TTSL lock imple-

mentations shown in the book (discussed in lectures). Suppose the cache coherence protocol

is MOESI, and we have the following events:

• Initially, P1 holds the lock (and lock variable is cached in modiﬁed state)

• P2, P3, and P4 reads the lock variable in that order

• P1 releases the lock

• P2, P3, and P4 reads the lock variable in that order

• P4 acquires the lock successfully

• P2 and P3 attempt but fail to acquire the lock in that order

• P3 reads the lock variable

• P4 releases the lock

• P2 reads the lock variable

• P2 acquires the lock successfully

• P2 releases the lock

Show for each event what bus transaction is generated (ignore Flush and FlushOpt for brevity),

what instruction it corresponds to, and the resulting states in the cache.

3. Lock performance. Repeat homework problem (2) with LL/SC lock implementation.

4. Lock performance. Repeat homework problem (2) with Dragon protocol.

5. Lock performance. Repeat homework problem (2) with LL/SC lock implementation on

MESI protocol.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for Chapter 8 - Hardware Support for Synchronization (7/8)

Create new playlist

Sign In

Sign Up

Table of Contents for
Chapter 8 - Hardware Support for Synchronization (7/8)