Chapter 6 - Introduction to Shared Memory Multiprocessors (3/3)

Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

6.3. SYNCHRONIZATION PROBLEM 201

Compare this against Code 6.3, which only tests one variable lockvar in order to acquire a lock.

Of course, Code 6.3 is incorrect unless it is used in conjunction with an atomic instruction supported

by the hardware. This points out that it is important to have hardware support for synchronization

in order to reduce synchronization overheads and enable scalable synchronization.

Finally, it is important to note that the correctness of a synchronization primitive is related to

the memory consistency model provided by the processor. By nature, synchronization operation is

used to impose ordering between memory accesses performed by different threads. However, the

ordering of the memory accesses performed by a synchronization operation and those performed

outside the synchronization cannot be guaranteed unless the hardware provides either a strict mem-

ory consistency model, or an explicit mechanism that can order memory accesses. This will be

discussed further in Chapter 8.

202 CHAPTER 6. INTRODUCTION TO SHARED MEMORY MULTIPROCESSORS

6.4 Exercises

Worked Problems

1. Register allocation. Suppose that you are writing a compiler algorithm that decides what

variable can be allocated in a register. A register allocation allows the variable to be accessed

in the register without involving a load or store to read from or write to it. In order not to create

cache coherence problem, which of the following types of variables can be register allocated

vs. cannot be register allocated: read-only, read/write non-conﬂicting, and r/w conﬂicting?

Explain. (see Section 3.6 for what each variable type refers to).

Answer:

Variable type Can be register allocated?

Read-only Yes

R/W Non-Conﬂicting Yes

R/W Conﬂicting No

R/W conﬂicting variables are written and read by different threads. Hence, they cannot be

hence write propagation no longer occurs.

2. Cache coherence. Assume that we have a two-processor system (P1 and P2). These pro-

cessors run different threads (T1 and T2, respectively) that access a data block containing

variable sum, in the following sequence in time:

T1: prefetch the block containing sum from &sum (i.e., the location of sum)

T2: read the value of sum from &sum

T2: add 7 to sum and write into &sum

T1: read the value of sum from &sum

T1: add 3 to sum and write into &sum

T1: read sum to print its value

Each access may hit or miss in the cache, depending on the cache conﬁguration that will be

speciﬁed. For each situation below, determine if there is a correctness issue. Also give the

ﬁnal values in all caches and in the main memory.

(a) The two processors P1 and P2 share an only single level cache (the L1 cache) that uses

a write back policy.

(b) Each processor has its own single-level write through cache. Does your answer depend

on whether the cache has a write allocate or write no-allocate policy?

replaced immediately after it is written to. There are no other levels of caches in the

system.

(d) Each processor has a cache, but the block containing sum is not allowed to be cached.

6.4. EXERCISES 203

Answer:

(a) No correctness problem because data block containing ”sum” is only cached at a single

location (the shared L1 cache).

L1 cache 10, dirty

Memory 0

(b) Correctness problem exists because a block may be kept in several L1 caches. A write

through write allocate L1 cache only guarantees that a write will be propagated to the

outer level memory hierarchy but does not guarantee write propagation to other L1

caches.

P1’s cache 3

P2’s cache 7

Memory 3

Furthermore, the problem persists even with a write no-allocate policy. This is because

the block was brought into the cache through a prefetch instruction, not a write instruc-

tion.

ten value is propagated to the outer memory hierarchy, it does not ensure that future

reads in other caches will see that new value.

P1’s cache 3

P2’s cache -

Memory 3

(d) No correctness problem since sum is uncached. There is only one copy of sum in the

system.

P1’s cache -

P2’s cache -

Memory 10

Homework Problems

1. Register allocation. Suppose that you are writing a compiler algorithm that decides what

variable can be allocated in a register. A register allocation allows the variable to be accessed

in the register without involving a load or store to read from or write to it. In order not to

create cache coherence problem, which type of variable can be register allocated vs. cannot

be register allocated?

Variable type Can be register allocated?

Read-only

R/W Non-Conﬂicting

R/W Conﬂicting

If there are variable types that you think cannot be safely allocated in registers, give explana-

tion of why it is so (see Section 3.6 for what each variable type refers to).

2. Cache coherence. Assume that we have a two-processor system (P1 and P2). These pro-

cessors run different threads (T1 and T2, respectively) that access a data block containing

variable sum, in the following sequence in time:

204 CHAPTER 6. INTRODUCTION TO SHARED MEMORY MULTIPROCESSORS

T1: prefetch the block containing sum from &sum (i.e., the location of sum)

T2: read the value of sum from &sum

T2: add 7 to sum and write into &sum

T1: read the value of sum from &sum

T1: add 3 to sum and write into &sum

T1: read sum to print its value

Each access may hit or miss in the cache, depending on the cache conﬁguration that will be

speciﬁed. For each situation below, determine if there is a correctness issue. Also give the

ﬁnal values in all caches and in the main memory.

(a) Each processor has its own write through L1 cache, and both processors share a single

write back L2 cache.

(b) Each processor has its own write back L1 cache, however, each block can be cached in

at most one cache. More speciﬁcally, if a block is cached at Px, and Py requests the

block on a read or a write, the block is removed from Px’s cache and placed in Py’s

cache. Assume a block is not evicted from the cache unless when it is moved to another

processor’s cache.

and T2 in only one processor (either P1 or P2, but not both) in a time-sharing manner.

(d) Each processor has its own write back L1 cache. Suppose that instead of executing with

two threads, only one thread executes. Thus, in the sequence of events shown above, all

the events occur due to T1’s execution while T2 does not exist. Furthermore, to achieve

balanced processor utilization, the operating system migrates T1 from P1 to P2 after the

ﬁrst three events, i.e. right after adding ‘7’ to sum but before reading from sum for the

second time.

3. Peterson’s algorithm. Modify Peterson’s algorithm for lock acquisition and release so that

it can be used for four threads that compete for a lock.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for Chapter 6 - Introduction to Shared Memory Multiprocessors (3/3)

Create new playlist

Sign In

Sign Up

Table of Contents for
Chapter 6 - Introduction to Shared Memory Multiprocessors (3/3)