Locking a Cache Line

When the processor begins a locked RMW operation, there are several possible cases:

  • the semaphore isn't in the cache.

  • the semaphore is in the cache in the E state.

  • the semaphore is in the cache in the S state.

  • the semaphore is in the cache in the M state.

Intel® documentation states that the Pentium® 4 processor implements cache line locking in areas of memory designated as WB memory. Intel® provides almost no information on how this works, but the author is confident that the following description is accurate.

The Advantage of Cache Line Locking

FSB locking is inefficient: for the duration of a processor's RMW operation, no other FSB agent can initiate a new transaction. If the processors and/or Priority Agents perform many RMW operations, this can severely degrade system performance. If a semaphore is cached by a Pentium® 4 processor, the RMW operation can be performed without locking the FSB.

A New Directory Bit—Cache Line Locked

Intel® states that a cache line can be locked. This implies that there is a lock bit in each L1 Data Cache and L2 Cache directory entry (and the L3 Cache entries, if there is an L3 Cache) that is used for this purpose.

The Memory Read and Invalidate Transaction (RWITM, or RFO)

In the event of a race condition, you don't want multiple processors that have a cached copy of the line (in the S state) to be testing the same cached semaphore simultaneously. Therefore, before performing a locked RMW operation in the cache, a processor must first gain exclusive ownership of the line. This implies that you have to kill everyone else's copy of the line. Intel® has included a special transaction type, Memory Read and Invalidate, for situations like this. The PowerPC bus protocol has two transaction types that perform a similar role: Kill and RWITM (Read With Intent To Modify). The Memory Read and Invalidate transaction is frequently referred to as an RWITM or as an RFO (Read for Ownership).

Line Containing a Semaphore Is in the E or M State

A processor that has a copy of a line in the E or M state has the only copy of the line and therefore doesn't have to worry about another processor testing a copy of the semaphore in its cache at the same time that it is doing so. However, it is possible that another processor attempting to read the semaphore will experience a cache miss and initiate a Memory Read and Invalidate transaction to obtain an exclusive copy of the line on which to perform its RMW. The processor with the E or M copy must prevent the other processor from obtaining the line that contains the semaphore until it has finished its own RMW operation on the semaphore within the line. This is accomplished by marking the line locked in the cache when the read portion of the internal RMW is initiated. When the snoop of the other processor's Memory Read and Invalidate transaction on the FSB results in a hit on a locked line, the processor delays presentation of the snoop result to the other processor (by indicating a Snoop Stall—it asserts both HIT# and HITM#) until its RMW has been completed. It then indicates the state of the line after the RMW. This could be either I (Invalid) or M (Modified):

  1. It would be I if the line is in the E state and, when the semaphore is read from the line, it is discovered that the semaphore had been set to a non-zero value by another task at an earlier time. In this case, the processor doesn't update the line. Rather, it invalidates the line, stops indicating a Snoop Stall, and indicates a snoop miss (neither HIT# nor HITM# asserted) in the Snoop Phase of the other processor's Memory Read and Invalidate transaction.

  2. It would be M (snoop hit on an M copy) in two cases:

    • The processor didn't update the semaphore because it was already set to a non-zero value. However, some other item in the line had previously been updated (in other words, the line was in the M state before the RMW was initiated).

    • When read from the M line, the semaphore was zero, so the processor wrote to the semaphore to update it and the line stayed in the M state.
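The outcomes listed above can be sketched as a small decision function. This is only an illustrative model, not Intel's documented logic: the names State, Snoop, and snoop_after_locked_rmw are hypothetical, the value 1 written to a free semaphore is an assumption, and the case of an E line whose semaphore is zero (the processor writes it, so the line becomes M) is inferred from the MESI rules rather than stated in the list.

```python
from enum import Enum


class State(Enum):
    M = "Modified"
    E = "Exclusive"
    I = "Invalid"


class Snoop(Enum):
    MISS = "neither HIT# nor HITM# asserted"
    HITM = "hit on a modified line"


def snoop_after_locked_rmw(state_before, semaphore):
    """Return (line state reported, snoop result, semaphore value after the RMW).

    Models the final snoop result delivered to another processor's
    Memory Read and Invalidate once the Snoop Stall ends.
    """
    if state_before not in (State.E, State.M):
        raise ValueError("this sketch covers only lines held in E or M")
    if semaphore == 0:
        # Semaphore was free: the processor writes it, so the line is
        # (or becomes) modified. The value 1 is an assumption.
        return State.M, Snoop.HITM, 1
    if state_before is State.M:
        # Semaphore already set, but the line was modified earlier.
        return State.M, Snoop.HITM, semaphore
    # E line, semaphore already set: no update, line is invalidated.
    return State.I, Snoop.MISS, semaphore
```

For example, snoop_after_locked_rmw(State.E, 5) reports an invalid line and a snoop miss, which corresponds to case 1 above.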

Line Containing a Semaphore Isn't in the L1 or L2 Cache

If the semaphore is not in the L1 Data Cache or the L2 Cache (or the L3 Cache, if there is one) when the attempt is made to read it, the read results in a cache miss. The read request is submitted to the FSB Interface Unit and the processor initiates a Memory Read and Invalidate transaction (which has the side effect of killing copies of the line in other processors' caches) to obtain the 64-byte line from memory or from another processor (if another processor has a copy of the line in the M state). The processor uses the Memory Read and Invalidate transaction when it reads data from memory with the intent to modify it once the data has been obtained. This implies that any other processor that has a copy of the line in the E or S state should kill its copy of the line. If another processor has a copy of the line in the M state, it should source the line directly to the requesting processor and then kill its copy. When the line has been read from memory or from another processor's cache (in the case of a hit on an M line), the processor marks it locked in the data cache while it proceeds to check the semaphore value. It then performs the RMW operation. The line is unlocked when the RMW operation has been completed.

During this processor's RMW operation, another processor may initiate a read or an RMW that misses its cache, resulting in a Memory Read or a Memory Read and Invalidate transaction for the same line. A snoop is performed in this processor's cache and hits on the locked line. The snoop result to the other processor is delayed (by stretching the other processor's Snoop Phase) until the processor's RMW has been completed. The final snoop result is then delivered to the other processor:

  • Either the semaphore wasn't updated (and the line therefore wasn't modified), in which case the other processor obtains the line from memory, or

  • The semaphore was updated (and the line was marked modified), in which case the line is provided directly to the other processor from this processor's cache (and is invalidated if the other processor's transaction type is a Memory Read and Invalidate).

If the line was not modified and the other processor's transaction is a Memory Read (rather than a Memory Read and Invalidate), the line is marked S in this processor's cache. If the other processor's transaction was a Memory Read and Invalidate, the line is invalidated in this processor's cache.
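The resolution of the delayed snoop can be sketched as a function of two inputs: whether the RMW modified the line, and which transaction the other processor issued. The function name and string constants below are hypothetical. One combination is not spelled out in the text (a modified line snooped by a plain Memory Read); the sketch assumes the line ends up in the S state in that case, and that assumption is marked in the code.

```python
MEMORY_READ = "Memory Read"
MEMORY_READ_AND_INVALIDATE = "Memory Read and Invalidate"


def resolve_stalled_snoop(line_modified, other_transaction):
    """Return (where the other processor gets the line, this cache's final state)."""
    if other_transaction not in (MEMORY_READ, MEMORY_READ_AND_INVALIDATE):
        raise ValueError("unknown transaction type")

    # A modified line is sourced from this processor's cache;
    # an unmodified line comes from memory.
    source = "this processor's cache" if line_modified else "memory"

    if other_transaction == MEMORY_READ_AND_INVALIDATE:
        final_state = "I"  # the other processor wants exclusive ownership
    else:
        # Stated in the text for the unmodified case; ASSUMED here
        # for the modified case as well.
        final_state = "S"
    return source, final_state
```

Calling resolve_stalled_snoop(True, MEMORY_READ_AND_INVALIDATE) returns the cache as the data source and I as the final state, matching the modified-line case described above.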

In the case where two processors are in a race condition wherein they are both initiating RMW operations that miss their caches, they both initiate Memory Read and Invalidate transactions on the FSB. Since they can't both initiate a transaction simultaneously, however, one wins FSB ownership first and initiates its Memory Read and Invalidate. The second processor then initiates its Memory Read and Invalidate transaction. The first processor obtains the line from memory and marks it locked while it performs the RMW. When the second processor's Memory Read and Invalidate transaction reaches its Snoop Phase, the first processor snoops its cache and hits on the locked line. It delays the delivery of the snoop result until it completes its RMW operation and then delivers the final snoop result to the second processor. If the line is not modified, it indicates a miss to the second processor and invalidates its copy. On the other hand, if the RMW operation modified the line, it indicates a hit on a modified line, sources the line directly to the second processor, and invalidates its copy of the line. The second processor now has a copy of the line and can perform its RMW operation.
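The race just described can be traced with a toy model. This is a sketch under simplifying assumptions, not a bus-accurate simulation: the processor names P0 and P1, the function simulate_race, and the value 1 written to a free semaphore are all hypothetical, and FSB arbitration is reduced to a fixed ordering passed in by the caller.

```python
def simulate_race(semaphore, fsb_order=("P0", "P1")):
    """Trace two processors whose locked test-and-set both miss their caches.

    Returns (owner, final_semaphore, log). owner is None when the
    semaphore was already set before either processor ran.
    """
    first, second = fsb_order
    log = [
        f"{first}: wins FSB ownership, issues Memory Read and Invalidate",
        f"{second}: issues Memory Read and Invalidate",
        f"{first}: receives line from memory, marks it locked",
        f"{first}: snoop of {second}'s transaction hits locked line, result stalled",
    ]

    owner = None
    modified = False
    if semaphore == 0:
        semaphore = 1  # assumed "taken" value
        modified = True
        owner = first
    log.append(f"{first}: RMW complete, line unlocked")

    if modified:
        log.append(f"{first}: signals hit on modified line, sources line to "
                   f"{second}, invalidates own copy")
    else:
        log.append(f"{first}: signals snoop miss, invalidates own copy; "
                   f"{second} reads line from memory")

    # By now the semaphore is non-zero, so the second RMW cannot acquire it.
    log.append(f"{second}: marks line locked, performs RMW, finds semaphore set")
    return owner, semaphore, log
```

Whichever processor wins FSB ownership first is the only one that can acquire a free semaphore; the second always performs its RMW on the line as the first left it.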

Line Containing a Semaphore Is in the L2 Cache in the E State

If the read portion of the RMW hits on an E copy of the line in the L2 Cache, no other processor has a copy of the line in its cache. The line is marked locked in the cache (in case another processor tries to read the line during the RMW operation) and the RMW operation is then performed on the semaphore within the line. The lock is then removed.

Line Containing a Semaphore Is in the Cache in the S State

If the read portion of the RMW hits on an S copy of the line in the L2 Cache, at least one other processor has a copy of the line. Before this processor can perform the RMW, it must first gain exclusive ownership of the line by killing copies in the caches of other processors. It does this by issuing a Memory Read and Invalidate transaction for 0 bytes of data (this is a kill). Any other processor that has a copy of the line must kill its copy. The line is then marked locked in the cache while the RMW operation is performed, after which it is unlocked.

Line Containing a Semaphore Is in the Cache in the M State

If the read portion of the RMW hits on an M copy of the line in the L2 Cache, no other processor has a copy of the line in its cache. The line is marked locked in the cache (in case another processor tries to read the line during the RMW operation) and the RMW operation is then performed on the semaphore within the line. The lock is then removed.
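The cases covered in the preceding sections differ mainly in what must happen on the FSB before the line can be marked locked. The lookup below summarizes them as a sketch; the function name and the returned strings are illustrative only.

```python
def bus_action_before_lock(line_state):
    """Return the FSB transaction needed before the line is locked, or None."""
    actions = {
        # Miss: fetch the whole line and kill all other copies.
        "I": "Memory Read and Invalidate (64-byte line)",
        # Shared hit: data is already present, only the kill is needed.
        "S": "Memory Read and Invalidate for 0 bytes (kill)",
        # Exclusive or modified hit: no other cache holds a copy.
        "E": None,
        "M": None,
    }
    if line_state not in actions:
        raise ValueError(f"unknown MESI state: {line_state!r}")
    return actions[line_state]
```

In every case the sequence that follows is the same: mark the line locked, perform the RMW on the semaphore, then remove the lock.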

Semaphore Straddles Two Cache Lines

If a semaphore straddles two cache lines and a locked RMW is performed on the semaphore, the Pentium® 4 processor will not lock both cache lines. A cache line is only locked if the semaphore resides fully within a single cache line.
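Whether a semaphore straddles two lines depends only on its address and size. The check below is a minimal sketch that assumes the 64-byte line size mentioned earlier; the function name is hypothetical.

```python
LINE_SIZE = 64  # bytes per cache line, as stated earlier in this section


def straddles_cache_line(address, size, line_size=LINE_SIZE):
    """Return True if the operand at [address, address + size) spans two lines."""
    if size <= 0:
        raise ValueError("size must be positive")
    first_line = address // line_size
    last_line = (address + size - 1) // line_size
    return first_line != last_line
```

A 4-byte semaphore at address 60 occupies bytes 60 through 63 and fits in one line, while the same semaphore at address 62 spills into the next line. Aligning a semaphore on a multiple of its own size keeps it within a single line.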
