Synchronization Primitives

Synchronization problems occur when code executing on two or more threads attempts to access a shared resource or data structure. A common synchronization problem for I/O Kit drivers arises when a driver accesses its instance variables, since these are shared among all of the threads on which the driver's code is executing. For a concrete example, let's consider one from the I/O Kit itself: the OSObject base class's implementation of reference counting.

The OSObject class is the base class for all objects in the I/O Kit, and one of its roles is to maintain a reference count for each object instance and to free an object when its reference count is decremented to 0. A simplified version of the OSObject implementation, without the synchronization provided by the actual implementation, is shown in Listing 7-1.

Listing 7-1. A Possible Implementation of Object Reference Counting

void    Object::retain ()
{
        retainCount += 1;               // An instance variable defined as an int
}

void    Object::release ()
{
        retainCount -= 1;
        if (retainCount == 0)
                this->free();
}

Although the preceding code looks correct and runs perfectly well when all calls to retain() and release() are made from a single thread, it is not thread-safe and may fail if multiple threads call retain() and release() simultaneously for the same object. To understand the problem, it is necessary to examine the compiler output for the code. The assembler instructions that follow were generated when the implementation was compiled for the 64-bit Intel architecture in a Debug build (retainCount is a 32-bit int, which is why the 32-bit register EAX is used). The code for retain() contains the following sequence of instructions:

        mov     eax, retainCount          ; Load retainCount into CPU register EAX
        add     eax, 0x1                  ; Increment value in EAX
        mov     retainCount, eax          ; Write value in EAX to retainCount

And the code for release() contains the following sequence of instructions:

        mov    eax, retainCount           ; Load retainCount into CPU register EAX
        sub    eax, 0x1                   ; Decrement value in EAX
        mov    retainCount, eax           ; Write value in EAX to retainCount
        mov    eax, retainCount           ; Load retainCount into CPU register EAX
        cmp    eax, 0x0                   ; Determine whether the value of EAX is 0
        jne    skipFree                   ; If EAX is not zero, jump over the next instruction
        call   free()                     ; Otherwise, call the free() method
skipFree:
        …

The cause of the problem in a multithreaded environment is that the C++ statement that increments or decrements the instance variable retainCount compiles to three CPU instructions: the value held by retainCount is loaded from memory into a CPU register, the value in the register is incremented or decremented, and the result is written back to memory. Let's see what can happen if two threads call retain() simultaneously for the same object. For simplicity, let us assume that the code is executing on a machine with a single CPU core and that the operating system's scheduler preempts the first thread immediately after it has executed the initial mov instruction.

In this scenario, thread 1 reads the value of retainCount from memory into the EAX register. At this point, the operating system's scheduler preempts thread 1 and switches to thread 2 (after saving the state of thread 1's CPU registers). Thread 2 now runs and reads the same value of retainCount into the EAX register as was read by thread 1. It then increments the value and writes the incremented value back to memory. The scheduler then preempts thread 2 and switches execution back to thread 1 after restoring thread 1's saved CPU registers. Thread 1 continues executing from where it left off, incrementing the original value of retainCount and writing the result back to memory. As a result, retainCount has increased by only 1, even though the retain() method was called twice.
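The interleaving just described can be replayed deterministically on a single thread. The following is a minimal sketch, not part of the I/O Kit: the function name replayInterleavedRetains and the local variables standing in for each thread's saved copy of EAX are hypothetical, chosen only to mirror the assembler listing for retain().

```cpp
#include <cassert>

// Replays, on a single thread, the interleaving of two retain() calls
// described above. The two locals are hypothetical stand-ins for each
// thread's private copy of the EAX register.
int replayInterleavedRetains(int initialCount)
{
    assert(initialCount >= 1);      // A live object has at least one reference
    int retainCount = initialCount; // Stand-in for the shared instance variable

    int eaxThread1 = retainCount;   // Thread 1: mov eax, retainCount
    // Thread 1 is preempted here; its registers are saved.

    int eaxThread2 = retainCount;   // Thread 2: mov eax, retainCount
    eaxThread2 += 1;                // Thread 2: add eax, 0x1
    retainCount = eaxThread2;       // Thread 2: mov retainCount, eax

    // Thread 1 resumes with its saved (now stale) register value.
    eaxThread1 += 1;                // Thread 1: add eax, 0x1
    retainCount = eaxThread1;       // Thread 1: mov retainCount, eax

    return retainCount;             // One update has been lost
}
```

Calling replayInterleavedRetains(1) returns 2 rather than 3, because thread 1's write overwrites thread 2's update with a value computed from a stale read.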

Note that this problem shows up only under specific conditions: either two threads modify the retainCount instance variable with one preempting the other in the way just described, or the two threads run simultaneously on two CPU cores. A problem such as this, in which the result of executing code depends on the timing and order in which the code runs, is known as a race condition. Race conditions can lead to problems that are very difficult to debug, since the problem is by its nature timing-dependent and therefore may not occur every time the code is run. In fact, the code may appear to run perfectly well during testing, and it may become apparent that the driver has problems only when reports come in from users.

As well as being difficult to reproduce, race conditions can be very difficult to diagnose when they do cause problems. Take the race condition outlined previously, in which an object's retain count is incremented by only 1 even though two calls to retain() were made. This causes no immediate problem, and the driver continues to function as if nothing were wrong until much later, when the object is released. Since the object was retained twice, the calling code can be expected to release it twice. However, since the retain count is one less than it should be, the object will be destroyed while the driver still holds one reference to it. At some later time, the driver will try to access the object that it thinks it holds a reference to, but that object will have been destroyed, and the driver will crash with an access to invalid memory. Note that the code that ends up crashing may be in a completely different function from the one that contains the race condition. As a result, tracing the cause of the bug back to retain() and release() will involve considerable sleuth work.
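The delayed failure can also be modeled without threads. The sketch below is illustrative only: CountedObject, its freed flag (which records the point at which free() would have been called instead of actually destroying anything), and the referencesHeld bookkeeping are all hypothetical and are not OSObject's implementation.

```cpp
#include <cassert>

// Hypothetical model of a reference-counted object. Rather than destroying
// itself, it records the moment at which free() would have been called.
struct CountedObject {
    int  retainCount = 1;   // The creator's reference
    bool freed       = false;

    void release()
    {
        assert(retainCount > 0);    // Releasing a dead object is a bug
        retainCount -= 1;
        if (retainCount == 0)
            freed = true;           // Stand-in for this->free()
    }
};

// Returns true if the object is destroyed while a reference is still held.
bool freedWithReferenceOutstanding()
{
    CountedObject obj;

    // Two racing retain() calls: the count should now be 3, but one
    // update was lost, so it is only 2.
    obj.retainCount = 2;
    int referencesHeld = 3;         // What the calling code believes it holds

    obj.release();                  // retainCount: 2 -> 1
    referencesHeld -= 1;
    obj.release();                  // retainCount: 1 -> 0, object "freed"
    referencesHeld -= 1;

    // One holder remains, yet the object has already been destroyed.
    return obj.freed && referencesHeld == 1;
}
```

In this model freedWithReferenceOutstanding() returns true: the second release() destroys the object even though one legitimate reference remains, and the eventual crash would occur wherever that remaining reference is next used.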
