Thread Synchronization

When you get beyond the simplicity of single-threaded applications and begin to explore the world of multithreaded programming, one of the first things you discover is the need to synchronize the activities of the threads in your application. Allowing one thread to modify a global variable while another is using it to control flow logic is a recipe for erratic application behavior and possibly even an access violation. The fact that multiple threads can truly execute simultaneously on multiprocessor machines does not mean that you literally want all of them running all of the time. There are times when you need one thread to wait on others to finish what they're doing before proceeding. There are times when you need to synchronize them.

The single most important element of thread synchronization is atomic access—ensuring that a thread has access to a resource in a manner that guarantees that no other thread will access the same resource simultaneously. Windows provides a number of objects and API calls to facilitate atomic access. We'll delve into each of these separately.

Key Thread Synchronization Terms and Concepts

Synchronization— ensuring that resources are accessed by threads in a manner that allows them to be used safely and their data to be trusted.

Deadlock— what happens when two or more threads wait indefinitely on resources owned by each other.

Wait function— one of the Win32 functions designed to put a thread to sleep until a resource becomes available or a timeout period expires.

Signaled— the state of an object when it is available for use by a thread or when a Win32 wait function should not wait on it.

Unsignaled— the state of an object when it is not available for use by a thread or when a Win32 wait function should wait on it.

Spinlock— a user mode construct that continuously polls a resource to check its availability. Spinlocks often make use of the interlocked family of functions.

Interlocked function— a member of the Win32 family of functions that provides simple, atomic updates of variables.

Kernel synchronization object— one of several different types of kernel objects that can be used to synchronize access to a resource. Examples of kernel synchronization objects include mutexes, semaphores, and events, to name just a few.

Thread-safe code— code that has been designed such that multiple threads accessing the same resources do so in a safe and predictable manner.

Atomic access— ensuring that a thread retrieves, modifies, and returns a value or resource as a single operation without having to be concerned about another thread modifying the value simultaneously.

Key Thread Synchronization APIs

Table 3.14. Key Synchronization-Related API Functions
EnterCriticalSection— Denotes a section of code that only one thread can access at a time
LeaveCriticalSection— Leaves a section of code that was designed for single-threaded access
InterlockedExchange— Assigns one value to another in an atomic fashion
InterlockedExchangeAdd— Adds one value to another in an atomic fashion
CreateEvent— Creates a kernel event object
SetEvent/ResetEvent— Signals/unsignals an event object
CreateSemaphore— Creates a semaphore object
ReleaseSemaphore— Releases a reference to a semaphore
CreateWaitableTimer— Creates a waitable timer object
SetWaitableTimer— Configures a waitable timer
CreateMutex— Creates a mutex (mutually exclusive) object
ReleaseMutex— Releases an owned mutex
WaitForSingleObject— Waits for a kernel object to become signaled
WaitForMultipleObjects— Waits for multiple kernel objects to become signaled

Key Thread Synchronization Tools

Given that most synchronization objects are kernel mode objects, there's a limit to how much information a user mode tool can give us about thread synchronization. A kernel mode debugger is unfortunately the best option here. That said, the tools in Table 3.15 provide several useful pieces of synchronization-related data.

Table 3.15. Thread Synchronization Diagnostic Tools
Tools covered: Perfmon, Pview, and Spy++
Data reported (coverage varies by tool): handle count, context switches, thread security context, kernel object count by type, thread priority, thread state, and CPU times

Synchronization Using User Mode Constructs

Windows provides two types of thread synchronization: user mode synchronization and kernel mode synchronization. User mode synchronization is implemented by functions in Kernel32.DLL and, as the name suggests, does not require the thread to switch into kernel mode. User mode synchronization is consequently faster than synchronizing threads using kernel objects. On the other hand, user mode objects cannot be used to synchronize threads in multiple processes and cannot be used with the Win32 wait functions, which allow a thread to wait on a resource without consuming any CPU resources and allow the waiter to timeout. Examples of user mode synchronization objects/constructs include spinlocks and critical sections.

Spinlocks

Simply put, a spinlock is a loop that iterates until a resource becomes available. Listing 3.3 shows a simple example in C++.

Listing 3.3. A Simple Spinlock Implementation
// Assumes a global lock flag declared elsewhere as: volatile LONG g_bLocked = FALSE;
void CSpinLock::GetLock() {
  // Spin until the previous value was FALSE--that is, until we acquire the lock
  while (TRUE == InterlockedExchange(&g_bLocked, TRUE))
    SwitchToThread();  // yield the remainder of our time slice while we wait

  // use the resource

  // "unlock" the resource
  InterlockedExchange(&g_bLocked, FALSE);
}

InterlockedExchange, which we'll talk about more in a moment, assigns the value of its second parameter to its first parameter in an atomic fashion and returns the original value of the first parameter. The spinlock code above simply loops while the original value of g_bLocked is TRUE—in other words, while the global resource in question is locked. It continuously assigns TRUE to g_bLocked until InterlockedExchange no longer returns TRUE. When InterlockedExchange returns FALSE—meaning that the resource was not locked when the function was called—the loop exits. Since InterlockedExchange has already set g_bLocked to TRUE, other threads calling the GetLock method will stop at the while loop until g_bLocked is set to FALSE by the thread that just acquired the spinlock.

As you can see, a spinlock isn't a separate type of user mode object; rather, it is implemented by using user mode objects and code—in this case, a global variable and one of the interlocked functions. Though it is often wrapped in a class of some type in object-oriented languages such as C++, as far as Windows is concerned, a spinlock is really more of a programming construct than a distinct type of synchronization object.

Spinlocks and CPU Use

Even though the above example tries to be as CPU efficient as possible by calling SwitchToThread in its spin loop, it's still using some amount of CPU time while it waits on the resource. This is, unfortunately, unavoidable without the assistance of a scheduling facility (such as the one provided by Windows) that can maintain lists of waiting threads and the resources they need independent of the threads themselves, basically putting them to sleep until the resources they need become available.

This is the main reason that kernel mode objects such as mutexes and semaphores have a distinct advantage over spinlocks in terms of CPU usage. Because Windows can act on behalf of a waiting thread and allow the thread to consume no resources while the resources it needs are unavailable, waiting on a kernel object is often more CPU efficient than using a spinlock to wait on a resource even though a thread must transition from user mode to kernel mode in order to wait on a kernel object.

Because they are not coordinated by the operating system, spinlocks like the one above must make a few assumptions.

  1. Spinlocks assume that all threads run at the same thread priority level. (You might want to disable thread priority boosting for this very reason.)

  2. Spinlocks assume that the lock variable and the data to which the lock provides access are maintained in different CPU cache lines. If they are on the same cache line, you'll see contention between the processor using the resource and any CPUs waiting on it. In other words, the continual assignment of the control variable by the other CPUs will contend with the code accessing the protected resource.

  3. Spinlocks assume that you are running on a multiprocessor machine. If the threads attempting to acquire the lock and the thread that currently holds it share a single processor, the waiting threads burn CPU time that the owning thread needs in order to finish with the resource and release the lock.

Generally speaking, you should avoid techniques and design elements that continuously poll for resource availability. Windows provides a rich set of tools for waiting on resources with minimal CPU usage. It makes sense to use what you get for free in the OS box.

Often you'll find that a hybrid approach is the best fit for a particular scenario—use spinlocks and critical sections to protect some types of resources; use kernel objects for others. Or, use a spinlock to wait a fixed number of iterations, then transition to a kernel object if it appears that the thread might have to wait for an extended period of time. This is, in fact, how critical sections themselves are implemented. A critical section starts off using a spinlock that iterates a specified number of times, then transitions to kernel mode where it waits on the resource in question.

The Interlocked Functions

Windows provides a family of API functions commonly referred to as the interlocked functions. You saw a basic example of their use above. These functions provide simple, lightweight thread synchronization that does not rely on kernel objects. Table 3.16 summarizes the interlocked functions.

You'll note that there's no interlocked function for reading a value. That's because none is necessary. If a thread attempts to read a value that is always modified using an interlocked function, it can depend on getting a good value. That is, it can assume that it will see either the value before it was changed or the value afterward—the system guarantees that it will be one or the other.

Table 3.16. The Interlocked Family of Functions
InterlockedIncrement— Allows a variable to be incremented and its value to be checked in a single atomic operation
InterlockedDecrement— Allows a variable to be decremented and its value to be checked in a single atomic operation
InterlockedExchangePointer— Atomically exchanges the value pointed to by the first parameter for the value passed in the second parameter
InterlockedExchangeAdd— Atomically adds the value passed in the second parameter to the first parameter
InterlockedCompareExchangePointer— Atomically compares two values and replaces the first value with a third value based on the outcome of the comparison
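As a minimal illustration (the variable name is hypothetical), here is how a shared counter can be manipulated atomically with these functions:

```cpp
#include <windows.h>
#include <stdio.h>

volatile LONG g_lCount = 0;  // hypothetical shared counter

void UpdateCounter()
{
    InterlockedIncrement(&g_lCount);        // atomically: g_lCount++
    InterlockedExchangeAdd(&g_lCount, 10);  // atomically: g_lCount += 10

    // Swap in 0 and retrieve the previous value in one atomic step
    LONG lOld = InterlockedExchange(&g_lCount, 0);
    printf("Counter was %ld before reset\n", lOld);
}
```

Because each call completes as a single atomic operation, no other thread can observe the counter in a half-updated state.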

Critical Sections

A critical section is a user mode object you can use to synchronize threads and serialize access to shared resources. A critical section denotes a piece of code that you want executed by only one thread at a time. The process of using a critical section goes something like this.

  1. Initialize the critical section with a call to InitializeCriticalSection. This frequently occurs at program startup and is often used to set up a critical section stored in a global variable.

  2. On entrance to a routine that you want only one thread to execute at a time, call EnterCriticalSection, passing in the previously initialized critical section structure. Once you've done this, any other thread that attempts to enter this routine will be put to sleep until the critical section is exited.

  3. On exit, call LeaveCriticalSection.

  4. On program shutdown (or some other similar termination event that occurs after the critical section is no longer needed), call DeleteCriticalSection to free up the system resources used by the object.
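The four steps above can be sketched as follows (the shared variable and function names are illustrative):

```cpp
#include <windows.h>

CRITICAL_SECTION g_cs;     // typically a global
int g_nSharedValue = 0;    // hypothetical shared resource

void UpdateSharedValue()   // steps 2 and 3: only one thread in here at a time
{
    EnterCriticalSection(&g_cs);
    g_nSharedValue++;      // safe: no other thread can be here concurrently
    LeaveCriticalSection(&g_cs);
}

int main()
{
    InitializeCriticalSection(&g_cs);  // step 1: program startup
    UpdateSharedValue();
    DeleteCriticalSection(&g_cs);      // step 4: program shutdown
    return 0;
}
```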

Because you can't specify the amount of time to wait before giving up on a critical section, it's entirely possible for a thread to wait indefinitely for a resource. This happens, for example, when a critical section has been orphaned due to an exception. The waiting threads have no way of knowing that the thread that previously acquired the critical section never released it, so they wait indefinitely on a resource that will never be available.

One way to mitigate this all-or-nothing proposition is to use TryEnterCriticalSection rather than EnterCriticalSection. TryEnterCriticalSection will never allow a thread to be put into a wait state. It will either acquire the critical section or return FALSE immediately. The fact that it has this ability to return immediately is the reason that it has a return value while EnterCriticalSection does not.
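A minimal sketch of this pattern (assumes a CRITICAL_SECTION named g_cs that has already been initialized elsewhere):

```cpp
#include <windows.h>

CRITICAL_SECTION g_cs;  // assumed initialized via InitializeCriticalSection

void DoWorkIfAvailable()
{
    if (TryEnterCriticalSection(&g_cs)) {
        // ... access the shared resource ...
        LeaveCriticalSection(&g_cs);
    } else {
        // Resource busy; do something useful instead of blocking
    }
}
```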

When a thread calls EnterCriticalSection for a critical section object that is already owned by another thread, Windows puts the thread in a wait state. This means that it must transition from user mode to kernel mode, costing approximately 1,000 CPU cycles. This transition is usually cheaper than using a spinlock of some type to continuously poll a resource to see whether it is available.

You can integrate the concept of a spinlock with a critical section by using the InitializeCriticalSectionAndSpinCount function. This function allows you to specify a spin count for entrance into the critical section. If the critical section is not available, the function will spin for the specified number of iterations before going into a wait state. For short-duration waits, this may save you the expense of transitioning to kernel mode unnecessarily.

Note that it makes sense to specify a spin count only on a multiprocessor machine. On a uniprocessor machine, the thread that owns the critical section can't run (and therefore can't release the critical section) while another thread spins, so InitializeCriticalSectionAndSpinCount ignores a nonzero spin count specification on uniprocessor machines and immediately enters a wait state if the critical section is not available.

You can set the spin count for a specific critical section using the Win32 API function SetCriticalSectionSpinCount. The optimal value will vary from situation to situation, but the fact that the critical section that's used to synchronize access to a process's heap has a spin count of 4000 can serve as a guide to you.
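A sketch of both calls (the counts shown are illustrative; profile to find values that fit your workload):

```cpp
#include <windows.h>

CRITICAL_SECTION g_cs;

void Setup()
{
    // Spin up to 4000 times before transitioning to a kernel-mode wait;
    // the spin count is ignored on uniprocessor machines
    InitializeCriticalSectionAndSpinCount(&g_cs, 4000);

    // The spin count can also be tuned after the fact
    SetCriticalSectionSpinCount(&g_cs, 1000);
}
```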

As a rule, use one critical section variable per shared resource. Don't try to conserve system resources by sharing critical sections across different resources. Critical sections don't consume that much memory in the first place, and attempting to share them unnecessarily can introduce complexities and deadlock potential into your code that don't need to be there.

Threads and Wait States

As I've mentioned, while your thread is waiting on a resource, the system acts as an agent on its behalf. The thread itself consumes no CPU resources while it waits. The system puts the thread in a wait state and automatically awakens it when the resource(s) it has been waiting for becomes available.

If you check the states of the threads across all processes on the system, you'll discover that most are in a wait state of some type most of the time. It is normal for most of the threads in a process to spend most of their time waiting on some event to occur (e.g., keyboard or mouse input). This is why it's particularly important for the operating system to provide mechanisms for waiting on resources that are as CPU efficient as possible.

Thread Deadlocks

A thread deadlock occurs when two threads each wait on resources the other has locked. If each is set up to wait indefinitely, the threads are for all intents and purposes dead and will never be scheduled again. Unlike SQL Server, Windows does not automatically detect deadlocks. Once threads are caught in a deadly embrace, the only resolution is to terminate them. Listing 3.4 presents some C++ code that illustrates a common deadlock scenario.

Listing 3.4. A Classic Deadlock Scenario
// deadlock.cpp
//

#include "stdafx.h"
#include "windows.h"
#include <stdio.h>

CRITICAL_SECTION g_CS1;
CRITICAL_SECTION g_CS2;

int g_HiTemps[100];
int g_LoTemps[100];

DWORD WINAPI ThreadFunc1(PVOID pvParam)
{
  EnterCriticalSection(&g_CS1);
  Sleep(5000);
  EnterCriticalSection(&g_CS2);

  for (int i=0; i<100; i++)
    g_HiTemps[i]=g_LoTemps[i];

  printf("Exiting ThreadFunc1\n");

  LeaveCriticalSection(&g_CS1);
  LeaveCriticalSection(&g_CS2);

  return(0);
}

DWORD WINAPI ThreadFunc2(PVOID pvParam)
{
  EnterCriticalSection(&g_CS2);
  EnterCriticalSection(&g_CS1);

  for (int i=0; i<100; i++)
    g_HiTemps[i]=g_LoTemps[i];

  printf("Exiting ThreadFunc2\n");

  LeaveCriticalSection(&g_CS2);
  LeaveCriticalSection(&g_CS1);

  return(0);
}


int main(int argc, char* argv[])
{
  DWORD dwThreadId;
  HANDLE hThreads[2];

  InitializeCriticalSection(&g_CS1);
  InitializeCriticalSection(&g_CS2);

  hThreads[0]=CreateThread(NULL,0,ThreadFunc1,NULL,0,&dwThreadId);
  hThreads[1]=CreateThread(NULL,0,ThreadFunc2,NULL,0,&dwThreadId);

  WaitForMultipleObjects(2,hThreads,TRUE,INFINITE);

  DeleteCriticalSection(&g_CS1);
  DeleteCriticalSection(&g_CS2);

  return 0;
}

The problem with this code is that the two worker threads acquire the critical sections in different orders: ThreadFunc1 enters them in numerical order; ThreadFunc2 does not. I placed the call to Sleep after the first critical section is entered in order to give the second thread function time to start up and enter the second critical section before Thread 1 can do so. When Sleep returns, Thread 1 tries to enter the second critical section but is blocked by Thread 2. Thread 2, meanwhile, is waiting on critical section 1, which Thread 1 already owns. Each thread waits indefinitely on a resource the other has locked, constituting a classic deadlock scenario.

The moral of the story is this: Always request resources in a consistent order in multithreaded applications. This is a good design practice regardless of whether you are working with user mode or kernel objects.
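Applied to Listing 3.4, the fix is to make ThreadFunc2 acquire the critical sections in the same order ThreadFunc1 does:

```cpp
// Corrected version of ThreadFunc2 from Listing 3.4
DWORD WINAPI ThreadFunc2(PVOID pvParam)
{
    // Acquire in the same global order as ThreadFunc1: g_CS1 first, then g_CS2
    EnterCriticalSection(&g_CS1);
    EnterCriticalSection(&g_CS2);

    for (int i = 0; i < 100; i++)
        g_HiTemps[i] = g_LoTemps[i];

    LeaveCriticalSection(&g_CS2);
    LeaveCriticalSection(&g_CS1);

    return 0;
}
```

With both threads requesting g_CS1 before g_CS2, one thread may block briefly, but neither can end up holding a resource the other needs.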

Synchronization Using Kernel Objects

As I've mentioned, synchronizing threads via user mode objects is faster than synchronizing them using kernel mode objects, but there are trade-offs. User mode objects can't be used to synchronize multiple processes, nor can you use the Win32 wait functions to wait on them. Because you can't specify a timeout value when waiting on user mode objects such as critical sections, it's easier to block other threads and to get into deadlock situations. On the other hand, each transition to kernel mode costs you about a thousand x86 clock cycles (and this doesn't include the actual execution of the kernel mode code that implements the function you're calling), so there are definitely performance considerations when trying to decide whether to perform thread synchronization using kernel objects. As with many things, the key here is to use the right tool for the job.

Signaling

Kernel mode objects can typically be in one of two states: signaled or unsignaled. You can think of this as a flag that gets raised when the object is signaled and lowered when it isn't. Signaling an object allows you to notify other threads (in the current process or outside it) that you are ready for them to do something or that you have finished a task. For example, you might have a background thread signal an object when it has finished making a backup copy of the file being currently edited in your custom programmer's editor. Your background thread saves the file, then “raises a flag” (it signals an object) to let your foreground thread know that it's done.

Kernel mode objects such as events, mutexes, waitable timers, and semaphores exist to be used for thread synchronization and resource protection through signaling. In themselves, they do nothing. They exist solely to assist with resource protection and management—especially in multithreaded applications—by being signalable in different ways. Processes, threads, jobs, events, semaphores, mutexes, waitable timers, files, console input, and file change notifications can all be signaled or unsignaled.

Some kernel objects, such as event objects, can be reset to an unsignaled state after having been signaled, but some can't. For example, neither a process nor a thread object can be unsignaled once it has been signaled. This is because for either of these objects to be signaled, they have to terminate. You cannot resume a terminated process or thread.

Wait Functions

I've mentioned the Win32 wait functions in some of the examples we've looked at thus far, and now we'll delve into them a bit. The wait functions allow a thread to suspend itself until another object (or objects) becomes signaled. The most popular Win32 wait function is WaitForSingleObject. It takes two parameters: the handle of the object on which to wait, and the number of milliseconds to wait. You can pass in the INFINITE constant (0xFFFFFFFF, or -1) to wait indefinitely.

As the name suggests, WaitForMultipleObjects can wait for multiple objects (up to 64) to be signaled. Rather than passing in a single handle for its first parameter, you pass in an array containing the handles of the objects to wait for. WaitForMultipleObjects can wait for all objects to be signaled or just one of them. When a single object causes WaitForMultipleObjects to return, its return value indicates which object was signaled. This value will be somewhere between WAIT_OBJECT_0 and WAIT_OBJECT_0 + NumberOfHandles – 1. If you wish to call the function again with the same handle array, you'll need to first remove the signaled object or the function will return immediately without waiting.
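A sketch of a wait-any call on two objects (the event handles are assumed to have been created elsewhere):

```cpp
#include <windows.h>
#include <stdio.h>

void WaitDemo(HANDLE hEvent1, HANDLE hEvent2)
{
    HANDLE h[2] = { hEvent1, hEvent2 };

    // FALSE = wait for ANY one object; time out after 5 seconds
    DWORD dw = WaitForMultipleObjects(2, h, FALSE, 5000);

    if (dw < WAIT_OBJECT_0 + 2)
        printf("Object %lu was signaled\n", dw - WAIT_OBJECT_0);
    else if (dw == WAIT_TIMEOUT)
        printf("Timed out\n");
}
```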

If a wait function times out while waiting on an object, it will return WAIT_TIMEOUT. You can use this ability to implement a type of spinlock that waits for a short period of time, times out, carries out some work, then waits again on the desired resource. This keeps the thread from being completely unschedulable while it waits on a required resource and gives you a finer granularity of control over the blocking behavior of your threads.

There are several other wait functions (e.g., MsgWaitForSingleObject, MsgWaitForMultipleObjects, MsgWaitForMultipleObjectsEx, WaitForMultipleObjectsEx, WaitForSingleObjectEx, SignalObjectAndWait, and so on) that I won't go into here. You can consult the Windows Platform SDK reference for details about these functions. They are mostly variations of either WaitForSingleObject or WaitForMultipleObjects.

Events

Event objects are exactly what they sound like: objects that allow threads to signal to one another that something has occurred. They are commonly used to perform work in steps. One thread performs the first step or two of a task and then signals another thread via an event to carry out the remainder of the task.

Events come in two varieties: manual-reset events and auto-reset events. You call SetEvent to signal an event and ResetEvent to unsignal it. Auto-reset events are automatically unsignaled as soon as a single thread successfully waits on them; a manual-reset event must be reset through a call to ResetEvent. When multiple threads are waiting on an auto-reset event that gets signaled, only one of the threads gets scheduled. When multiple threads are waiting on a manual-reset event that gets signaled, all waiters become schedulable.

You call the CreateEvent API function to create a new event. Other threads can access the event by calling CreateEvent, DuplicateHandle, or OpenEvent.
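Here's a minimal sketch of event-based signaling between two threads; the scenario (a background save) and the names are illustrative:

```cpp
#include <windows.h>
#include <stdio.h>

HANDLE g_hEvent;  // hypothetical "save finished" notification

DWORD WINAPI BackgroundSave(PVOID pvParam)
{
    // ... save the file ...
    SetEvent(g_hEvent);  // "raise the flag" for the foreground thread
    return 0;
}

int main()
{
    // TRUE = manual-reset event; FALSE = initially unsignaled
    g_hEvent = CreateEvent(NULL, TRUE, FALSE, NULL);

    DWORD dwId;
    HANDLE hThread = CreateThread(NULL, 0, BackgroundSave, NULL, 0, &dwId);

    WaitForSingleObject(g_hEvent, INFINITE);  // sleep until the save completes
    printf("Backup complete\n");

    CloseHandle(hThread);
    CloseHandle(g_hEvent);
    return 0;
}
```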

Waitable Timers

Waitable timers are objects that signal themselves at a specified time or at regular intervals. You create a waitable timer with CreateWaitableTimer and configure it with SetWaitableTimer. You can pass in an absolute or relative time (pass a negative due time to indicate a time relative to the present) or an interval at which the timer is supposed to signal. Once the interval or time passes, the timer signals and optionally queues an APC routine. If you created the timer as a manual-reset timer, it remains signaled until you call SetWaitableTimer again. If you created it as an auto-reset timer, it resets as soon as a thread successfully waits on it.

As the name suggests, you can cancel a waitable timer using the CancelWaitableTimer function. Once a manual-reset timer is signaled, there's no need to cancel it; you can simply close its handle.
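A sketch of creating and waiting on an auto-reset waitable timer (the due time and period are arbitrary):

```cpp
#include <windows.h>

int main()
{
    // FALSE = auto-reset timer
    HANDLE hTimer = CreateWaitableTimer(NULL, FALSE, NULL);

    // Negative due time = relative: 5 seconds from now, in 100ns units;
    // then signal every 1000ms thereafter
    LARGE_INTEGER liDue;
    liDue.QuadPart = -50000000LL;
    SetWaitableTimer(hTimer, &liDue, 1000, NULL, NULL, FALSE);

    for (int i = 0; i < 3; i++)
        WaitForSingleObject(hTimer, INFINITE);  // wakes once per interval

    CancelWaitableTimer(hTimer);
    CloseHandle(hTimer);
    return 0;
}
```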

As I mentioned, a waitable timer can optionally queue an APC routine. You won't likely use this facility much because you can always just wait on the timer to be signaled, then execute whatever code you want. In the event that you do decide to use an APC routine, keep in mind that it's pointless for a thread to wait on a timer's handle and wait on the timer alertably at the same time. Once the timer becomes signaled, the thread wakes (which takes it out of the alertable state), causing the APC routine not to be called.

If you've built many apps for Windows, you're probably familiar with Windows' user timer object. This is a different beast than a kernel mode waitable timer. The biggest difference between them is that user timers require a user interface harness in your app, making them relatively resource consumptive. Also, as with the other kernel mode objects we've been discussing, waitable timers can be shared by multiple threads and can be secured.

Windows' user timer object generates WM_TIMER messages that come back either to the thread that called SetTimer or to the thread that created the window. This means that only one thread is notified when the timer goes off. Conversely, a waitable timer can signal multiple threads at once, even across processes.

The decision of whether to use a waitable timer object or a user timer should come down to whether you're doing very much user interface manipulation in response to the timer. If so, a user timer is probably a better choice since you will have to wait on both the object and window messages that might occur if you use a waitable timer. If you end up using a waitable timer in a GUI app and need the thread that's waiting on the timer to respond to messages while it waits, you can use MsgWaitForMultipleObjects to wait on the object and messages simultaneously.

Semaphores

Typically, a kernel semaphore object is used to limit the number of threads that may access a resource at once. While a mutex, by definition, allows just one thread at a time to access a protected resource, a semaphore can be used to allow multiple threads to access a resource simultaneously and to set the maximum number of simultaneous accessors. You specify the maximum number of simultaneous accessors when you create the semaphore, and Windows ensures that this limit is enforced.

When you first create a semaphore object, you specify not only the maximum value for the semaphore but also its starting value. As long as its value remains greater than 0, the semaphore is signaled, and any thread that attempts to wait on it will return immediately, decrementing the semaphore as it returns. Say, for example, that you want a maximum of five threads (out of a pool of ten) to access a particular resource simultaneously. You would create the semaphore with a maximum value of 5, then as each thread needed access to the resource, it would call one of the wait functions to wait on the semaphore. The first five would return immediately from the wait function, and each successful wait would decrement the semaphore's value by 1. When the sixth thread began waiting on the semaphore, it would be blocked until one of the first five released the semaphore. If, say, Thread 5 then called ReleaseSemaphore, Thread 6 would return immediately from its wait state. All the while, the number of threads with simultaneous access to the resource would never exceed five.
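The five-of-ten scenario just described might be sketched like this (the worker's real work is omitted):

```cpp
#include <windows.h>

HANDLE g_hSem;  // allows at most 5 simultaneous accessors

DWORD WINAPI Worker(PVOID pvParam)
{
    WaitForSingleObject(g_hSem, INFINITE);  // decrements the count; blocks at 0
    // ... use the shared resource (at most 5 threads here at once) ...
    ReleaseSemaphore(g_hSem, 1, NULL);      // increments the count by 1
    return 0;
}

int main()
{
    // Initial count 5, maximum count 5
    g_hSem = CreateSemaphore(NULL, 5, 5, NULL);

    HANDLE hThreads[10];
    DWORD dwId;
    for (int i = 0; i < 10; i++)
        hThreads[i] = CreateThread(NULL, 0, Worker, NULL, 0, &dwId);

    WaitForMultipleObjects(10, hThreads, TRUE, INFINITE);
    for (int i = 0; i < 10; i++)
        CloseHandle(hThreads[i]);
    CloseHandle(g_hSem);
    return 0;
}
```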

Mutexes

Mutexes are among the most useful and widely used of the Windows kernel objects. They have many practical uses—from serializing access to critical resources to making code thread-safe—and they are often found in abundance in sophisticated multithreaded Windows apps.

Mutexes are very similar to critical sections except, of course, that they're kernel objects. This means that accessing them can be slower, but they are generally more functional than critical sections because they can be shared across processes and because you can use the wait functions to wait on them with a specified timeout.

Mutexes are unusual in that they are the only kernel object that supports the notion of thread ownership. Each mutex kernel object has a field that contains the thread ID of the owning thread. If the thread ID is 0, the mutex is not owned and is signaled. If the thread ID is other than 0, a thread owns the mutex, and the mutex is unsignaled.

Unlike all other kernel objects, a thread that owns a mutex can wait on it multiple times without releasing it and without waiting. Each time a thread successfully waits on a mutex, it becomes its owner (its thread ID is stored in the mutex's kernel object), and the object's recursion counter is incremented. This means that you must release the mutex (using ReleaseMutex) the same number of times that you waited on it in order for it to become signaled, thus allowing other threads to take ownership of it.

Windows also makes sure that a mutex isn't abandoned by a terminated thread, potentially blocking other threads infinitely. If a thread that owns a mutex terminates without releasing it, Windows automatically resets the mutex object's thread ID and recursion counter to 0 (which signals the object). If another thread is waiting on the mutex, the system gives it ownership of the mutex by setting the mutex's thread ID to reference the waiting thread and setting its recursion counter to 1. The wait function itself returns WAIT_ABANDONED so that the waiter can determine how it came to own the mutex.
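A minimal sketch of mutex use, including a check for WAIT_ABANDONED:

```cpp
#include <windows.h>
#include <stdio.h>

HANDLE g_hMutex;

DWORD WINAPI Worker(PVOID pvParam)
{
    DWORD dw = WaitForSingleObject(g_hMutex, INFINITE);
    if (dw == WAIT_ABANDONED)
        printf("Previous owner died; shared data may be inconsistent\n");

    // ... access the protected resource ...

    ReleaseMutex(g_hMutex);  // release once per successful wait
    return 0;
}

int main()
{
    g_hMutex = CreateMutex(NULL, FALSE, NULL);  // FALSE = not initially owned

    DWORD dwId;
    HANDLE hThread = CreateThread(NULL, 0, Worker, NULL, 0, &dwId);
    WaitForSingleObject(hThread, INFINITE);

    CloseHandle(hThread);
    CloseHandle(g_hMutex);
    return 0;
}
```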

As I've mentioned, suspending or terminating a thread can cause a mutex to be held indefinitely or even abandoned. It's always better to let a thread's entry-point function return normally when possible.

I/O Completion Ports

An I/O completion port allows multiple threads conducting asynchronous I/O operations to be synchronized through a single object. You associate file handles with an I/O completion port through the Win32 API function CreateIoCompletionPort. When an asynchronous I/O operation that was started on a file associated with an I/O completion port finishes, an I/O completion packet is queued to the port.

A thread can wait on the port by calling the GetQueuedCompletionStatus function. If no I/O completion packet is ready, the thread will go into a wait state. When a packet is queued to the port, the function will return immediately.

Note that you can use I/O completion ports to synchronize operations besides those involving asynchronous I/O. Using the PostQueuedCompletionStatus API function in tandem with GetQueuedCompletionStatus, you can create a multithreaded signaling mechanism that is more scalable than the SetEvent/WaitForMultipleObjects approach. This is because WaitForMultipleObjects is limited to waiting on a maximum of 64 objects at a time. Using an I/O completion port, you can create a synchronization system that can wait on as many threads as the process can create. Here's an example of how you could implement a mechanism that waits on more than 64 threads simultaneously.

  1. The main thread of your process creates an I/O completion port that is not associated with a particular file by passing INVALID_HANDLE_VALUE for its FileHandle parameter.

  2. The main thread creates as many threads as your process requires—let's say it starts with a pool of 100 worker threads.

  3. Once all the threads are created, the main thread calls GetQueuedCompletionStatus to wait on the I/O completion port.

  4. Whenever a worker thread has finished its work and wants to signal the main thread, it calls PostQueuedCompletionStatus to post an I/O completion packet to the port. For example, it might do this before returning from its entry-point function or before going to sleep while it waits on, say, a global event associated with the main thread. In order to let the main thread know which thread completed its work, the worker thread could pass its thread ID into PostQueuedCompletionStatus.

  5. The main thread returns from its call to GetQueuedCompletionStatus when it sees the packet.

  6. Because the main thread knows how many threads it created, it calls GetQueuedCompletionStatus again, looping repeatedly until all the worker threads have indicated that they have completed their work.

Because a thread need not terminate to be signaled and because the main thread (or any thread) can wait on as many other threads as it wants, an I/O completion port provides a nicely scalable alternative to waiting on multiple threads using WaitForMultipleObjects or a similar API call.
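On Windows, the steps above map onto CreateIoCompletionPort, PostQueuedCompletionStatus, and GetQueuedCompletionStatus. As a rough, portable sketch of the same pattern, the following standard C++ code uses a mutex-protected queue in place of the completion port: workers post one "packet" each, and the main thread drains one packet per worker. The CompletionQueue and run_pool names are illustrative, not part of the Win32 API.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// A minimal stand-in for an I/O completion port: workers "post" packets
// (their IDs), and a consumer dequeues one packet per worker.
class CompletionQueue {
public:
    // Analogous to PostQueuedCompletionStatus.
    void Post(int id) {
        std::lock_guard<std::mutex> lock(m_);
        q_.push(id);
        cv_.notify_one();
    }
    // Analogous to GetQueuedCompletionStatus with an INFINITE timeout.
    int Get() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !q_.empty(); });
        int id = q_.front();
        q_.pop();
        return id;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<int> q_;
};

// Spawns nWorkers threads, each of which posts one completion packet, then
// drains the queue until every worker has signaled; returns the count drained.
int run_pool(int nWorkers) {
    CompletionQueue port;
    std::vector<std::thread> workers;
    for (int i = 0; i < nWorkers; ++i)
        workers.emplace_back([&port, i] { port.Post(i); });
    int completed = 0;
    while (completed < nWorkers) {  // step 6: loop until all workers report in
        port.Get();
        ++completed;
    }
    for (auto& t : workers) t.join();
    return completed;
}
```

Note that nothing here limits the number of workers to 64; the consumer simply counts packets until it has seen one per worker, which is exactly what the main thread does in step 6 above.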

Exercises

In this next exercise, you'll experiment with an application that is intentionally not thread-safe. You'll get to see firsthand what happens when an application does not synchronize access to shared resources. This should give you a greater appreciation for the lengths SQL Server must go to in order to ensure that its worker threads can access shared resources in a manner that is both safe and fast.

In the final exercise in this chapter, you'll learn to implement a spinlock based on a kernel mutex object. SQL Server uses spinlocks to guard access to shared resources within the server; understanding how they work will give you greater insight into how SQL Server works.

Exercise 3.9 What Happens When Threads Aren't Synchronized?

The following C++ application demonstrates three methods for accessing a shared resource (in this case, a global variable) from multiple threads. You can find it in the CH03\thread_sync subfolder on the CD accompanying this book. Load the application into the Visual C++ development environment and compile and run it as you work through the exercise.

The app creates 50 threads that check the value of a global variable and, if it's less than 50, increment it. The end result should be a value of 50 in the global variable once all the threads have terminated, but that's not always the case. Let's look at the code (Listing 3.5).

Listing 3.5. Thread Synchronization Options
// thread_sync.cpp : Defines the entry point for the
// console application.
//

#include "stdafx.h"
#include "windows.h"

#define MAXTHREADS 50
//#define THREADSAFE
//#define CRITSEC
//#define MUTEX

long g_ifoo=0;
HANDLE g_hStartEvent;

#ifdef CRITSEC
CRITICAL_SECTION g_cs;
#endif

#ifdef MUTEX
HANDLE hMutex;
#endif

DWORD WINAPI StartThrd(LPVOID lpParameter)
{
  WaitForSingleObject(g_hStartEvent, INFINITE);
#ifdef CRITSEC
  EnterCriticalSection(&g_cs);
#endif
#ifdef MUTEX
  WaitForSingleObject(hMutex,INFINITE);
#endif
#ifdef THREADSAFE
  if (g_ifoo<50) InterlockedIncrement(&g_ifoo);
#else
  if (g_ifoo<50) g_ifoo++;
#endif
#ifdef CRITSEC
  LeaveCriticalSection(&g_cs);
#endif
#ifdef MUTEX
  ReleaseMutex(hMutex);
#endif
  return 0;
}

int main(int argc, char* argv[])
{
  DWORD dwThreadID;
  HANDLE hThreads[MAXTHREADS];
#ifdef CRITSEC
  InitializeCriticalSection(&g_cs);
#endif
#ifdef MUTEX
  hMutex=CreateMutex(NULL,FALSE,NULL);
#endif
  g_hStartEvent=CreateEvent(NULL,true,false,NULL);
  for (int i=0; i<MAXTHREADS; i++) {
    hThreads[i]=CreateThread(NULL,
    0,
    (LPTHREAD_START_ROUTINE)StartThrd,
    0,
    0,
    (LPDWORD)&dwThreadID);
  }
  SetEvent(g_hStartEvent);
  WaitForMultipleObjects(MAXTHREADS,hThreads,true,INFINITE);

  printf("g_ifoo=%d\n",g_ifoo);
#ifdef CRITSEC
  DeleteCriticalSection(&g_cs);
#endif
#ifdef MUTEX
  CloseHandle(hMutex);
#endif
  return 0;
}

Three #define constants control how (and whether) the program synchronizes access to the global variable. By default, THREADSAFE, CRITSEC, and MUTEX are undefined, so access to the global variable is not synchronized. If you run the program on a multiprocessor machine enough times, you will eventually see a situation where g_ifoo does not end up with a value of 50. This is because access to the variable was not synchronized, and there was an overlap between the time one thread retrieved the value and another incremented it, as illustrated by the scenario outlined in Table 3.17.

Because of the overlap, two threads set g_ifoo to the same value, causing g_ifoo to end up with a value less than 50 because there are only 50 worker threads.

If you then uncomment the //#define THREADSAFE line and recompile, this overlap should be impossible. This is because the code then uses InterlockedIncrement to ensure atomicity of the increment operation. In the scenario shown in Table 3.17, this means that steps 3, 5, and 7 are performed as a single operation, as are steps 4, 6, and 8. Since Thread 10 completes its increment operation before Thread 11 is allowed to do so, Thread 11 sees 11, not 10, as the current value of g_ifoo when it performs its increment.
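The same guarantee can be demonstrated portably with std::atomic, whose fetch_add plays the role InterlockedIncrement plays in the listing: the read, the increment, and the write back happen as one indivisible operation. The atomic_count helper below is an illustrative sketch, not code from the listing.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Portable analogue of the listing's THREADSAFE path: std::atomic's
// fetch_add, like InterlockedIncrement, performs the read-increment-write
// sequence atomically, so the overlap in Table 3.17 (two threads reading
// 10 and both writing back 11) cannot occur.
long atomic_count(int nThreads) {
    std::atomic<long> counter(0);
    std::vector<std::thread> threads;
    for (int i = 0; i < nThreads; ++i)
        threads.emplace_back([&counter, nThreads] {
            if (counter.load() < nThreads)
                counter.fetch_add(1);  // atomic read-modify-write
        });
    for (auto& t : threads) t.join();
    return counter.load();
}
```

Because each of the n threads increments at most once and the increment itself is atomic, the function always returns exactly n, no matter how the threads interleave.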

You can take this a step further by commenting out the #define for THREADSAFE and uncommenting CRITSEC. Access to the global variable is then synchronized with a critical section.

You can provide the ultimate in multithread synchronization by commenting out CRITSEC and uncommenting MUTEX. The code will then use a mutex kernel object to serialize access to the global variable.

Experiment with all four techniques and see what results you get. Generally speaking, when building applications, you should choose from among them in the order in which I've presented them here: If you don't need thread synchronization, don't code for it. If you do, try to use the interlocked functions. If they don't meet your needs, perhaps a critical section will do the job. If a critical section doesn't work for you (perhaps because you need to allow for a timeout on the wait or you need to synchronize multiple processes), move up to a kernel object such as a mutex.

Table 3.17. An Example of Unsynchronized Resource Access by Multiple Threads

Step  Action
1     Thread 10: Is g_ifoo < 50? Yes
2     Thread 11: Is g_ifoo < 50? Yes
3     Thread 10: Get g_ifoo's value (currently 10)
4     Thread 11: Get g_ifoo's value (currently 10)
5     Thread 10: Increment it (to 11)
6     Thread 11: Increment it (to 11)
7     Thread 10: Move the new value back to g_ifoo
8     Thread 11: Move the new value back to g_ifoo

Exercise 3.10 Implementing a Kernel Mode Spinlock by Using a Mutex

Earlier in the chapter, I showed an example of the traditional implementation of a spinlock: a user mode construct that uses one of the interlocked functions to ensure atomic access to the lock variable. You can also set up spinlocks that are based on kernel mode objects. That may seem like a strange thing to do, but one very natural use of a kernel spinlock is to execute other code on a thread while you wait on a kernel object. You basically code the spinlock to time out on a fairly short interval, execute whatever other code you want to run while you wait, and then return to the wait loop. This keeps the thread semi-busy while it waits on a resource, which you may find preferable to simply having it sleep until the resource is available.

The example below demonstrates a kernel object–based spinlock. You can find it in the CH03\kernel_spinlock subfolder on the CD accompanying this book. Load it into the Visual C++ development environment, then compile and run it. Listing 3.6 shows the code.

Listing 3.6. A Kernel Object–Based Spinlock Implementation
// kernel_spinlock.cpp : Defines the entry point for the
// console application.
//

#include "stdafx.h"
#include "windows.h"

#define MAXTHREADS 2
#define SPINWAIT 1000

HANDLE g_hWorkEvent;

class CSpinLock {
public:
  static void GetLock(HANDLE hEvent);
};

void CSpinLock::GetLock(HANDLE hEvent) {
  int i=0;
  while (WAIT_TIMEOUT==WaitForSingleObject(hEvent, SPINWAIT)) {
    printf("Spinning count=%d for thread 0x%08x\n",++i,
        GetCurrentThreadId());
    //Put other code here to execute while we wait on the resource
  }
}

DWORD WINAPI StartThrd(LPVOID lpParameter)
{
  printf("Inside thread function for thread 0x%08x\n",
      GetCurrentThreadId());
  CSpinLock::GetLock(g_hWorkEvent);
  printf("Acquired spinlock for thread 0x%08x\n",
      GetCurrentThreadId());
  Sleep(5000);
  SetEvent(g_hWorkEvent);
  return 0;
}

int main(int argc, char* argv[])
{
  DWORD dwThreadID;
  HANDLE hThreads[MAXTHREADS];

  g_hWorkEvent=CreateEvent(NULL,false,true,NULL);

  for (int i=0; i<MAXTHREADS; i++) {
    hThreads[i]=CreateThread(NULL,
    0,
    (LPTHREAD_START_ROUTINE)StartThrd,
    0,
    0,
    (LPDWORD)&dwThreadID);
  }
  WaitForMultipleObjects(MAXTHREADS,hThreads,true,INFINITE);

  return 0;
}

In this example, the spinlock is implemented in its own class and exposed via a static method. (The static method allows you to avoid having to create an instance of the CSpinLock class in order to use it.) You could pass any kernel object into the spinlock; in this example we use an event object.

The main function creates two threads that acquire the spinlock, sleep for five seconds, then release the spinlock by signaling the event. Since only one of the threads can acquire the spinlock at a time, the second thread to start waits on the first one to complete by spinning for as many one-second durations as it takes until the spinlock is released (i.e., the event is signaled).

Since the event is an auto-reset event, it immediately resets to unsignaled once a thread successfully waits on it. So, when GetLock successfully waits on the event object for the first thread, the event is immediately set back to unsignaled, and the second thread must wait for the first thread to signal it before proceeding. Given that the thread function sleeps for five seconds, this will be at least five seconds after the first thread acquires the spinlock.
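The auto-reset behavior can be sketched portably with a condition variable and a flag. The AutoResetEvent class below is an illustrative analogue of the Win32 object, not the SDK API: Set() signals the event, and a successful wait atomically flips it back to unsignaled, so only one waiter gets through per Set().

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Portable sketch of Win32 auto-reset event semantics (illustrative only).
class AutoResetEvent {
public:
    void Set() {
        std::lock_guard<std::mutex> lock(m_);
        signaled_ = true;
        cv_.notify_one();                    // release at most one waiter
    }
    // Returns true if signaled within the timeout, false otherwise; a
    // successful wait consumes the signal (the "auto-reset" behavior).
    bool Wait(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(m_);
        if (!cv_.wait_for(lock, timeout, [this] { return signaled_; }))
            return false;                    // timed out, like WAIT_TIMEOUT
        signaled_ = false;                   // reset to unsignaled
        return true;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    bool signaled_ = false;
};
```

After one Set(), the first Wait() succeeds and resets the event, so a second Wait() times out, just as the second thread in Listing 3.6 must spin until the first thread signals the event again.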

Compile and run the code, experimenting with different sleep times for the thread function and different numbers of threads. You should see output like the following when you run the application:

Inside thread function for thread 0x00000d00
Acquired spinlock for thread 0x00000d00
Inside thread function for thread 0x00000f20
Spinning count=1 for thread 0x00000f20
Spinning count=2 for thread 0x00000f20
Spinning count=3 for thread 0x00000f20
Spinning count=4 for thread 0x00000f20
Spinning count=5 for thread 0x00000f20
Acquired spinlock for thread 0x00000f20

Thread Synchronization Recap

Windows provides a rich suite of thread synchronization functions. Thread synchronization comes in two basic varieties: user mode synchronization and kernel mode synchronization. User mode synchronization usually involves spinlocks, critical sections, and interlocked functions. Kernel mode synchronization involves kernel objects such as mutexes, semaphores, events, threads, processes, and waitable timers.

SQL Server makes use of both types of synchronization. Its UMS component spends a fair amount of time waiting on kernel mode synchronization objects, but it also implements a variety of spinlocks and does its best to avoid switching a thread into kernel mode unless absolutely necessary.

The key to successful thread synchronization is ensuring atomic access to resources. Modifying the same resource from multiple threads simultaneously is a recipe for disaster. Effective thread synchronization prevents this.

Thread Synchronization Knowledge Measure

  1. True or false: The single most important element of thread synchronization is ensuring atomic access to resources.

  2. Give two examples of user mode synchronization objects or constructs.

  3. What happens when a thread successfully waits on an auto-reset event?

  4. What happens when a thread successfully waits on a semaphore?

  5. What happens when a thread successfully waits on a mutex?

  6. What is the only kernel object that supports the concept of a thread owner?

  7. If you want to protect a routine in a DLL shared by several processes from being executed by more than one process at a time, what type of synchronization object should you use?

  8. If you are building a Windows GUI and want to update the GUI at certain regular intervals, should you use a waitable timer object or a Windows user timer?

  9. What API function is used to set the signal frequency for a waitable timer object?

  10. True or false: A spinlock is a kernel mode object that spins (loops) until it acquires a lock on a resource.

  11. Explain the function of the InterlockedExchange API function.

  12. True or false: You cannot specify a timeout value when waiting on a critical section object.

  13. What action does the system take when a thread that owns a mutex terminates?

  14. What is the maximum number of objects that WaitForMultipleObjects can wait on simultaneously?

  15. What type of message does a Windows user timer object produce?

  16. True or false: Generally speaking, you should avoid techniques and design elements that continuously poll for resource availability.

  17. True or false: Windows detects thread deadlocks, selects one of the participating threads as the deadlock victim, and terminates it.

  18. What API routine does a thread use to acquire a critical section object?

  19. What API routine does a thread use to release a critical section object?

  20. Name a mechanism discussed in this chapter for waiting on more objects than the maximum supported by WaitForMultipleObjects.

  21. True or false: A spinlock consumes no CPU resources while it waits.

  22. True or false: A process object is the only type of kernel object that cannot be signaled.

  23. True or false: The order in which you access kernel resources has no bearing on thread deadlocks because they are kernel resources.

  24. True or false: Synchronizing threads using kernel mode objects is generally faster than synchronizing them via user objects.

  25. When a semaphore's value reaches 0, is it signaled or unsignaled?
