Applications are sometimes multithreaded. Multithreaded applications are much more susceptible to bugs, including synchronization problems. Each thread represents a separate path of execution and owns resources, such as stack, thread local storage, local variables, and thread environment blocks. Proper use of threads can enhance the performance of an application, whereas poor implementation can hinder performance.
Creating threads is not difficult. Managing threads is the real challenge—at times, it can be like herding cats. Thread synchronization, which is the management of threads, can be complicated, including preventing race conditions and controlling access to shared resources.
Improperly implemented threads can lead to a high-utilization or low-utilization condition. High utilization is characterized by one or more threads consuming at or near 100-percent CPU cycles. Other threads are starved for time, which makes the application appear to hang or behave incorrectly. Low utilization is the reverse; a process and threads are receiving minimal or no CPU cycles.
There are two primary reasons for high utilization:
Threads are in tight loops that usurp all CPU resources.
Active high-priority threads prevent lower-priority threads from receiving attention from the CPU.
There are several reasons for low utilization:
Threads are waiting for resources that never become available. Threads in that state are suspended indefinitely and receive little CPU time.
Threads are mutually blocked on each other. This deadlock suspends both threads indefinitely, and both receive little CPU time. This often is caused by nested synchronization.
A low-priority thread in a sea of high-priority threads has little opportunity to sail. It is given little CPU time.
Threads with erroneously high suspend counts are not resumed when planned. For this reason, the thread continues to receive no CPU time.
High and low utilization are not the only issues with multithread applications. The list of potential transgressions from multithreading is almost endless. Threading is fertile ground for debugging.
For brevity, this section focuses on synchronization problems related to monitors and mutexes. Synchronization problems from semaphores, events, and reader-writer locks are not discussed. Monitors synchronize access to a single resource. The resource could be an object, data structure, or even an algorithm—anything that requires exclusive access. Monitors provide synchronization within a process. Like a monitor, a mutex synchronizes access to a single resource. However, mutexes can synchronize across processes.
Monitors are the most frequently used synchronization device. For that reason, the CLR tracks monitors for efficient access and synchronization. Instances of managed objects have an additional field called the syncblock index. This is an index into the syncblock table where monitors are tracked. The syncblock index of an object defaults to zero. When an object is used with a monitor, the syncblock index is updated to point to an entry in the syncblock table. Another thread attempting to claim the same resource with a monitor would notice the entry in the syncblock table and be blocked. The command !syncblk –all lists the outstanding syncblocks. Thin locks are used when possible, where locking information is cached in the object header. However, the header is of a limited size and the lock might need to be inflated, which is moving the lock information from the header into the syncblock table.
Markers of thread synchronization, both monitors and mutexes, can be found on the call stack. Finding AwareLock.Enter, WaitHandle.WaitAll, and WaitForMultipleObjects in the call stack are indications of thread synchronization activity, as explained in the following list:
AwareLock.Enter. This method is called when an object is bound to a monitor. Set a breakpoint on this symbol, as shown here:
bp mscorwks!AwareLock::Enter
If hit, the current thread is entering a monitor.
WaitHandle.WaitAll. Except for monitors, most synchronization objects in .NET are derived from the WaitHandle class. Look for method calls from this class on the managed call stack, including WaitOne and WaitAll, as a sign of pending synchronization.
WaitForMultipleObjects. Most waits on synchronization objects, managed or unmanaged, devolve to the WaitForMultipleObjects API, which is the workhorse of thread synchronization. This is another marker that can be found on the call stack. Here is the syntax of the WaitForMultipleObjects API:
DWORD WaitForMultipleObjects(DWORD nCount, const HANDLE* lphandles, BOOL bWaitAll, DWORD dwMilliseconds)
The first step to debugging threads in WinDbg and SOS is using a thread command. In WinDbg, the tilde (~) is the thread command; in SOS, the thread command is !threads. However, the ~ command lists only managed threads.
Here is sample output from the WinDbg thread command:
0:000> ~ . 0 Id: f7c.f80 Suspend: 1 Teb: 7ffdd000 Unfrozen 1 Id: f7c.f9c Suspend: 1 Teb: 7ffdc000 Unfrozen 2 Id: f7c.fa0 Suspend: 1 Teb: 7ffdb000 Unfrozen 3 Id: f7c.fa4 Suspend: 1 Teb: 7ffda000 Unfrozen 4 Id: f7c.de0 Suspend: 1 Teb: 7ffd9000 Unfrozen
Listed in order, the columns represent the thread number, process identifier, thread identifier, suspend count, address of thread environment block, and status of the thread.
Here is output from the SOS thread command:
0:000> !threads ThreadCount: 4 UnstartedThread: 1 BackgroundThread: 1 PendingThread: 0 DeadThread: 0 Hosted Runtime: no PreEmptive GC Alloc Lock ID OSID ThreadOBJ State GC Context Domain Count APT Exception 0 1 f80 001501f8 6020 Disabled 013c2cf0:013c32bc 001483a8 0 STA 2 2 fa0 00153e40 b220 Enabled 00000000:00000000 001483a8 0 MTA (Finalizer) 4 3 de0 0018b898 b020 Disabled 013b4cdc:013b52bc 001483a8 2 MTA XXXX 4 0 0018e750 9400 Enabled 00000000:00000000 001483a8 0 Ukn
The first column, which is untitled, is the thread identifier assigned by the debugger. Unknown threads are marked as "xxxx". The remaining columns are as follows:
ID. Unmanaged thread number
OSID. Managed thread number
ThreadObj. Address of the related thread object
State. State of the thread
Preemptive GC. Whether a thread can be preempted for garbage collection
Domain. Address of the AppDomain that hosts the thread
Lock Count. Lock count
APT. Apartment model
Multithread capabilities have been added to the Store application in the Threading subfolder. There are two additional buttons. The Enumerate button writes the current transactions to the forward.txt file. The Reverse Enumerate button writes the transactions, in reverse order, to the reverse.txt file. Each button handler starts a thread to accomplish the task.
Close all instances of the Store application.
Start the Store program in the Threading subfolder and add three transactions. Click Enumerate to write the transactions to the forward.txt file. Close the Store program and open the forward.txt file, which is empty. It should contain three transactions—so that’s a problem!
Try again. Reopen the Store application and add three transactions. However, when you click Enumerate, an unhandled exception occurs. What is the problem? You need to investigate.
Start the Store application yet again. Get the process identifier using the Tlist command. You should notice that there are two Store applications running. Apparently a previous version is still running in the background. This kind of problem is typical of a hung thread, which keeps an application alive even after the user closes the main window.
Use ADPlus to obtain a dump from the first Store application in the list. Here is the command:
adplus -hang -o c:dumps -p 2696
Open the dump file in WinDbg.
Load the SOS debugging extension and list the managed threads. For readability, some of the columns have been removed from this listing:
0:000> .load sos 0:000> !threads ThreadCount: 3 UnstartedThread: 0 BackgroundThread: 2 PendingThread: 0 DeadThread: 0 Hosted Runtime: no PreEmptive GC Alloc Lock ID OSID ThreadOBJ State GC Context Domain Count 0 1 a90 001501f8 2016220 Enabled 013dbe2c:013dc990 001483a8 0 2 2 9c8 00153e40 b220 Enabled 00000000:00000000 001483a8 0 4 3 3e0 00190718 b020 Disabled 013d63b0:013d6990 001483a8 2
Thread 4 is the only thread with a positive lock count, so it appears to be the culprit. What is the thread waiting for? This question is answered with the !dumpheap –thinlock command:
0:000> !dumpheap -thinlock
Address MT Size
01392440 00d443e8 24 ThinLock owner 3 (00199a78) Recursive 0
01392468 790fa098 12 ThinLock owner 3 (00199a78) Recursive 0
From the preceding listing, both thin locks are owned by Thread 4. This is why the lock count for that thread is 2. The address of the Thread 4 object is 0x00199a78, as the thread list shows. The MT column is the method table of the object that is being waited on. Dump both method tables to find what Thread 4 is waiting for:
0:000> !dumpmt 00d443e8 EEClass: 79126bb0 Module: 790c2000 Name: System.Collections.Generic.List`1[[Store.Item, Store]] mdToken: 02000287 (C:WINDOWSassemblyGAC_32mscorlib2.0.0.0__b77a5c561934e089msco rlib.dll) BaseSize: 0x18 ComponentSize: 0x0 Number of IFaces in IFaceMap: 6 Slots in VTable: 30 0:000> !dumpmt 790fa098 EEClass: 790fa034 Module: 790c2000 Name: System.Object mdToken: 02000002 (C:WINDOWSassemblyGAC_32mscorlib2.0.0.0__b77a5c561934e089msco rlib.dll) BaseSize: 0xc ComponentSize: 0x0 Number of IFaces in IFaceMap: 0 Slots in VTable: 14
What is known? When the program hung, Thread 4 had outstanding locks on a Store.Item instance and an Object instance. No other thread is holding a lock, which narrows the problem to Thread 4. Here is the source code for Thread 4:
private void Forward() { lock (items) { lock (syncObj) { StreamWriter sw = new StreamWriter("forward.txt"); IEnumerator<Item> enumerator = items.GetEnumerator(); while (true) { } while (true) { if (enumerator.MoveNext()) { Item current = enumerator.Current; string message = current.ItemId + " Product Mask: " + ((int)current.Products).ToString(); sw.WriteLine(message); } else { break; } } sw.Close(); } } }
Review the code—I hope that the problem is obvious. The lock statement is a shortcut to calling the Monitor.Enter method. After acquiring the locks, the program enters an infinite loop. For this reason, the locks are never released. Remove the extraneous while(true) loop and rebuild the application (after killing any hung instances of the process that remain in memory). Run the new application and create a few transactions. The program should operate correctly. The forward.txt file is created and contains transactions.
The Store application appears to be working, but not in all circumstances. Delete any forward.txt or reverse.txt files that have been created. Then step through the following procedure:
Close all instances of the Store application that are running. Now run the Store application in the Threading subfolder and add a couple of transactions. Click Reverse Enumerate, and then click Enumerate. Close the application and check for the forward.txt and reverse.txt files. Neither file has been created. Why not?
Restart the Store application. Add two new transactions. Again click Reverse Enumerate, and then click Enumerate.
Start WinDbg and attach to the Store application.
List the available threads with the !threads command. Threads 4 and 5 have a positive lock count, so both are waiting for something:
0:006> !threads
ThreadCount: 4
UnstartedThread: 0
BackgroundThread: 1
PendingThread: 0
DeadThread: 0
Hosted Runtime: no
PreEmptive GC Alloc Lock
ID OSID ThreadOBJ State GC Context Domain Count
0 1 e54 001501f8 6020 Enabled 013e7afc:013e7b68 001483a8 0
2 2 d84 00153e40 b220 Enabled 00000000:00000000 001483a8 0
4 3 2e0 00191f60 200b020 Enabled 013d9bec:013dbb68 001483a8 1
5 4 ccc 00190b98 200b020 Enabled 013dfb9c:013e1b68 001483a8 1
The !syncblk command reports on syncblock entries. In the Store application, Thread 5 is waiting for a Store.Item instance, while Thread 4 is waiting for an Object instance. The fact that no other threads are listed with the !syncblk command reaffirms that the problem is probably in Threads 4 and 5:
0:006> !syncblk Index SyncBlock MonitorHeld Recursion Owning Thread Info SyncBlock Owner 2 00174854 3 1 00190b98 ccc 5 01392440 System. Collections.Generic.List`1[[Store.Item, Store]] 21 001747f4 3 1 00191f60 2e0 4 01392468 System.Object ----------------------------- Total 44 CCW 0 RCW 0 ComClassFactory 0 Free 0
Perform a stack trace on Thread 4. Before displaying the call stack, confirm that the active thread is Thread 4. If not, switch to that thread (~4s). Notice that the AwareLock::Enter method is found on the call stack, which is a marker of thread synchronization. The WaitForMultipleObjects method is near the top of the call stack. This means the threads likely are being blocked by something:
0:004> kb ChildEBP RetAddr Args to Child 0358f4a0 7c90e9ab 7c8094f2 00000001 0358f4cc ntdll!KiFastSystemCallRet 0358f4a4 7c8094f2 00000001 0358f4cc 00000001 ntdll!ZwWaitForMultipleObjects+0xc 0358f540 79f4aa60 00000001 00174868 00000000 KERNEL32!WaitForMultipleObjectsEx+0x12c 0358f5a8 79f16d92 00000001 00174868 00000000 mscorwks!WaitForMultipleObjectsEx_SO_TOL ERANT+0x6f 0358f5c8 79f16d03 00000001 00174868 00000000 mscorwks!Thread::DoAppropriateAptStateWa it+0x3c 0358f64c 79f16b9e 00000001 00174868 00000000 mscorwks!Thread::DoAppropriateWaitWorker +0x144 0358f69c 79f4a9d9 00000001 00174868 00000000 mscorwks!Thread::DoAppropriateWait+0x40 0358f6f8 79ebc06e ffffffff 00000001 00000000 mscorwks!CLREvent::WaitEx+0xf7 0358f708 7a0fd093 ffffffff 00000001 00000000 mscorwks!CLREvent::Wait+0x17 0358f794 7a0fd28f 00191f60 ffffffff 00191f60 mscorwks!AwareLock::EnterEpilog+0x94 0358f7b0 79f0fe6a 8a20e59b 0358f888 01392214 mscorwks!AwareLock::Enter+0x61
Look at the parameters for WaitForMultiple objects. The first parameter is 1 and indicates that Thread 4 is waiting for a single synchronization object. The second parameter is a pointer to an array of handles. Each handle represents a synchronization object. Because Thread 4 is waiting for one object, there should be only one handle in the array. Display the memory at the address to view that handle:
0:004> dd 00174868 00174868 00000658 0000000d 00000000 00000000 00174878 00000000 00000000 00000000 00000000 00174888 00000000 00000000 00000000 80000023
The handle to the synchronization object is 0x00000658. Use the !handle command to obtain more information on that handle. It is an event handle. We have now discovered that Thread 4 is waiting for an event object:
0:004> !handle 00000658
Handle 658
Type Event
Repeat Threads Example #2 and Threads Example #2 for Thread 5. The result should be similar.
The problem has been isolated to Threads 4 and 5. Threads 4 and 5 are run as the Forward and Reverse methods, respectively. Here is the source code for these methods. Both methods have nested locks. However, the locks in the Forward and Reverse methods are in the opposite order, which is causing a deadlock:
private void Forward() { lock (items) { lock (syncObj) { StreamWriter sw = new StreamWriter("forward.txt"); IEnumerator<Item> enumerator = items.GetEnumerator(); while (true) { // other code } sw.Close(); } } } Object syncObj = new Object(); private void Reverse() { lock (syncObj) { Thread.Sleep(5000); lock (items) { StreamWriter sw = new StreamWriter("reverse.txt"); items.Reverse(); IEnumerator<Item> enumerator = items.GetEnumerator(); while (true) { // other code } items.Reverse(); sw.Close(); } } }
Swap the order of the locks in the Reverse method and the problem is solved. Restart the Store application and test this.
18.220.96.214