The UMS Scheduler Lists

Each UMS scheduler maintains five lists that support the work of scheduling threads: a worker list, a runnable list, a waiter list, an I/O list, and a timer list. Each of these plays a different role, and nodes are frequently moved between lists.

The Worker List

The worker list is the list of available UMS workers. A UMS worker is an abstraction over the thread/fiber concept: it encapsulates a thread or fiber that will carry out tasks within the server and allows either to be used without the rest of the code being aware of which is actually in use under the covers. As I said, one of the design goals of UMS was to provide support for fibers in such a way that the core engine code need not be concerned (for the most part) with whether the system is running in thread mode or fiber mode. Throughout this chapter, I'll refer to UMS workers instead of threads or fibers.

If your SQL Server is in thread mode (the default), a UMS worker encapsulates a Windows thread object. If your server is in fiber mode, a UMS worker encapsulates a Windows fiber, the handling of which is actually implemented outside of the Windows kernel, as I mentioned in Chapter 3.

The Connection Process

When a client connects to SQL Server, it is assigned to a specific UMS scheduler. The selection heuristics are very basic: whichever scheduler has the fewest associated connections gets the new connection. Once a connection is associated with a scheduler, it never leaves that scheduler.

Even if a spid's associated scheduler is busy and there are idle schedulers on the system, UMS will not move the spid between schedulers. This means that it's possible to design scenarios in which SQL Server's support for symmetric multiprocessing is effectively thwarted because an application opens multiple persistent connections that do not perform similar amounts of work.

Say, for example, that you have a two-processor machine, and a SQL Server client application opens four persistent connections into the server, with two of those connections performing 90% of the work of the application. If those two connections end up on the same scheduler, you may see one CPU consistently pegged while the other remains relatively idle. The solution in this situation is to balance the load evenly across the connections or not to keep persistent connections when the workload is unbalanced. Disconnecting and reconnecting is the only way to move a spid from one scheduler to another. (This movement isn't guaranteed—a spid that disconnects and reconnects may end up on the same scheduler depending on the number of users on the other schedulers.)

Once a spid is assigned to a scheduler, what happens next depends on the status of the worker list and whether SQL Server's max worker threads configuration value has been reached. If a worker is available in the worker list, it picks up the connection request and processes it. If no worker is available and the max worker threads threshold has not been reached, a new worker is created, and it processes the request. If no workers are available and max worker threads has been reached, the connection request is placed on the waiter list and will be processed in FIFO order as workers become available.
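
The three-way decision above can be sketched as a toy model. This is not SQL Server code; the class and field names are invented for illustration, and a real scheduler tracks far more state.

```python
from collections import deque

class Scheduler:
    """Toy model of a UMS scheduler's work-request dispatch decision."""

    def __init__(self, max_worker_threads):
        self.max_worker_threads = max_worker_threads
        self.idle_workers = []        # the worker list (available workers)
        self.worker_count = 0         # workers created so far
        self.waiter_queue = deque()   # FIFO queue for requests with no worker

    def dispatch(self, request):
        if self.idle_workers:
            # Case 1: an available worker picks up the request.
            return ("reuse", self.idle_workers.pop())
        if self.worker_count < self.max_worker_threads:
            # Case 2: no idle worker, but we're under max worker threads,
            # so a new worker is created to process the request.
            self.worker_count += 1
            return ("create", f"worker-{self.worker_count}")
        # Case 3: at the max worker threads cap -- the request waits
        # and will be serviced in FIFO order as workers free up.
        self.waiter_queue.append(request)
        return ("queued", None)
```

The FIFO queue is what guarantees that requests stalled at the max worker threads cap are serviced in arrival order once workers become available.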

Client connections are treated within UMS as logical (rather than physical) users. It is normal and desirable for there to be a high ratio of logical users to UMS workers. As I've mentioned before, this is what allows a SQL Server with a max worker threads setting of 255 to service hundreds or even thousands of users.

Work Requests

UMS processes work requests atomically. This means that a worker processes an entire work request—a T-SQL batch execution, for example—before it is considered idle, and that there's no concept of context switching during the execution of a work request within UMS. While executing a given T-SQL batch, a worker will not be switched away to process a different batch. It may yield and execute, for example, I/O completion routines originally queued by another worker, but it is not considered idle until it has processed its complete work request, and it will not take up another work request until it has finished the current one. Once that happens, the worker either activates another worker and returns itself to the worker list or, if there are no other runnable workers and no remaining work requests, enters an idle loop, as we'll discuss in just a moment.
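
A toy loop can make the atomicity rule concrete. In this sketch (invented names, not SQL Server code), the worker may drain queued completion routines at a yield point mid-batch, but it never begins a second batch until the first is fully done:

```python
def run_worker(work_requests, io_completions, log):
    """Process each work request to completion, one at a time."""
    while work_requests:
        batch = work_requests.pop(0)
        for step in batch["steps"]:
            log.append((batch["name"], step))
            # At a yield point, the worker may run I/O completion
            # routines queued by other workers...
            while io_completions:
                log.append(("io-completion", io_completions.pop(0)))
        # ...but only here, with the batch fully processed, is the
        # worker free to pick up another work request.
```

Running two batches through this loop shows every step of the first batch logged before any step of the second, which is exactly the property that lets simultaneous WAITFOR queries pin workers.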

This atomicity is the reason it's possible to drive up the worker thread count within SQL Server by simply executing a number of simultaneous WAITFOR queries as we did using the STRESS.CMD tool in Chapter 3. While each WAITFOR query runs, the worker that is servicing it is considered busy by SQL Server, so any new requests that come into the server require a different worker. If enough of these types of queries are initiated, max worker threads can be quickly reached, and, once that happens, no new connections will be accepted until a worker is freed up.

When the server is in thread mode and a worker has been idle for 15 minutes, SQL Server destroys it, provided doing so will not reduce the number of workers below a predefined threshold. This frees the virtual memory reserved for the idle worker's thread stack (0.5MB) and allows that virtual address space to be used elsewhere in the server.

The Runnable List

The runnable list is the list of UMS workers ready to execute an existing work request. Each worker on this list remains in an infinite wait state until its event object is signaled. Being on the runnable list does not mean that a worker is currently schedulable by Windows; Windows will not schedule it until its event object is signaled, and that happens according to the algorithms within UMS.

Given that UMS implements a cooperative scheduler, you may be wondering who is actually responsible for signaling the event of a worker on the runnable list so that it can run. The answer is that it can be any UMS worker. There are calls throughout the SQL Server code base to yield control to UMS so that a given operation does not monopolize its host scheduler. UMS provides multiple types of yield functions that workers can call. As I've mentioned, in a cooperative tasking environment, threads must voluntarily yield to one another in order for the system to run smoothly. SQL Server is designed so that it yields as often as necessary and in the appropriate places to keep the system humming along.

When a UMS worker yields—either because it has finished the task at hand (e.g., processing a T-SQL batch or executing an RPC) or because it has executed code with an explicit call to one of the UMS yield functions—it is responsible for checking the scheduler's runnable list for a ready worker and signaling that worker's event so that it can run. The yield routine itself makes this check. So, in the process of calling one of the UMS yield functions, a worker actually performs UMS's work for it—there's no thread set aside within the scheduler for managing it. If there were, that thread would have to be scheduled by Windows each time something needed to happen in the scheduler. We'd likely be no better off than we were with Windows handling all of the scheduling. In fact, we might even be worse off due to contention for the scheduler thread and because of the additional overhead of the UMS code. By allowing any worker to handle the maintenance of the scheduler, we allow the thread already running on the processor to continue running as long as there is work for it to do—a fundamental design requirement for a scheduling mechanism intended to minimize context switches.

As I said in Chapter 5 in the discussion of the I/O completion port–based scheduler that we built, a scheduler intended to minimize context switching must decouple the work queue from the workers that carry it out. In an ideal situation, any thread can process any work request. This allows a thread that is already scheduled by the operating system to remain scheduled and continue running as long as there is work for it to do. It eliminates the wastefulness of scheduling another thread to do work the thread that's already running could do.
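
The core of the yield path can be sketched in a few lines. In this hypothetical model, a boolean stands in for a Windows event object; the point is that the yielding worker itself wakes the next runnable worker, with no dedicated scheduler thread involved:

```python
class Worker:
    """Minimal stand-in for a UMS worker and its wait event."""

    def __init__(self, name):
        self.name = name
        self.event_signaled = False   # models the worker's Windows event object

def ums_yield(runnable_list):
    """Called by a yielding worker: signal the next runnable worker's event.

    Returns the worker that was woken, or None if the runnable list is
    empty (in which case the caller would go on to the idle loop).
    """
    if runnable_list:
        nxt = runnable_list.pop(0)
        nxt.event_signaled = True     # SetEvent() on a real Windows event
        return nxt
    return None
```

Because the wake-up is done by whatever worker happens to be yielding, the thread already running on the CPU keeps running until it genuinely has nothing left to do.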

The Waiter List

The waiter list maintains a list of workers waiting on a resource. When a UMS worker requests a resource owned by another worker, it puts itself on the waiter list for the resource and for its scheduler and enters an infinite wait state for its associated event object. When the worker that owns the resource is ready to release it, it is responsible for scanning the list of workers waiting on the resource and moving them to the runnable list as appropriate. And when it hits a yield point, it is responsible for setting the event of the first worker on the runnable list so that the worker can run. This means that when a worker frees up a resource, it may well undertake the entirety of the task of moving those workers that were waiting on the resource from the waiter list to the runnable list and signaling one of them to run.
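
The release protocol described above can be modeled as a single pass over the waiter list. This is an illustrative sketch with invented names: the releasing worker moves every worker waiting on the freed resource to the runnable list, leaving other waiters in place.

```python
def release_resource(resource, waiter_list, runnable_list):
    """Run by the worker releasing `resource`: move its waiters to runnable.

    waiter_list holds (worker, resource_waited_on) pairs; the waiters are
    not yet running -- they still need their events signaled at a later
    yield point before Windows will schedule them.
    """
    still_waiting = []
    for worker, wanted in waiter_list:
        if wanted == resource:
            runnable_list.append(worker)   # eligible to run once signaled
        else:
            still_waiting.append((worker, wanted))
    waiter_list[:] = still_waiting         # update the list in place
```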

The I/O List

The I/O list maintains a list of outstanding asynchronous I/O requests. These requests are encapsulated in UMS I/O request objects. When SQL Server initiates a UMS I/O request, UMS goes down one of two code paths, depending on which version of Windows it's running on. If running on Windows 9x or Windows ME, it initiates a synchronous I/O operation (Windows 9x and ME do not support asynchronous file I/O). If running on the Windows NT family, it initiates an asynchronous I/O operation.

You'll recall from our discussion of Windows I/O earlier in the book that when a thread wants to perform an I/O operation asynchronously, it supplies an OVERLAPPED structure to the ReadFile/ReadFileEx or WriteFile/WriteFileEx function calls. Initially, Windows sets the Internal member of this structure to STATUS_PENDING to indicate that the operation is in progress. As long as the operation continues, the Win32 API HasOverlappedIoCompleted will return false. (HasOverlappedIoCompleted is actually a macro that simply checks OVERLAPPED.Internal to see whether it is still set to STATUS_PENDING.)

In order to initiate an asynchronous I/O request via UMS, SQL Server instantiates a UMS I/O request object and passes it into a method that's semantically similar to ReadFile/ReadFileScatter or WriteFile/WriteFileGather, depending on whether it's doing a read or a write and depending on whether it's doing scatter-gather I/O. A UMS I/O request is a structure that encapsulates an asynchronous I/O request and contains, as one of its members, an OVERLAPPED structure. The UMS asynchronous I/O method called by the server passes this OVERLAPPED structure into the appropriate Win32 asynchronous I/O function (e.g., ReadFile) for use with the asynchronous operation. The UMS I/O request structure is then put on the I/O list for the host scheduler.

Once an I/O request is added to the I/O list, it is the job of any worker that yields to check the list to see whether asynchronous I/O operations have completed. To do this, the yielding worker simply walks the I/O list and calls HasOverlappedIoCompleted for each request, passing the request's OVERLAPPED member into the macro. When it finds a request that has completed, it removes the request from the I/O list and calls its I/O completion routine, which was specified when the UMS I/O request was originally created.
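
The walk over the I/O list amounts to a field test plus a callback. Here's a toy Python rendering (dicts standing in for the UMS I/O request and OVERLAPPED structures; none of these names are from SQL Server): the completion check mimics the HasOverlappedIoCompleted macro, which is just a comparison against STATUS_PENDING, with no kernel call.

```python
STATUS_PENDING = 0x103  # value Windows keeps in OVERLAPPED.Internal while I/O is in flight

def has_overlapped_io_completed(overlapped):
    """Python rendering of the HasOverlappedIoCompleted macro: a plain field test."""
    return overlapped["Internal"] != STATUS_PENDING

def check_io_list(io_list):
    """Walk the I/O list as a yielding worker would.

    Finished requests are removed from the list and their completion
    routines are run; in-flight requests are left alone.
    """
    finished = [r for r in io_list if has_overlapped_io_completed(r["overlapped"])]
    for req in finished:
        io_list.remove(req)
        req["completion_routine"](req)
```

Note that whichever worker happens to be yielding runs the completion routine, which is the point the next section elaborates on.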

You'll recall from our discussion of asynchronous I/O that when an asynchronous operation completes, Windows can optionally queue an I/O completion APC to the original calling thread. As I said earlier, one of the design goals of UMS was to provide much of the same scheduling and asynchronous I/O functionality found in the OS kernel without requiring a switch into kernel mode. UMS's support for I/O completion routines is another example of this design philosophy. A big difference between the way Windows executes I/O completion routines and the way UMS does is that, in UMS, the I/O completion routine executes within the context of whatever worker is attempting to yield (and, therefore, checking the I/O list for completed I/O operations) rather than always in the context of the thread that originally initiated the asynchronous operation. The benefit of this is that a context switch is not required to execute the I/O completion routine. The worker that is already running and about to yield takes care of calling it before it goes to sleep. Because of this, no interaction with the Windows kernel is necessary.

If running on Windows 9x or ME, the I/O completion routine is called immediately after the Win32 I/O API call. Since the operation is handled synchronously by the operating system, there is no reason to go on the I/O list and have perhaps another worker actually run the I/O completion routine. Given that we know the I/O has completed when we return from the Win32 API call, we can go ahead and call the I/O completion routine before returning from the UMS I/O method call. This means that, on Windows 9x/ME, the I/O completion routine is always called within the context of the worker that initiated the asynchronous I/O operation in the first place.

The Timer List

The timer list maintains a list of UMS timer requests. A timer request encapsulates a timed work request. For example, if a worker needs to wait on a resource for a specific amount of time before timing out, it is added to the timer list. When a worker yields, it checks for expired timers on the timer list after checking for completed I/O requests. If it finds an expired timer request, it removes it from the timer list and moves its associated worker to the runnable list. If the runnable list is empty when it does this—that is, if no other workers are ready to run—it also signals the worker's associated event so that it can be scheduled by Windows to run.
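
The timer pass can be sketched the same way as the I/O pass. In this toy model (dicts with invented fields, not SQL Server structures), an expired timer's worker moves to the runnable list, and its event is signaled only when the runnable list was empty, matching the rule described above:

```python
def check_timer_list(timer_list, runnable_list, now):
    """Run by a yielding worker after the I/O check: retire expired timers."""
    for timer in [t for t in timer_list if t["expires_at"] <= now]:
        timer_list.remove(timer)
        was_empty = not runnable_list
        runnable_list.append(timer["worker"])
        if was_empty:
            # No other worker was ready to run, so wake this one
            # immediately rather than leaving it to a later yield.
            timer["worker"]["event_signaled"] = True
```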

The Idle Loop

If, after checking for completed I/O requests and expired timers, a worker finds that the runnable list is empty, it enters a type of idle loop. It scans the timer list for the next timer expiration, then calls WaitForSingleObject on an event object that's associated with the scheduler itself, using a timeout value equal to the next timer expiration. You'll recall from our discussion of asynchronous I/O that the Win32 OVERLAPPED structure contains an hEvent member that can store the handle of a Windows event object. When a UMS scheduler is created, an event object is created and associated with the scheduler itself. When the scheduler initiates an asynchronous I/O request, this event is stored in the hEvent member of the I/O request object's OVERLAPPED structure. This causes the completion of the asynchronous I/O to signal the scheduler's event object. By waiting on this event with a timeout set to the next timer expiration, a worker is waiting for either an I/O request to complete or a timer to expire, whichever comes first. And because it does this via a call to WaitForSingleObject, there's no polling involved, and the worker consumes no CPU until one of these two things occurs.
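
The "wait for whichever comes first" idea can be illustrated with Python's threading.Event, standing in for the scheduler's Windows event object (an illustrative analogy, not the actual mechanism):

```python
import threading

def idle_wait(scheduler_event, timer_list, now):
    """Toy idle loop step: block until scheduler I/O event or next timer.

    Returns True if the scheduler's event was signaled (an I/O completed),
    False if the timeout elapsed first (the next timer expired).
    """
    if timer_list:
        timeout = min(t["expires_at"] for t in timer_list) - now
    else:
        timeout = None  # no pending timers: wait indefinitely (INFINITE in Win32)
    # Like WaitForSingleObject with a timeout, Event.wait blocks without
    # polling and without consuming CPU until one of the two occurs.
    return scheduler_event.wait(timeout)
```

Either outcome wakes the worker exactly when there is something for it to do, which is what lets an otherwise idle scheduler sit at zero CPU.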
