SQL Server is a very I/O-intensive application: it is constantly reading data from and writing data to disk. Good I/O throughput is essential for SQL Server performance and health. Unfortunately, many SQL Server installations are I/O bound, even with modern flash-based storage.
In this chapter, I will show you how to analyze and troubleshoot disk subsystem performance issues. You will learn how SQL Server processes I/O requests internally and how to identify possible bottlenecks through the entire I/O stack, at the SQL Server, OS, virtualization, and storage levels.
Next, I will talk about checkpoint process tuning, a common source of I/O bottlenecks in busy OLTP systems.
Finally, I will cover the most common I/O-related waits you may encounter in your system.
SQL Server never works with data pages directly in database files. Every time a data page needs to be read or modified, SQL Server reads that page into memory and caches it in the buffer pool. Each page in the buffer pool is referenced by a buffer structure, sometimes simply called a buffer. It includes the page’s address in the data file, a pointer to the data page in memory, status information, and the page latching queue.
SQL Server uses latches to protect internal objects in memory, preventing their corruption when multiple threads modify them simultaneously. The two most common types of latch are exclusive, which blocks any access to the object, and shared, which allows simultaneous reads but prevents modifications of the object.
Conceptually, latches are similar to critical sections or mutexes in application development languages. We will talk about latches in detail in Chapter 10.
The location of data pages in the buffer pool does not represent the order in which they are stored in the database files. SQL Server, however, can efficiently locate a page in the buffer pool when needed. Every time SQL Server accesses a page there, it performs a logical read. When the page is not present in memory and needs to be read from disk, a physical read also occurs.
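You can observe the difference between logical and physical reads with SET STATISTICS IO. A quick sketch; the dbo.Orders table and the numbers in the comments are hypothetical:

```sql
-- Report I/O statistics for queries in this session
SET STATISTICS IO ON;

-- dbo.Orders is a hypothetical table used for illustration
SELECT COUNT(*) FROM dbo.Orders;

-- The Messages tab shows output along these lines:
--   Table 'Orders'. Scan count 1, logical reads 10416, physical reads 3,
--   read-ahead reads 10400, ...
-- On a second execution, physical and read-ahead reads typically drop to 0,
-- because the pages are already cached in the buffer pool.
```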
When data needs to be modified, SQL Server changes the pages in the buffer pool, marking them as dirty, then writes log records to the transaction log file. It saves dirty pages to the data files asynchronously in the Checkpoint and, sometimes, the Lazy Writer processes. We’ll discuss both of those processes later in this chapter and transaction logs in Chapter 11. For now, remember that data modifications require SQL Server to read data pages from disk if they have not been already cached.
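If you are curious how many dirty pages are currently cached, you can query the sys.dm_os_buffer_descriptors view. A sketch along these lines (note that this view returns one row per buffer and can be expensive to scan on servers with a large buffer pool):

```sql
-- One row per buffer; is_modified = 1 marks dirty pages
SELECT
    DB_NAME(database_id) AS [Database]
    ,COUNT(*) AS [Cached Pages]
    ,SUM(CASE WHEN is_modified = 1 THEN 1 ELSE 0 END) AS [Dirty Pages]
FROM sys.dm_os_buffer_descriptors WITH (NOLOCK)
GROUP BY database_id
ORDER BY COUNT(*) DESC
OPTION (RECOMPILE);
```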
Now let’s look at how SQL Server works with I/O in more detail.
As you remember from Chapter 2, SQL Server uses cooperative scheduling, with multiple workers running on CPUs in a rotating fashion. The workers voluntarily yield when their short quantum expires, allowing other workers to proceed. This model requires SQL Server to use asynchronous I/O as much as possible: a worker cannot simply block until an I/O request completes, since that would prevent other workers from executing.
By default, all SQL Server schedulers handle I/O requests. You can override this behavior and bind I/O to specific CPUs by setting the affinity I/O mask. In theory, this may help improve I/O throughput in very busy OLTP systems; however, I rarely find it necessary. In most cases, you’ll achieve better results by performing optimizations and reducing CPU and I/O load.
You can read about affinity I/O masking in the Microsoft documentation.
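For reference, the option is changed with sp_configure. A hypothetical sketch that binds I/O to CPUs 0 and 1 (bitmask 0x3); the change takes effect only after the instance is restarted:

```sql
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- Hypothetical example: bind I/O processing to CPUs 0 and 1 (bitmask 0x3).
-- The affinity I/O mask should not overlap with the CPU affinity mask.
EXEC sp_configure 'affinity I/O mask', 3;
RECONFIGURE;
-- Requires an instance restart to take effect
```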
Every scheduler has a dedicated I/O queue. When a worker needs to perform an I/O operation, it creates an I/O request structure, puts it in the scheduler’s queue, and issues an asynchronous OS API I/O call. It does not wait until the request is completed; it either continues to run, doing other things, or suspends itself, moving to the SUSPENDED queue.
When a new worker starts to run on the scheduler (switching to RUNNING state), it goes through the scheduler’s I/O queue. The I/O request structures contain enough information to check if the asynchronous OS API call has been completed, along with a pointer to callback function that the worker calls to complete the I/O request in SQL Server.
I know that this sounds complicated – please bear with me and we’ll look at the details in the next section. The key things I’d like you to remember are:
- SQL Server issues I/O requests asynchronously whenever possible, so workers do not block the scheduler while waiting for disk.
- Every scheduler has its own queue of pending I/O requests.
- I/O requests are completed by whatever worker runs on the scheduler next, not necessarily the worker that issued them.
Now, as promised, let’s look at that process again, with more concrete examples of reading data pages from disk.
When SQL Server needs to access a data page, it checks if the page already exists in the buffer pool. If it does not, the worker allocates a buffer for the page, protecting it with an exclusive latch. This prevents other workers from accessing the page until it has been read – they will be blocked, waiting for the latch to clear.
Next, the worker creates the I/O request structure, puts it in the scheduler’s I/O queue, and initiates an OS API read request. Then it tries to acquire a shared latch on the buffer, which is blocked by the incompatible exclusive latch already held there. The worker then suspends itself with a PAGEIOLATCH wait (Figure 3-1 illustrates that state).
When another worker switches to the RUNNING state, it checks to see if any I/O requests in the scheduler’s queue have been completed. If so, the worker calls the callback function to finalize the operation: this validates that the page is not corrupted and removes the exclusive latch from the buffer. The worker that submitted the I/O request can then resume and access the data page (Figure 3-2).
There are several errors that may occur during I/O requests. All of them are severe, and you need to set up alerts in the system for them.
When troubleshooting those errors, you can look for the details in your SQL Server error log (use the code from Listing 1-4) and system event log.
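If you don’t have the code from Listing 1-4 handy, you can also search the error log with the undocumented (but widely used) xp_readerrorlog procedure. For example:

```sql
-- Search the current error log (file 0) for entries that mention "I/O",
-- newest first. Parameters: log number, log type (1 = SQL Server),
-- two search strings, start/end times, and sort order.
EXEC sys.xp_readerrorlog 0, 1, N'I/O', NULL, NULL, NULL, N'DESC';
```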
It is very common for SQL Server to read multiple data pages in a single I/O request. For example, it uses read-ahead logic, reading multiple data pages ahead during scans. As a result, a query may perform thousands of logical reads with just a handful of physical reads. Another example is ramp-up reads, in which SQL Server reads a large number of pages on each I/O request, trying to fill the buffer pool quickly on startup.
SQL Server handles data writes very similarly to data reads. In most cases, those writes are done asynchronously using a scheduler’s I/O queues, as you just saw in the previous examples. Obviously, the callback function will be implemented differently in different I/O operations.
When you change some data in the database, SQL Server modifies data pages in the buffer pool, reading pages from disk if needed. It generates log records for the modifications and saves them to the transaction log. The transaction is not considered to have been committed until the log records are hardened on disk. While, technically, you can treat write-ahead logging as synchronous writes, SQL Server uses an asynchronous I/O pattern for log writes.
SQL Server writes modified data pages in user databases asynchronously during checkpoint. This process finds dirty data pages in the buffer pool and saves them to disk. It tries to minimize the number of disk requests by combining and writing adjacent modified pages together in a single I/O operation when possible.
Another SQL Server process, called lazy writer, periodically sweeps the buffer pool to remove data pages that have not been recently accessed, freeing up the memory. In normal circumstances, lazy writer skips dirty data pages; however, it may also write them to disk if there is memory pressure in the system.
There are, as always, some exceptions. For example, during a bulk import operation, SQL Server allocates a set of buffers in the buffer pool and reuses them, writing data to the database outside of checkpoint. This preserves the content of the buffer pool, so it isn’t flushed by massive data imports.
Checkpoint I/O may introduce issues on busy systems. I will talk about checkpoint tuning later in this chapter. But first, let’s take a holistic look at the entire storage subsystem.
Troubleshooting slow I/O performance in SQL Server is not an easy task. I’ve seen many heated discussions between database and infrastructure teams. Database engineers generally complain about slow disk performance, while the storage engineers analyze the metrics from SAN devices with sub-millisecond latency and insist that all issues are on the SQL Server side. Neither team is right. They usually make the same mistake: oversimplifying the storage subsystem to just a couple of components. However, the storage subsystem isn’t that simple.
Figure 3-3 shows a very high-level diagram of the network-based storage subsystem, with many details missing. (It also references some troubleshooting tools. We’ll get to those, but don’t focus on them yet.) The point here is that bad I/O performance can be caused by any component, so you need to analyze all layers in the stack.
There is also the option of using direct-attached storage (DAS). In this configuration, the storage is either installed locally in the server (think NVMe drives) or directly connected to it. This setup eliminates the network from the storage path and may provide better I/O performance. On the downside, you lose the flexibility of external storage, where you can add space and perform maintenance on the fly, transparently to the server.
Every storage subsystem has a “tipping point” after which the latency of I/O requests will start to grow exponentially with increase in throughput and IOPS (I/O operations per second). For example, you may get a 1-millisecond response with an IOPS workload of 1,000 and a 3-millisecond response with an IOPS workload of 50,000. However, you might cross the tipping point at 100,000 IOPS and start to get double-digit or even triple-digit latency.
Every component in the stack will have its own tipping point. For example, a low queue depth in the HBA adapter may lead to queueing at the controller level as the number of I/O requests increases. In this case, SQL Server will suffer from high latency and poor I/O performance; however, all SAN metrics will look perfectly healthy, with no latency at all.
You can use the DiskSpd utility to test storage subsystem performance. That utility emulates SQL Server’s workload in the system. You can download it from GitHub.
As I’ve noted, you’ll need to look at all storage subsystem components when you troubleshoot bad I/O performance. Nevertheless, the place to start is analyzing overall storage latency and the amount of data SQL Server reads and writes. You can do this by looking at the sys.dm_io_virtual_file_stats view.
The sys.dm_io_virtual_file_stats view is the most important tool in SQL Server I/O performance troubleshooting. This view provides I/O statistics by database file, including number of I/O operations, amount of data read and written, and information about stalls, or time for I/O requests to complete. (I use the terms latency and stalls interchangeably throughout this book.)
The data in this view is cumulative and is calculated from the time of the SQL Server restart. Take two snapshots of the data and calculate the delta between them (Listing 3-1 shows the code to do that). This code filters out database files with low I/O activity, since their metrics are usually skewed and not very useful.
CREATE TABLE #Snapshot
(
    database_id SMALLINT NOT NULL,
    file_id SMALLINT NOT NULL,
    num_of_reads BIGINT NOT NULL,
    num_of_bytes_read BIGINT NOT NULL,
    io_stall_read_ms BIGINT NOT NULL,
    num_of_writes BIGINT NOT NULL,
    num_of_bytes_written BIGINT NOT NULL,
    io_stall_write_ms BIGINT NOT NULL
);

INSERT INTO #Snapshot(database_id,file_id,num_of_reads,num_of_bytes_read
    ,io_stall_read_ms,num_of_writes,num_of_bytes_written,io_stall_write_ms)
    SELECT database_id,file_id,num_of_reads,num_of_bytes_read
        ,io_stall_read_ms,num_of_writes,num_of_bytes_written,io_stall_write_ms
    FROM sys.dm_io_virtual_file_stats(NULL,NULL)
OPTION (RECOMPILE);

-- Set test interval (1 minute). Use larger intervals in production
WAITFOR DELAY '00:01:00.000';

;WITH Stats(db_id, file_id, Reads, ReadBytes, Writes
    ,WrittenBytes, ReadStall, WriteStall)
AS
(
    SELECT s.database_id, s.file_id
        ,fs.num_of_reads - s.num_of_reads
        ,fs.num_of_bytes_read - s.num_of_bytes_read
        ,fs.num_of_writes - s.num_of_writes
        ,fs.num_of_bytes_written - s.num_of_bytes_written
        ,fs.io_stall_read_ms - s.io_stall_read_ms
        ,fs.io_stall_write_ms - s.io_stall_write_ms
    FROM #Snapshot s JOIN sys.dm_io_virtual_file_stats(NULL, NULL) fs ON
        s.database_id = fs.database_id AND s.file_id = fs.file_id
)
SELECT
    s.db_id AS [DB ID], d.name AS [Database]
    ,mf.name AS [File Name], mf.physical_name AS [File Path]
    ,mf.type_desc AS [Type], s.Reads
    ,CONVERT(DECIMAL(12,3), s.ReadBytes / 1048576.) AS [Read MB]
    ,CONVERT(DECIMAL(12,3), s.WrittenBytes / 1048576.) AS [Written MB]
    ,s.Writes, s.Reads + s.Writes AS [IO Count]
    ,CONVERT(DECIMAL(5,2),100.0 * s.ReadBytes /
        (s.ReadBytes + s.WrittenBytes)) AS [Read %]
    ,CONVERT(DECIMAL(5,2),100.0 * s.WrittenBytes /
        (s.ReadBytes + s.WrittenBytes)) AS [Write %]
    ,s.ReadStall AS [Read Stall]
    ,s.WriteStall AS [Write Stall]
    ,CASE WHEN s.Reads = 0 THEN 0.000
        ELSE CONVERT(DECIMAL(12,3),1.0 * s.ReadStall / s.Reads)
    END AS [Avg Read Stall]
    ,CASE WHEN s.Writes = 0 THEN 0.000
        ELSE CONVERT(DECIMAL(12,3),1.0 * s.WriteStall / s.Writes)
    END AS [Avg Write Stall]
FROM
    Stats s JOIN sys.master_files mf WITH (NOLOCK) ON
        s.db_id = mf.database_id AND s.file_id = mf.file_id
    JOIN sys.databases d WITH (NOLOCK) ON
        s.db_id = d.database_id
WHERE -- Only display files with more than 20MB throughput
    (s.ReadBytes + s.WrittenBytes) > 20 * 1048576
ORDER BY s.db_id, s.file_id
OPTION (RECOMPILE);
Figure 3-4 shows the output from the view.
The goal is to keep stalls/latency metrics as low as possible. It is impossible to define thresholds that apply to all systems, but my rule of thumb is not to exceed 1- to 2-millisecond write stalls for transaction logs and 5- to 7-millisecond read and write stalls for data files when network storage is used. The latency should be even lower, in the sub-millisecond range, when you are using modern direct-attached drives.
Next, analyze throughput in the system. High stalls with low throughput usually indicate performance issues outside of SQL Server. Don’t forget to look at throughput across all files that share the same drive or controller. High throughput in some files may impact the metrics in others that share the same resource.
There is usually a correlation between throughput and stalls – the more data you are reading and writing, the higher latency you’ll have. This correlation is usually linear until you reach the tipping point, after which latency increases very quickly.
A large amount of reads and read stalls in the data files is often accompanied by a significant percent of PAGEIOLATCH waits and a low Page Life Expectancy performance counter value. This indicates that a large amount of data is constantly being read from disk. You need to understand why that is happening. In most cases, it’s due to nonoptimized queries that perform large scans reading data from disk. We will talk about how to detect those queries in the next chapter.
Don’t discount the possibility, though, that the server is underprovisioned and doesn’t have enough memory to accommodate an active dataset. That is also entirely possible. In either case, adding extra memory may be a completely acceptable solution that will reduce I/O load and improve performance of the system. It is, obviously, not the best solution, but in many cases it’s easier and cheaper to use hardware to solve the problem.
In users’ databases, large amount of writes and write stalls in data files often indicate inefficient checkpoint configuration. You may get some improvements by tuning the checkpoint configuration, as I will show later in the chapter. In the longer term, you may need to analyze if it is possible to reduce the number of data pages SQL Server writes to disk. Some ways to do this include removing unnecessary indexes; reducing page splits by changing FILLFACTOR and tuning the index maintenance strategy; decreasing the number of data pages by implementing data compression; and, potentially, refactoring database schema and applications.
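As a hypothetical illustration of the last two techniques, an index rebuild can apply both a lower FILLFACTOR and page compression in one operation (the table and index names here are made up):

```sql
-- Leave 10% free space on leaf pages to reduce page splits,
-- and enable page compression to reduce the number of data pages
ALTER INDEX IX_Orders_CustomerId ON dbo.Orders
REBUILD WITH (FILLFACTOR = 90, DATA_COMPRESSION = PAGE);
```

Evaluate compression carefully before applying it broadly: it trades CPU time for reduced I/O, which is usually, but not always, a good trade in I/O-bound systems.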
When you see large throughput and stalls in tempdb, identify what causes them. The three most common causes are version store activity, massive tempdb spills, and excessive usage of temporary objects. We will talk about these in Chapter 9.
Finally, you can also get an idea of I/O latency by analyzing resource wait time in PAGEIOLATCH and other I/O-related waits. This won’t give you detailed information on a per-file basis, but it may be a good metric when you look at systemwide I/O performance.
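A sketch of that systemwide view, using the sys.dm_os_wait_stats view; resource wait time is total wait time minus signal wait time:

```sql
-- Cumulative (since restart) I/O-related wait statistics
SELECT
    wait_type
    ,waiting_tasks_count
    ,wait_time_ms
    ,wait_time_ms - signal_wait_time_ms AS resource_wait_ms
FROM sys.dm_os_wait_stats WITH (NOLOCK)
WHERE
    wait_type LIKE N'PAGEIOLATCH%' OR
    wait_type IN (N'WRITELOG',N'IO_COMPLETION'
        ,N'ASYNC_IO_COMPLETION',N'WRITE_COMPLETION')
ORDER BY wait_time_ms DESC
OPTION (RECOMPILE);
```

As with sys.dm_io_virtual_file_stats, the data is cumulative since the last restart; take two snapshots and diff them if you want wait times for a specific interval.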
The sys.dm_io_virtual_file_stats view provides useful and detailed information and points you in the right direction for further I/O troubleshooting, but it has one limitation: it averages data over the sampling interval.
This is completely acceptable when I/O latency is low. However, if latency numbers are high, you’ll want to determine if performance is generally slow or if the numbers have been skewed by some bursts in activity. You can do this by looking at the performance counters correlating SQL Server and disk metrics.
The troubleshooting process will vary slightly between Windows and Linux. In Windows, the simplest way to analyze the metrics is using the well-known PerfMon (Performance Monitor) utility. You can look at the SQL Server and I/O performance counters together and correlate data from them.
| Performance Object | Performance Counters | Description |
|---|---|---|
| Physical Disk | Avg Disk Queue Length, Avg Disk Read Queue Length, Avg Disk Write Queue Length | Provide the average number of I/O requests (total, read, and write, respectively) queued during the sampling interval. Those numbers should be as low as possible. Spikes indicate that I/O requests are being queued at the OS level. |
| Physical Disk | Current Disk Queue Length | Gives you the size of the I/O request queue at the moment the metric was collected. |
| Physical Disk | Avg Disk sec/Transfer, Avg Disk sec/Read, Avg Disk sec/Write | Indicate average latency for disk operations during the sampling interval. These numbers are usually similar to latency/stall metrics from the sys.dm_io_virtual_file_stats view when sampled over the same time period. However, because you typically measure sys.dm_io_virtual_file_stats over larger intervals, these counters will show you if I/O stalls were always high or if the data has been affected by latency spikes. |
| Physical Disk | Disk Transfers/sec, Disk Reads/sec, Disk Writes/sec, Disk Bytes/sec, Disk Read Bytes/sec, Disk Write Bytes/sec | Display the number of I/O operations and throughput at the time of the reading. Similar to the latency counters, you can use them to analyze the uniformity of the disk workload. |
| Physical Disk | Avg Disk Bytes/Transfer, Avg Disk Bytes/Read, Avg Disk Bytes/Write | Show the average size of I/O requests, which can help you understand I/O patterns in the system. |
| SQL Server: Buffer Manager | Checkpoint pages/sec, Background writer pages/sec | Show the number of dirty pages written by the checkpoint process and by the background writer (used with indirect checkpoints), respectively. |
| SQL Server: Buffer Manager | Lazy writes/sec | Provides the number of pages written by the lazy writer process. |
| SQL Server: Buffer Manager | Page reads/sec, Page writes/sec | Display the number of physical reads and writes. |
| SQL Server: Buffer Manager | Readahead pages/sec | Shows the number of pages read by the read-ahead process. |
| SQL Server: Databases | Log Bytes Flushed/sec, Log Flush Write Time (ms), Log Flushes/sec | Provide data about throughput, latency, and the number of write requests for transaction log writes. Use those counters to understand the uniformity of log generation when you troubleshoot high log write latency. |
| SQL Server: SQL Statistics | Batch Requests/sec | While this counter and the next are not I/O-related, they can be used to analyze spikes in system workload that may lead to bursts of I/O activity. |
| SQL Server: Databases | Transactions/sec | See Batch Requests/sec above. |
Usually, I start by looking at Avg Disk sec/Read and Avg Disk sec/Write latency counters, along with Avg Disk Queue Length. If I see any spikes in their values, I add SQL Server–specific counters to identify what processes may be leading to the bursts in activity.
Figure 3-5 illustrates one such example. You can see the correlation between Checkpoint pages/sec and high Avg Disk sec/Write and Avg Disk Queue Length values. This leads to the simple conclusion that the I/O subsystem cannot keep up with bursts of writes from the checkpoint process.
Pay attention to other applications installed on the server – it is possible that they are responsible for I/O activity bursts or other issues.
Linux doesn’t offer the standard PerfMon utility; however, there are plenty of free and commercial monitoring tools available. You can also use tools like iostat, dstat, and iotop, which are included in major Linux distributions. They provide general disk performance metrics on a per-process or system level.
On the SQL Server side, you can access performance counters through sys.dm_os_performance_counters view. Listing 3-2 shows you how to do that.
CREATE TABLE #PerfCntrs
(
    collected_time DATETIME2(7) NOT NULL DEFAULT SYSDATETIME(),
    object_name SYSNAME NOT NULL,
    counter_name SYSNAME NOT NULL,
    instance_name SYSNAME NOT NULL,
    cntr_value BIGINT NOT NULL,
    PRIMARY KEY (object_name, counter_name, instance_name)
);

;WITH Counters(obj_name, ctr_name)
AS
(
    SELECT C.obj_name, C.ctr_name
    FROM
    (
        VALUES
            ('SQLServer:Buffer Manager','Checkpoint pages/sec')
            ,('SQLServer:Buffer Manager','Background writer pages/sec')
            ,('SQLServer:Buffer Manager','Lazy writes/sec')
            ,('SQLServer:Buffer Manager','Page reads/sec')
            ,('SQLServer:Buffer Manager','Page writes/sec')
            ,('SQLServer:Buffer Manager','Readahead pages/sec')
            ,('SQLServer:Databases','Log Flushes/sec') -- For all DBs
            ,('SQLServer:Databases','Log Bytes Flushed/sec') -- For all DBs
            ,('SQLServer:Databases','Log Flush Write Time (ms)') -- For all DBs
            ,('SQLServer:Databases','Transactions/sec') -- For all DBs
            ,('SQLServer:SQL Statistics','Batch Requests/sec')
    ) C(obj_name, ctr_name)
)
INSERT INTO #PerfCntrs(object_name,counter_name,instance_name,cntr_value)
    SELECT pc.object_name, pc.counter_name, pc.instance_name, pc.cntr_value
    FROM sys.dm_os_performance_counters pc WITH (NOLOCK)
        JOIN Counters c ON
            pc.counter_name = c.ctr_name AND pc.object_name = c.obj_name;

WAITFOR DELAY '00:00:01.000';

;WITH Counters(obj_name, ctr_name)
AS
(
    SELECT C.obj_name, C.ctr_name
    FROM
    (
        VALUES
            ('SQLServer:Buffer Manager','Checkpoint pages/sec')
            ,('SQLServer:Buffer Manager','Background writer pages/sec')
            ,('SQLServer:Buffer Manager','Lazy writes/sec')
            ,('SQLServer:Buffer Manager','Page reads/sec')
            ,('SQLServer:Buffer Manager','Page writes/sec')
            ,('SQLServer:Buffer Manager','Readahead pages/sec')
            ,('SQLServer:Databases','Log Flushes/sec') -- For all DBs
            ,('SQLServer:Databases','Log Bytes Flushed/sec') -- For all DBs
            ,('SQLServer:Databases','Log Flush Write Time (ms)') -- For all DBs
            ,('SQLServer:Databases','Transactions/sec') -- For all DBs
            ,('SQLServer:SQL Statistics','Batch Requests/sec')
    ) C(obj_name, ctr_name)
)
SELECT
    pc.object_name, pc.counter_name, pc.instance_name
    ,CASE pc.cntr_type
        WHEN 272696576 THEN -- per-second counter: calculate the rate
            (pc.cntr_value - h.cntr_value) * 1000 /
                DATEDIFF(MILLISECOND,h.collected_time,SYSDATETIME())
        WHEN 65792 THEN pc.cntr_value -- point-in-time value
        ELSE NULL
    END AS cntr_value
FROM
    sys.dm_os_performance_counters pc WITH (NOLOCK)
        JOIN Counters c ON
            pc.counter_name = c.ctr_name AND pc.object_name = c.obj_name
        JOIN #PerfCntrs h ON
            pc.object_name = h.object_name AND
            pc.counter_name = h.counter_name AND
            pc.instance_name = h.instance_name
ORDER BY pc.object_name, pc.counter_name, pc.instance_name
OPTION (RECOMPILE);
You can also bring the sys.dm_io_virtual_file_stats view into the analysis, sampling its data and the performance counters together every second. The approach is the same one we just discussed – you’ll look at the correlation between disk latency and activity and evaluate the general performance of the I/O subsystem, identifying tipping points in the load.
There are several layers in the storage stack you may need to analyze in addition to OS. They include virtualization, HBA/SCSI controller configuration, and the storage array itself.
I recommend working together with infrastructure and storage engineers during troubleshooting.
SQL Server mostly operates in shared environments. It shares storage and network infrastructure with other clients, and when virtualized, it runs on the same physical host with other VMs. As I said earlier in this book, when virtualization is being used, be sure to validate that the host is not overcommitted, which could lead to all sorts of performance issues.
Unless you have a very simple SQL Server setup that uses local storage, I/O requests will be serialized and sent over network. There are two typical problems here: insufficient queue depth and noisy neighbors.
A word of caution – storage arrays can handle a limited number of outstanding requests. Increasing queue depth on a busy server could increase the number of outstanding requests on the storage, shifting the bottleneck from the server to the storage layer, especially if the storage serves requests from many busy systems.
The virtualization host and storage both expose throughput, IOPS, and latency metrics for analysis. On virtualization layers, the metrics may vary based on technology. For example, in Hyper-V you can use regular disk performance counters on the host. In VMWare, you can get the data from ESXTOP utility. In either case, the troubleshooting approach is very similar to what we have already discussed. Look at the available metrics, correlate data from them, and detect the bottlenecks in the I/O path.
Finally, check the storage configuration. Storage vendors usually publish best practices for SQL Server workloads; they are a good starting point. Pay particular attention to the alignment of the allocation unit size with the RAID stripe size and partition offset.
For example, a 1,024 KB partition offset, 4 KB disk block, 64 KB allocation unit, and 128 KB RAID stripes are perfectly aligned, with each I/O request served by a single disk. On the other hand, 96 KB RAID stripes will spread 64 KB allocation units across two disks, which leads to extra I/O requests and can seriously impact performance.
Again, it is always beneficial to work together with infrastructure and storage engineers. They are the subject matter experts and may help you to find the root cause of the problem faster than when you are working alone.
Finally, the best approach to get predictable performance in critical systems is to use a dedicated environment. Run SQL Server on dedicated hardware with direct-attached storage (DAS) to get the best performance possible.
As we all know, SQL Server uses write-ahead logging. Transactions are considered to be committed only after the log records are hardened in the transaction logs. SQL Server does not need to save dirty data pages to disk at the same time – it can reapply the changes by replaying log records if needed.
The checkpoint process saves data pages to the data files. The main goal of checkpoint is to reduce recovery time in the event of a SQL Server crash or failover: the fewer changes that need to be replayed, the faster recovery will be. The maximum desired recovery time is controlled at either the server level or the database level. By default, both are 60 seconds.
You should not consider the recovery target to be a hard value. In many cases, the database will recover much faster than that. It is also possible for bursts of activity and long running transactions to prolong recovery beyond the target time.
There are four different types of checkpoints:
- Automatic checkpoints, which occur based on the server-level recovery interval configuration setting.
- Indirect checkpoints, which occur when a database-level recovery target has been set with the ALTER DATABASE SET TARGET_RECOVERY_TIME command.
- Manual checkpoints, which occur when you run the CHECKPOINT command.
- Internal checkpoints, which are triggered by internal operations and events, such as database backups and snapshot creation.
Let me show you an example from one system I worked with. The sample of data from the sys.dm_io_virtual_file_stats view over 1 minute had very high write latency for the data files. However, the smaller samples (1 to 3 seconds) rarely showed any activity at all.
Figure 3-6 shows the data, with the 1-minute sample at the top and the 1-second sample at the bottom.
This behavior led me to believe that the issue was related to checkpoint. I confirmed this hypothesis by looking at the Checkpoint pages/sec, Disk Writes/sec, and Avg Disk Queue Length performance counters. You can clearly see that burst of disk writes from the checkpoint process in Figure 3-5 earlier in the chapter, which shows the screenshot from PerfMon.
Although this instance ran SQL Server 2016, it used automatic checkpoint, because all databases had been upgraded from the earlier version of SQL Server. Enabling indirect checkpoint in the system immediately changed the I/O pattern, making it much more balanced.
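For reference, here is a sketch of how you can check and change the recovery target; the database name is hypothetical, and a value of 0 in target_recovery_time_in_seconds means the database still uses automatic checkpoints:

```sql
-- Check the current recovery target for each database
SELECT name, target_recovery_time_in_seconds
FROM sys.databases;

-- Enable indirect checkpoint with a 60-second recovery target
ALTER DATABASE MyDB SET TARGET_RECOVERY_TIME = 60 SECONDS;
```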
You can see the performance counters in Figure 3-7. Notice that with indirect checkpoint, you should use Background writer pages/sec instead of the Checkpoint pages/sec counter.
Figure 3-8 shows the output from a 1-minute sample in the sys.dm_io_virtual_file_stats view. As you can see, latency went back to normal.
Indirect checkpoints do not completely eliminate I/O bursts. You can still have them, especially if the system has some spikes in data modifications. However, they are less frequent than with automatic checkpoints.
You may also need to tune the recovery target to get the most balanced I/O load. In the case above, I got the best results with a 90-second target. Of course, high values may increase recovery time in the system.
SQL Server uses several different wait types related to I/O operations. It is very common to see all of them present when the disk subsystem is not fast enough. Let’s look at five of the most common: ASYNC_IO_COMPLETION, IO_COMPLETION, WRITELOG, WRITE_COMPLETION, and PAGEIOLATCH.
The ASYNC_IO_COMPLETION wait type occurs when SQL Server waits for asynchronous I/O operations (reads or writes) of non–buffer pool pages to complete. Examples include database backup and restore operations.
When I see significant presence of both ASYNC_IO_COMPLETION and PAGEIOLATCH waits in the system, I perform general I/O troubleshooting. If PAGEIOLATCH waits are not present, I look at how often ASYNC_IO_COMPLETION occurs. I may ignore that wait if its percentage is not very significant and disk latency is low.
The IO_COMPLETION wait type occurs during synchronous reads and writes in data files and during some read operations in the transaction log. A few examples are tempdb spill writes and transaction log reads during database recovery.
When you see significant percentages of this wait in the system, perform general disk-performance troubleshooting. Pay specific attention to tempdb latency and throughput; in my experience, bad tempdb performance is the most common reason for this wait. We will talk more about tempdb troubleshooting in Chapter 9.
As you can guess from the name, the WRITELOG wait occurs when SQL Server writes log records to the transaction log. It is normal to see this wait in any system; however, a large percentage may indicate a transaction log bottleneck.
Look at average wait time and transaction log write latency in the sys.dm_io_virtual_file_stats view during troubleshooting. High numbers are impactful and may affect throughput in the system.
In addition to optimizing disk subsystem throughput, there are several other things you can do to reduce that wait. We will discuss them in Chapter 11.
The WRITE_COMPLETION wait occurs during synchronous write operations in database and log files. In my experience, it is most common with database snapshots.
SQL Server maintains snapshot databases by persisting the versions of data pages that existed at the time the snapshot was created. During checkpoints after the snapshot is created, SQL Server reads the old copies of data pages from the data files and saves them into the snapshot before saving the dirty pages to disk. This can significantly increase the amount of I/O in the system.
When you see this wait in the system, check if there are database snapshots present. Remember that some internal processes, like DBCC CHECKDB, also create internal database snapshots.
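You can detect user-created snapshots with a query like this; internal snapshots created by processes such as DBCC CHECKDB are not visible here:

```sql
-- Database snapshots have source_database_id populated
SELECT
    name AS [Snapshot]
    ,DB_NAME(source_database_id) AS [Source Database]
    ,create_date
FROM sys.databases
WHERE source_database_id IS NOT NULL;
```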
When snapshots are present and their usage is legitimate, you may need to analyze how to improve disk performance to support them. In other cases, you may need to remove them from the system if storage cannot keep up.
As you already know, PAGEIOLATCH waits occur when SQL Server reads data pages from disk. Those waits are very common and are present in any system. Technically, there are six such waits, but only three are typically present in the system: PAGEIOLATCH_SH (shared), PAGEIOLATCH_EX (exclusive), and PAGEIOLATCH_UP (update).
Excessive amounts of PAGEIOLATCH waits show that SQL Server is constantly reading data from disk. This usually occurs under two conditions. The first is an underprovisioned SQL Server: when the active data does not fit into the memory. Second, and more often, it indicates the presence of nonoptimized queries that scan unnecessary data, flushing the contents of the buffer pool.
You can cross-check the data by looking at the Page Life Expectancy performance counter, which shows how long data pages stay in the buffer pool. As a baseline, you can generally use the value of 300 seconds per 4 GB of buffer pool memory: for example, 7,500 seconds on the server with 100 GB buffer pool.
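That rule of thumb can be computed from the Database pages counter of the Buffer Manager object, which reports the buffer pool size in 8 KB pages. A sketch, assuming the 300-seconds-per-4-GB baseline:

```sql
-- Derive a rough PLE baseline from the current buffer pool size
SELECT
    cntr_value AS [Buffer Pool Pages]
    ,CONVERT(DECIMAL(12,2), cntr_value * 8. / 1048576) AS [Buffer Pool GB]
    ,CONVERT(INT, cntr_value * 8. / 1048576 / 4 * 300) AS [PLE Baseline (sec)]
FROM sys.dm_os_performance_counters WITH (NOLOCK)
WHERE
    counter_name = N'Database pages' AND
    object_name LIKE N'%Buffer Manager%'
OPTION (RECOMPILE);
```

Treat the result as a starting point, not a hard threshold; compare it against the actual Page Life Expectancy value from Listing 3-3.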
You can see the value of Page Life Expectancy in the PerfMon utility or with the sys.dm_os_performance_counters view, as shown in Listing 3-3. It also returns values for individual NUMA nodes in the system.
SELECT object_name, counter_name, instance_name, cntr_value AS [PLE(sec)]
FROM sys.dm_os_performance_counters WITH (NOLOCK)
WHERE counter_name = 'Page life expectancy'
OPTION (RECOMPILE);
A large percentage of PAGEIOLATCH waits always requires troubleshooting. While it does not always introduce customer-facing problems, especially with low-latency flash-based disk arrays, the data growth may push the disk subsystem over the limit, which can become a problem that quickly affects the entire system.
You can reduce the impact of PAGEIOLATCH waits by upgrading the disk subsystem or adding more memory to the server. However, the best approach is reducing the amount of data to read from disk by detecting and optimizing inefficient queries. We’ll look at how to detect those queries in the next chapter.
SQL Server uses cooperative scheduling and, in the majority of cases, asynchronous I/O when it reads and writes data. By default, each scheduler has its own I/O queue and handles I/O in the system.
The sys.dm_io_virtual_file_stats view provides I/O throughput and latency metrics per database file. In a properly tuned system, the latency of transaction log writes should not exceed 1 to 2 milliseconds, and the latency of reads and writes to data files should not exceed 5 to 7 milliseconds with network-based storage and should be even lower with DAS.
Look at the entire I/O stack when troubleshooting bad I/O performance. The problem may be anywhere – in the OS, virtualization, network path, or storage layers.
In many cases, high I/O latency is introduced by bursts in I/O activity. Analyze and tune the checkpoint process – it is one of the most common offenders in busy systems.
In many cases, reducing disk activity will help you improve disk latency and system performance. Query optimization is one of the best ways to achieve that. We will look at how to detect non-optimized queries in the system in the next chapter.
Troubleshoot the following:
- WRITELOG waits (see Chapter 11).
- IO_COMPLETION waits and high tempdb usage and latency (see Chapter 9).
- PAGEIOLATCH waits in the system.