Chapter 48. Query performance and disk I/O counters

Linchi Shea

SQL Server provides an excellent set of tools for troubleshooting query performance problems. At the top of the list, you have tools for capturing and analyzing query plans, tools for checking whether statistics are up to date or an index is useful, tools for capturing the time and resource cost of processing a query, and tools to help optimize SQL queries.

What you do not typically find in the toolset recommended for troubleshooting SQL Server query performance problems are the disk I/O performance counters. This is understandable because the disk I/O counters are the statistics at the operating system drive level. Although they can help you determine whether your disk subsystem is a bottleneck in the overall resource consumption of your SQL Server system, they do not generally reveal useful information on processing a particular query.

Nevertheless, in some scenarios you can use the disk I/O performance counters in troubleshooting query performance problems. To ignore them completely is to miss out on a powerful tool for troubleshooting SQL Server query problems.

Before diving into the query performance scenarios, let us first highlight the basics of disk I/Os, and review some salient disk I/O considerations in SQL Server query processing.

Expensive I/Os and very expensive I/Os

Disk I/Os are expensive and random disk I/Os are very expensive.

Accessing a piece of data on a disk drive is much slower than accessing the same piece of data in memory. How much slower? A conventional disk drive is an electromechanical device with a spinning spindle and one or more magnetic platters. No matter how fast the platters may spin, moving data through mechanical parts is inherently slower than moving data through electronic circuitry. In fact, disk access speed is measured in milliseconds (for example, 5 ms), whereas memory access speed is measured in nanoseconds (for example, 100 ns).

What is important to recognize is that not all disk I/Os are equal in their performance. More specifically, random I/Os are far slower, or more expensive, than sequential I/Os.

Whether I/Os are random or sequential can be defined by the relative data locations of two consecutive I/O requests. If the next I/O request is for data at a random location, the I/O requests are random, whereas if the next I/O request is for data residing next to the currently requested data, the I/O requests are sequential. On a conventional disk, the time it takes to complete the operation of a random I/O typically includes moving the disk drive head to the right track on the platter and then waiting for the disk sector to rotate to the disk drive head. The time it takes to complete the operation of a sequential I/O typically involves moving the disk head between adjacent tracks and waiting for the right sector to rotate to the disk head.

Relatively speaking, the performance of a sequential I/O depends on the track-to-track seek time, whereas the performance of a random I/O depends on the average seek time. Why does this matter? Take a typical 15,000 rpm disk drive as an example. Its track-to-track seek time is 0.2 ms, whereas its average seek time is 2 ms. The difference is an order of magnitude!
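To put rough numbers on this, consider that same 15,000 rpm drive: a full rotation takes 4 ms, so the average rotational latency is about 2 ms. A random 8 KB read therefore costs roughly 2 ms of seek plus 2 ms of rotational delay, or about 4 ms, which caps a single spindle at around 250 random reads per second. A sequential read of the adjacent data pays little or no seek and almost no rotational delay, so the same drive can instead stream data at close to its media transfer rate. These are representative figures for illustration only; consult your drive's specification sheet for actual values.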

SQL Server recognizes the performance difference between sequential I/Os and random I/Os, and its database engine employs many techniques to optimize for sequential I/Os. For example, in addition to being a technique for crash recovery, transaction logging can be viewed as an optimization that converts random writes to sequential writes. Read-ahead is another optimization that attempts to take advantage of sequential reads.

Disk performance counters

Windows exposes a large number of disk performance counters. You can use the following counters under the LogicalDisk object as the key performance indicators to evaluate disk I/O performance:

  • Avg. Disk sec/Read— The number of seconds to complete a read operation on the disk drive, averaged over the polling interval
  • Avg. Disk sec/Write— The number of seconds to complete a write operation on the disk drive, averaged over the polling interval
  • Avg. Disk Bytes/Read— The number of bytes transferred from the disk drive per read operation, averaged over the polling interval
  • Avg. Disk Bytes/Write— The number of bytes transferred to the disk drive per write operation, averaged over the polling interval
  • Disk Reads/sec— The number of read operations on the disk drive per second
  • Disk Writes/sec— The number of write operations on the disk drive per second
  • Disk Read Bytes/sec— The number of bytes transferred from the disk drive per second
  • Disk Write Bytes/sec— The number of bytes transferred to the disk drive per second
  • Current Disk Queue Length— The number of requests outstanding on the disk drive

These counters measure five key I/O metrics:

  • Disk I/O latency— For example, Avg. Disk sec/Read
  • Disk I/O size— For example, Avg. Disk Bytes/Read
  • I/O operations per second— For example, Disk Reads/sec
  • I/O throughput— For example, Disk Read Bytes/sec
  • I/O queue length— For example, Current Disk Queue Length
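If you prefer to stay inside SQL Server, you can derive rough per-file equivalents of the latency and size metrics from the sys.dm_io_virtual_file_stats dynamic management function (SQL Server 2005 and later; SQL Server 2000 exposes similar data through ::fn_virtualfilestats). The following query is only a sketch: the figures are cumulative since the instance started, not averaged over a polling interval as the Performance Monitor counters are, and the output column aliases are mine.

-- Cumulative (since startup) I/O statistics per database file
SELECT
    DB_NAME(vfs.database_id) AS database_name,
    vfs.file_id,
    vfs.num_of_reads,
    vfs.num_of_writes,
    -- average latency in milliseconds, guarded against division by zero
    vfs.io_stall_read_ms  / NULLIF(vfs.num_of_reads, 0)  AS avg_read_latency_ms,
    vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_latency_ms,
    -- average I/O size in bytes
    vfs.num_of_bytes_read    / NULLIF(vfs.num_of_reads, 0)  AS avg_bytes_per_read,
    vfs.num_of_bytes_written / NULLIF(vfs.num_of_writes, 0) AS avg_bytes_per_write
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
ORDER BY database_name, vfs.file_id;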

I/O latency

The latency of an I/O request is also known as I/O response time. It is measured by taking the difference between the time the I/O request is submitted and the time the completion acknowledgement is received. I/O latency can be measured at different levels of the I/O stack. The Avg. Disk sec/Read counter is a measure taken in the Windows logical disk driver.


It is critical, but often forgotten, that all the preceding disk I/O performance counters be collected and evaluated as a whole in order to see the complete picture of the disk I/O activities. In particular, if you focus on the disk latency counters without also checking the I/O size counters, you may end up drawing a wrong conclusion. For instance, a commonly accepted threshold for judging whether a disk I/O is taking too long is 10 ms. In practice, you need to make sure that this threshold applies only to smaller I/Os. When a system is doing large I/Os, the threshold of 10 ms may be too low, and can lead to false alarms.
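As a rough illustration, a 512 KB read-ahead read moves 64 times as much data as an 8 KB read; at 100 MB/sec, the data transfer alone accounts for about 5 ms, so 15 ms of latency for such a read may be perfectly healthy, whereas 15 ms for an 8 KB read on the same system would be worth investigating.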

Random or sequential I/Os and disk performance counters

By now, hopefully you agree that to gain better I/O throughput, sequential disk I/Os are much preferred over random disk I/Os. But how do you identify whether SQL Server is doing sequential I/Os as opposed to random I/Os? None of the disk performance counters tells you whether I/Os are sequential or random. What good does it do to talk about random I/Os versus sequential I/Os if you can’t tell them apart?

The good news is that there is often a strong correlation between I/O sizes and I/O sequentiality. Generally speaking, small I/Os tend to be random in that their performance depends on the average seek time, and large I/Os tend to be sequential in that their performance depends on the track-to-track seek time.

Note that there is no hard and fast threshold that separates a small I/O from a large I/O. With that in mind, in the context of SQL Server, if an I/O is smaller than 64 KB, such as a single-page I/O, you can safely consider it a small I/O, and if an I/O is much larger than 64 KB, you can consider it a large I/O.

You may argue that small sequential I/Os, such as 2 KB sequential I/Os, are perfectly legitimate I/O patterns, and you would be right if you are looking at the disk I/O subsystem in isolation. After all, if you use an I/O benchmark tool such as SQLIO or Iometer, you can easily generate 2 KB sequential reads and writes.


SQLIO and Iometer

SQLIO is a Windows command-line tool for generating I/O workloads and measuring their performance. It is a free download from Microsoft. SQLIO is often used for benchmarking a disk subsystem. Iometer is a widely used multi-platform I/O benchmarking tool. Originally developed at Intel, Iometer is now distributed as an open source project. Google for SQLIO and Iometer to find their respective current download links.


Keep in mind that, as a SQL Server professional, you are interested in the disk I/Os generated by SQL Server. And with the exception of database transaction logging, SQL Server is coded to avoid making small sequential I/O requests when it can issue large sequential I/Os instead. Whenever possible, SQL Server combines what would otherwise be multiple sequential I/Os into a single larger I/O request. As described in Bob Dorr’s excellent white papers, “SQL Server 2000 I/O Basics” and “SQL Server I/O Basics Chapter 2,” checkpoints, lazy writes, eager writes, and read-ahead all use similar optimizations to flush or read multiple contiguous pages, and as a result, heavily favor large I/Os.
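One way to see how much of this activity is going on while you watch the disk counters is to query the buffer manager counters that SQL Server itself exposes. The following sketch (SQL Server 2005 and later) reads them from sys.dm_os_performance_counters; note that the per-second counters in this view are cumulative raw values, so you need to sample them twice and take the difference to get a rate over an interval.

-- SQL Server-side view of checkpoint, lazy write, and read-ahead activity
SELECT object_name, counter_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%Buffer Manager%'
  AND counter_name IN ('Checkpoint pages/sec',
                       'Lazy writes/sec',
                       'Readahead pages/sec',
                       'Page reads/sec',
                       'Page writes/sec');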

For small sequential I/Os, the overhead of each I/O request is a significant factor in the overall I/O performance. In addition, because multiple concurrent I/O requests are often outstanding, and because disk storage is often shared by different hosts or different applications in practice, it is difficult to keep small sequential I/Os limited to track-to-track seeks. For instance, if the disk head is moved away by other I/O requests taking place in between these small sequential I/O requests, the average seek time may become the determining factor in their performance, effectively turning them into random I/Os.

Although life is made difficult by not having any disk performance counters to distinguish random I/Os from sequential I/Os, it is easy to tell small I/Os from large I/Os using the following performance counters under the LogicalDisk object:

  • Avg. Disk Bytes/Read
  • Avg. Disk Bytes/Write

If you don’t care about separating reads from writes, Avg. Disk Bytes/Transfer is a convenient counter to measure the I/O size.
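Note that these counters are arithmetically related: over the same polling interval, Avg. Disk Bytes/Read is essentially Disk Read Bytes/sec divided by Disk Reads/sec, which gives you a quick sanity check when a counter value looks suspicious.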

SQL Server operations and I/O sizes

Before you can effectively take advantage of the I/O performance counters in troubleshooting query performance problems, you need to understand how some of the key SQL Server operations manage their disk I/O sizes. It is beyond the scope of this chapter to discuss the I/O implications of every SQL Server operation. Instead, let’s review the following key SQL Server operations with respect to how their performance is related to I/O sizes:

  • Checkpoints
  • Lazy writes
  • Read-ahead reads
  • Table scans and index scans
  • Index seeks and bookmark lookups

Checkpoints and lazy writes use essentially the same algorithm to control the sizes of their disk I/O requests. SQL Server 2000 tries to bundle up to 16 pages (up to 128 KB) in a single write request, whereas SQL Server 2005 and 2008 can bundle up to 32 pages (up to 256 KB) in a single write request. Obviously, if dirty pages are not contiguous with respect to the page numbers in a data file, checkpoints and lazy writes will have to issue smaller I/Os.

Similarly, read-ahead reads will attempt to issue large I/O requests, whenever possible. In practice, it is common to see SQL Server read-ahead reads posting I/O requests that are greater than 128 KB in size. SQL Server 2005 and SQL Server 2008 are more aggressive than SQL Server 2000 when they post read-ahead reads. Like-to-like empirical tests found that the sizes of read-ahead requests issued by SQL Server 2005 and SQL Server 2008 are often larger than those by SQL Server 2000.

When SQL Server decides to use a table scan or an index scan to process a query, the access methods are often accompanied by large I/O requests. It should be noted that a table scan or an index scan will not necessarily result in large I/O requests. If SQL Server cannot make use of read-ahead reads, a table scan or an index scan may have to resort to 8 KB I/O requests. When you have a large table to scan, reading one 8 KB page at a time is inefficient.

Index seeks and bookmark lookups involve traversing the index structure and inspecting individual pages. The I/O pattern is typically 8 KB random reads; 64 KB reads may also be observed. Hopefully, the preceding discussion on random I/Os being very expensive has not led you to reject 8 KB random reads as universally inefficient. They are indeed inefficient if they have to be issued in large numbers, for instance, when processing a query that touches a large number of pages. That being said, if you are looking for a specific piece of data (for instance, an individual order or a customer address), small random I/Os are exactly what you need to quickly locate the data.

How expensive are small random I/Os, anyway?

Random I/Os are expensive because they depend on the average seek time, whereas sequential I/Os depend on the track-to-track seek time, and there can be more than an order of magnitude difference in the performance between the average seek time and the track-to-track seek time.

As a database professional, you are probably more comfortable with concrete performance numbers related to SQL Server query processing than with disk geometry. Let’s conduct a little experiment.

First, create a database with at least 6 GB for the data file and 2 GB for the log file. Then, run the script as shown in listing 1 to create a test table and populate it with 4 million rows.
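If you need a starting point for the test database, something along the following lines will do. This is only a sketch: the database name, logical file names, and file paths are placeholders, so point them at the drive you intend to monitor.

CREATE DATABASE io_test
ON PRIMARY
(
    NAME = io_test_data,
    FILENAME = 'D:\SQLData\io_test_data.mdf',  -- placeholder path
    SIZE = 6GB
)
LOG ON
(
    NAME = io_test_log,
    FILENAME = 'E:\SQLLog\io_test_log.ldf',    -- placeholder path
    SIZE = 2GB
);
go
USE io_test;
go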

Listing 1. Creating a test table and populating it with 4 million rows
-- 'max degree of parallelism' is an advanced option; expose it first
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE WITH OVERRIDE;
go
EXEC sp_configure 'max degree of parallelism', 1;
RECONFIGURE WITH OVERRIDE;
go
CREATE TABLE test(
    c1 int,
    c2 int,
    dt datetime,
    filler CHAR(1000) NOT NULL
);
go

SET NOCOUNT ON;
go
DECLARE @i int;
SET @i = 1;

BEGIN TRAN
WHILE @i <= 4000000
BEGIN
    INSERT test(c1, c2, dt, filler)
    SELECT @i,
           CASE WHEN @i%2 = 0 THEN @i ELSE 4000000-@i END,
           GETDATE(),
           'abc';

    -- commit every 100,000 rows to keep the open transaction small
    IF (@i % 100000 = 0)
    BEGIN
        COMMIT TRAN;
        BEGIN TRAN;
    END

    SELECT @i = @i + 1;
END
COMMIT TRAN;
go
CREATE INDEX cx_test ON test(c2);
go

When this script completes, launch Performance Monitor and add the following two LogicalDisk counters to monitor the I/O size and the I/O throughput for the drives where the data file of the test database is created:

  • Avg. Disk Bytes/Read
  • Disk Read Bytes/sec
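On a busy server, the drive-level counters include I/O from other databases and processes. If you want numbers attributable to the test alone, one option is to snapshot the sys.dm_io_virtual_file_stats figures (introduced earlier) for the test database immediately before and after each test query and difference the two readings. The following is only a sketch of that approach; it assumes the test database is the current database, and the temporary table name is mine.

-- Snapshot file-level I/O statistics before the test
SELECT file_id, num_of_reads, num_of_bytes_read
INTO #io_before
FROM sys.dm_io_virtual_file_stats(DB_ID(), NULL);
go

-- ... run the test query here ...

-- Difference the snapshots to get reads, bytes, and average read size for the test
SELECT a.file_id,
       a.num_of_reads      - b.num_of_reads      AS reads_during_test,
       a.num_of_bytes_read - b.num_of_bytes_read AS bytes_read_during_test,
       (a.num_of_bytes_read - b.num_of_bytes_read)
           / NULLIF(a.num_of_reads - b.num_of_reads, 0) AS avg_bytes_per_read
FROM sys.dm_io_virtual_file_stats(DB_ID(), NULL) AS a
JOIN #io_before AS b ON a.file_id = b.file_id;
go
DROP TABLE #io_before;
go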

Then, run the two test scripts in table 1, one at a time.

Table 1. Test scripts to see the impact of small random I/Os

Test 1:
-- table scan and large sequential I/Os
DBCC DROPCLEANBUFFERS;
go
SELECT MAX(dt) FROM test;

Test 2:
-- bookmark lookups and small random I/Os
DBCC DROPCLEANBUFFERS;
go
SELECT MAX(dt) FROM test WITH (INDEX=2);

In test 1, you will see that SQL Server chooses to process the SELECT query with a table scan. In test 2, SQL Server is forced to use the nonclustered index to retrieve the data pages through bookmark lookups.
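You can verify the two access paths without Performance Monitor by turning on SET STATISTICS IO before running each test; among other figures, the Messages output reports how many pages arrived via read-ahead reads, which should be high in test 1 and low in test 2. A minimal sketch:

SET STATISTICS IO ON;
go
DBCC DROPCLEANBUFFERS;
go
SELECT MAX(dt) FROM test;                  -- table scan; expect mostly read-ahead reads
go
DBCC DROPCLEANBUFFERS;
go
SELECT MAX(dt) FROM test WITH (INDEX=2);   -- bookmark lookups; expect mostly physical reads
go
SET STATISTICS IO OFF;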


Note

In each test script, DBCC DROPCLEANBUFFERS is run first to remove all the clean pages from the SQL Server buffer pool, which ensures that the subsequent SELECT query retrieves its pages from disk. Because DBCC DROPCLEANBUFFERS does not evict dirty pages, issue a CHECKPOINT first if you run the tests immediately after populating the table. Do not run these statements on a production server.


Table 2 summarizes the performance statistics of the two tests in terms of query elapsed time, I/O size, and I/O throughput.

Table 2. Performance impact of small random I/Os

Test      Query elapsed time    I/O size    I/O throughput
Test 1    12 sec                ~ 256 KB    ~ 340 MB/sec
Test 2    44 sec                ~ 8 KB      ~ 80 MB/sec

Although the numbers may vary when you execute the tests in different environments, the following pattern should hold:

  • SQL Server issues large I/Os (~256 KB per read) in test 1, and small I/Os (8 KB ~ 64 KB per read) in test 2.
  • Depending on the storage capability, SQL Server should achieve much higher I/O throughput in test 1 than in test 2: a direct consequence of using large I/Os.
  • The query performs significantly better in terms of elapsed time in test 1 than in test 2 because of the higher disk I/O throughput in test 1.

In this experiment, the same query ran almost four times as fast in test 1 as it did in test 2. On a lower end disk subsystem that has lower throughput capacity, the difference will not be so pronounced. For instance, on a disk subsystem whose maximum throughput capacity is 120 MB/sec, the 256 KB sequential scan was seen to achieve 100 MB/sec and the 8 KB random lookups reached 60 MB/sec. Although the gain was less than 100 percent, the difference was still significant.

Performance scenarios

These are the key messages in this chapter so far:

  • Random I/Os are expensive.
  • SQL Server is programmed to take advantage of sequential I/Os.
  • You cannot directly observe whether I/Os are sequential or random. That being said, because sequential I/Os are associated with large I/Os and random I/Os with small I/Os, you can instead observe I/O sizes with the disk performance counters.
  • Processing a large query with small I/Os is not efficient.

This is interesting information in itself. But how does it help you in practice? Let’s explore four query performance scenarios where you can put this information to good use.

Scenario 1: constant checkpoints

This was a real-world case encountered by a database administrator (DBA).

When reviewing the logged SQL Server performance counter values, the DBA discovered that checkpoints were taking place almost continuously on one of their busiest databases. At the same time, it was found that lazy writes were kept constantly active as well. Both checkpoints and lazy writes were working hard to flush out dirty pages and make room for free pages that are needed in the buffer pool, but they could barely keep up with the changes.

Is there anything we could do to improve the situation? The DBA was asked to provide an analysis and recommend a course of action for better performance. Fortunately, the disk performance counters were also logged along with the SQL Server counters. Correlating the disk performance counters with the SQL Server buffer manager counters led to the observation that SQL Server was doing small I/Os during checkpoints (Avg. Disk Bytes/Write <= ~16 KB). The performance capacity of the disk storage subsystem was not being fully utilized by the checkpoint I/O requests.

The DBA realized that if the checkpoints could be done more quickly, more free pages would be readily available, which would alleviate pressure on the lazy writer, keep pages in the buffer pool longer, and therefore improve the overall performance of the server. The solution in this case was to re-organize several key tables on different clustered indexes so that updates took place on pages that were more contiguous. This solution was evaluated against the potential increase in lock contention, and in this case the I/O performance gain outweighed the downside risk of increased contention.

Scenario 2: NOLOCK and faster query processing

You probably know that using the NOLOCK hint in a table scan not only reduces—or even eliminates—the potential for blocking other user processes, it may also improve—sometimes significantly—the performance of your query. The often-stated reason is that with the NOLOCK hint, SQL Server can choose to perform the table scan via the allocation order instead of the index order. This is a fine explanation, but it says nothing about why an allocation order scan may be more efficient than an index order scan.

Looking from a disk I/O perspective, the efficiency of an allocation order scan lies in the fact that it generally issues larger I/Os than does an index order scan.

Let’s again look at some empirical evidence. Modify the script in listing 1 so that it creates a clustered index on the c1 column instead of a nonclustered index on the c2 column. Run the following SQL statement after the test table is populated with data:

CREATE CLUSTERED INDEX cix_test ON test(c1);

Then, run the two scripts in table 3 separately to observe their respective I/O sizes and I/O throughputs with the performance counters Avg. Disk Bytes/Read and Disk Read Bytes/sec.

Table 3. Test scripts to see the impact of the NOLOCK hint

Test 1:
DBCC DROPCLEANBUFFERS;
go
SELECT MAX(dt) FROM test;

Test 2:
DBCC DROPCLEANBUFFERS;
go
SELECT MAX(dt) FROM test WITH (NOLOCK);

Table 4 summarizes the performance statistics from running the scripts on a test server.

Table 4. Performance impact of the NOLOCK hint

Test      Elapsed time    I/O size    I/O throughput
Test 1    70 sec          ~ 350 KB    ~ 85 MB/sec
Test 2    36 sec          ~ 500 KB    ~ 125 MB/sec

The query with the hint was nearly twice as fast as the query without the NOLOCK hint. Note also that with NOLOCK, the query was able to achieve much higher disk read throughput than when NOLOCK was not specified. And with the NOLOCK hint, the average I/O size of reading the pages from disk was much larger than without NOLOCK, although both were doing large I/Os.


Warning

The NOLOCK hint should not be applied indiscriminately just to improve performance, because your query may return inconsistent data. Where potential data inconsistency is acceptable or where not blocking the main process is paramount, the NOLOCK hint can be an effective tool.


Scenario 3: read-ahead reads

When it comes to processing large queries (queries that touch a large number of pages), read-ahead reads are central to their performance. If for whatever reason the read-ahead operation can’t grab a sufficient number of pages to issue large I/O requests, your query performance will degrade dramatically.

To highlight the importance of the read-ahead operation, let’s look at an extreme case where it is disabled entirely.

Again, modify the script in listing 1 so that it creates a clustered index on the c1 column instead of a nonclustered index on the c2 column. Run the following SQL statement after the test table is populated with data:

CREATE CLUSTERED INDEX cix_test ON test(c1);

Then, run the following two scripts, shown in table 5, separately to observe their respective I/O sizes with the performance counter Avg. Disk Bytes/Read. The script in test 1 runs the SELECT query with the read-ahead operation enabled, which is the default SQL Server behavior, and the script in test 2 runs the SELECT query with the read-ahead operation disabled via trace flag 652.

Table 5. Test scripts to see the impact of the read-ahead operation

Test 1:
DBCC DROPCLEANBUFFERS;
go
DBCC TRACEOFF(652, -1);  -- trace flag 652 is off by default, so read-ahead stays enabled
go
SELECT MAX(dt) FROM test;

Test 2:
DBCC DROPCLEANBUFFERS;
go
DBCC TRACEON(652, -1);   -- trace flag 652 disables read-ahead
go
SELECT MAX(dt) FROM test;

Table 6 summarizes the performance statistics when the tests were run on one of my servers.

Table 6. Performance impact of the read-ahead operation

Test      Elapsed time    I/O size    I/O throughput
Test 1    12 sec          ~ 512 KB    ~ 300 MB/sec
Test 2    50 sec          ~ 64 KB     ~ 80 MB/sec

On this particular test server, the impact of the read-ahead operation is dramatic. With the read-ahead operation, SQL Server was able to issue large I/O requests (~512 KB per read) and pushed the disk read throughput to ~300 MB per second. Without the read-ahead operation, SQL Server had to resort to relatively small I/O requests (~64 KB per read, and 8 KB per read was also observed on a different test server), and the read throughput was significantly lower at about 80 MB per second. The end result is that SQL Server was able to complete processing the query four times faster with read-ahead reads than without read-ahead reads on this test server.

Scenario 4: index fragmentation

You probably know, or have read, that index fragmentation is bad for the performance of large queries. In particular, logical scan fragmentation can have a detrimental impact on the performance of queries that are processed with index ordered scans. Why? The most common explanation is that when logical order doesn’t match the physical order, SQL Server has to do more work in following the index page chain.

Instead of trying to understand how SQL Server navigates through logical scan fragmentation internally, you can gain a solid understanding of its performance impact by looking at the disk performance counters—more specifically, Avg. Disk Bytes/Read and Disk Read Bytes/sec.

Let’s again use the script in listing 1. To introduce logical scan fragmentation, first remove the CREATE INDEX statement from the script, and then insert the following SQL statement immediately after the test table is created and before the test table is populated with data:

CREATE CLUSTERED INDEX cix_test ON test(c2);

After the test table is populated with data using the script in listing 1, run DBCC SHOWCONTIG('test') to confirm the presence of significant logical scan fragmentation. The output should look like listing 2.

Listing 2. Output from running DBCC SHOWCONTIG on the test table
DBCC SHOWCONTIG scanning 'test' table...
Table: 'test' (69575286); index ID: 1, database ID: 8
TABLE level scan performed.
- Pages Scanned................................: 933333
- Extents Scanned..............................: 117019
- Extent Switches..............................: 600276
- Avg. Pages per Extent........................: 8.0
- Scan Density [Best Count:Actual Count].......: 19.44% [116667:600277]
- Logical Scan Fragmentation ..................: 85.33%
- Extent Scan Fragmentation ...................: 0.01%
- Avg. Bytes Free per Page.....................: 3703.1
- Avg. Page Density (full).....................: 54.25%
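If you are on SQL Server 2005 or 2008, you can obtain the same information from the sys.dm_db_index_physical_stats dynamic management function instead of DBCC SHOWCONTIG. The following sketch reports the two figures that matter most for this test, the logical fragmentation and the page density, for the leaf level of the clustered index on the test table:

-- DMV equivalent of DBCC SHOWCONTIG for the test table (SQL Server 2005/2008)
SELECT index_id,
       page_count,
       avg_fragmentation_in_percent,    -- corresponds to Logical Scan Fragmentation
       avg_page_space_used_in_percent   -- corresponds to Avg. Page Density (full)
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('test'), NULL, NULL, 'SAMPLED')
WHERE index_level = 0;  -- leaf level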

Now, run the following script, and observe the disk I/O size with the Avg. Disk Bytes/Read counter and the disk I/O throughput with the Disk Read Bytes/sec counter.

DBCC DROPCLEANBUFFERS;
go
SELECT MAX(dt) FROM test;

For comparison, defragment the clustered index by dropping and re-creating it, and then rerun the preceding script.


Note

Because the average page density from the DBCC SHOWCONTIG output is 54.25 percent, when you re-create the clustered index you need to set the fill factor to about 50 percent so that the re-created index has the same page density, and most importantly, the same number of pages. Otherwise, the comparison is not valid.
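For example, with the clustered index from this scenario, a rebuild along the following lines (SQL Server 2005/2008 syntax) keeps the page density at roughly 50 percent; take the exact fill factor from your own DBCC SHOWCONTIG output:

-- Rebuild the clustered index in place at ~50 percent fill so the defragmented
-- table has roughly the same page count as the fragmented one
CREATE CLUSTERED INDEX cix_test ON test(c2)
WITH (DROP_EXISTING = ON, FILLFACTOR = 50);
go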


Table 7 summarizes the results from a test server with a directly attached disk drive.

Table 7. Performance impact of index fragmentation

Test                                  Elapsed time    I/O size    I/O throughput
Logical Scan Fragmentation = 0        43 sec          ~ 400 KB    ~ 100 MB/sec
Logical Scan Fragmentation = 85%      275 sec         ~ 16 KB     ~ 30 MB/sec

On this test server, the SELECT query ran more than six times faster without logical scan fragmentation.

The test result shows that logical scan fragmentation seriously hampered SQL Server’s ability to perform read-ahead reads. Whereas SQL Server was able to read about 12,000 read-ahead pages per second when there was no logical scan fragmentation, it could only manage about 3,000 read-ahead pages per second when logical scan fragmentation was at 85 percent. In addition, fragmentation caused SQL Server to issue small I/O requests (compare the Avg. Disk Bytes/Read of about 16 KB with fragmentation to about 400 KB without), with the result that SQL Server could not take full advantage of the I/O throughput capacity of the disk subsystem.

A note of caution is in order before you extrapolate this too far. If you run the same tests on a higher-end disk subsystem, you may not find the performance difference to be this dramatic.

Summary

The key message of this chapter is that the disk I/O performance counters can reveal a significant amount of information on query processing, and you can effectively use the disk I/O counter values to help analyze query performance issues in scenarios where SQL Server must incur significant physical I/Os, such as when processing large queries.

Despite much talk about preferring sequential I/Os to random I/Os, you cannot directly observe whether I/Os are sequential or random. For processing large SQL queries, you can instead observe their I/O sizes and I/O throughputs. The key observations are that large I/Os are much more efficient than small I/Os, that large I/Os are often sequential, and that SQL Server is optimized to take advantage of large I/Os.

In practice, you need to exercise caution when observing disk I/O sizes as there is often a multitude of activities on a server, and it may be difficult to attribute the values of the disk I/O performance counters to any specific queries.

About the author

Linchi Shea has been working with SQL Server since Version 4.21a. He works in the financial services sector in and around New York City. Linchi is passionate about creating robust and automated infrastructures for managing SQL Server in an enterprise environment. He has written many tools and utilities to simplify deploying, operating, and testing SQL Server. You can find him blogging about SQL Server enterprise issues at www.sqlblog.com. Linchi is a co-founder and co-lead of the NYC SQL Server users group, and has been a Microsoft MVP since 2002.
