© The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2022
G. FritcheySQL Server 2022 Query Performance Tuninghttps://doi.org/10.1007/978-1-4842-8891-7_11

11. Key Lookups and Solutions

Grant Fritchey1  
(1)
Grafton, MA, USA
 

Nonclustered indexes help query performance in all manner of ways. However, unlike clustered indexes, the data isn’t stored with nonclustered indexes. Therefore, a common occurrence is the lookup, going back to the clustered index or the heap to retrieve the data not stored with the nonclustered index. In some cases, this is benign behavior. In other cases, this becomes a performance problem. Having ways to deal with this common issue can help your systems.

In this chapter, I cover the following topics:
  • The purpose of lookups

  • Performance issues caused by lookups

  • Analysis of the cause of lookups

  • Techniques to resolve lookups

Purpose of Lookups

As you know from previous chapters, you can store data in a heap or a clustered index. Then, you can create additional nonclustered indexes to aid queries that are searching for data, other than how the data is stored. The optimizer recognizes these indexes based on their utility to the query in question, usually around the WHERE, JOIN, and HAVING clauses. If the query refers to columns that are not a part of the nonclustered index, that data must be retrieved from the table. This is what a lookup is.

If we look at Listing 11-1, we’ll see that a nonclustered index can help us filter the data faster, but that index is not covering, and so a lookup is necessary.
SELECT p.NAME,
       AVG(sod.LineTotal)
FROM Sales.SalesOrderDetail AS sod
    JOIN Production.Product AS p
        ON sod.ProductID = p.ProductID
WHERE sod.ProductID = 776
GROUP BY sod.CarrierTrackingNumber,
         p.NAME
HAVING MAX(sod.OrderQty) > 1
ORDER BY MIN(sod.LineTotal);
Listing 11-1

Retrieving sales data

The SalesOrderDetail table has a nonclustered index on the ProductID column. This is the index the optimizer can use to speed up data retrieval. The table has a clustered index defined on the SalesOrderID and SalesOrderDetailID columns, so they are a part of the nonclustered index as the row locator. However, since they’re not referenced in the query, they don’t do anything but act as the row locator. The other columns referenced in the queryLineTotal, CarrierTrackingNumber, and OrderQtyare not a part of the nonclustered index key or INCLUDE columns. This means the optimizer has to perform a lookup to retrieve the data as you see in Figure 11-1.

An execution plan highlights the flow of index seek to nested loops and key lookup points compute scalar and finally to the nested loops.

Figure 11-1

Execution plan for the query showing a key lookup

The lookup operation is blown up from the full execution plan in Figure 11-1. The Index Seek operation filters the rows returned down to 228. But then, an additional Join operation, the Nested Loops, is added in order to work with the Key Lookup operation against the clustered index to return the additional columns needed. You can always check the properties of the Key Lookup operator to identify the Output to see exactly which columns are retrieved in that way as illustrated in Figure 11-2.

A table with 2 columns and 4 rows. It contains 4 output lists, and the row entries are adventure works, sales, sales order details of a carrier tracking number, order quantity, unit price, and discount.

Figure 11-2

The Output List property showing the columns in the lookup

Performance Issues Caused by Lookups

A lookup necessitates retrieving pages from where the data is stored as well as the pages from the nonclustered index. Accessing more pages quite simply increases the number of logical reads for a given query. Additionally, if the pages are not available in memory, a lookup will probably require a random (nonsequential) I/O operation on the disk to go from the index page to the data page. On top of that, there is CPU processing necessary to marshal the data and perform the operations.

In addition to that cost, you also add the cost of the join operation necessary to put the two sets of data together.

These costs associated with lookups are why it’s recommended to keep data retrieval from nonclustered indexes to small sets of data. As the data set size increases, the costs associated with lookups go up as well.

Let’s look at an example to illustrate this point. Listing 11-2 shows a query that returns a relatively large data set and all columns through the SELECT * operation (used intentionally for illustration purposes).
SELECT *
FROM Sales.SalesOrderDetail AS sod
WHERE sod.ProductID = 793;
Listing 11-2

A larger data set from the SalesOrderDetail table

While a very simple query, it is returning more than 700 rows, as well as all the columns of the table. The optimizer chooses to perform a Clustered Index Scan as shown in Figure 11-3.

A diagram displays the flow of the clustered index scan that undergoes 2 compute scalar states, filter cost, then select cost.

Figure 11-3

A scan in order to retrieve a larger data set

We can force the optimizer to use the nonclustered index on the ProductID column by using an index hint as shown in Listing 11-3.
SELECT *
FROM Sales.SalesOrderDetail AS sod WITH (INDEX(IX_SalesOrderDetail_ProductID))
WHERE sod.ProductID = 793;
Listing 11-3

Forcing an index to be used

This query will change the plan to one with an index seek and a key lookup, similar to what we saw earlier in Figure 11-1. However, the reads for the query in Listing 11-3 went to 2,251 from the value of 1,248 for Listing 11-2.

As you can see, while the lookup operation is a necessary one, it can be quite costly.

Analysis of the Causes of Lookups

Not all lookups have to be eliminated because some queries aren’t called very frequently and they run fast enough as they are. However, when a query needs to run faster, and there’s a lookup operation, depending on the number of columns that have to be dealt with, that lookup can be some low-hanging fruit to improve performance.

Listing 11-4 shows another query, retrieving information from the HumanResources.Employee table using an index on the NationalIDNumber column.
SELECT NationalIDNumber,
       JobTitle,
       HireDate
FROM HumanResources.Employee AS E
WHERE E.NationalIDNumber = '693168613';
Listing 11-4

Retrieving basic information from the Employee table

Running this query results in the execution plan you see in Figure 11-4.

A diagram displays the standard key lookup operation where both indexes seek and key lookup point to nested loops, then point to select.

Figure 11-4

A standard key lookup operation

In the example in Listing 11-4, since we’re only retrieving three columns from the table, and we know the index is on the NationalIDNumber column, it’s pretty easy to infer that the lookup is for the other two columns. However, with large queries, columns in use in multiple clauses, it can be tricky to simply look at a query and understand which columns you need to deal with.

To know exactly which columns are taking part in the lookup operation, go to the lookup operator and open up the Properties. The output columns will tell you what you need to know. Figure 11-5 shows the output for the Key Lookup operator.

A table displays the 2 columns of 2 sets of output lists. The output list has an alias, column, database, scheme, and tables.

Figure 11-5

The two columns in the Output List of the Key Lookup operator

You can see all the details available—the column, table, and schema, everything you need. Also, at the top of Figure 11-5, on the right, you can see an ellipsis. Clicking on that opens the columns up in a text window, making it easy to copy, as illustrated in Figure 11-6.

A dialog box of output list with 2 entries. Entries are adventure works, human resources employee job title, and hire date.

Figure 11-6

Columns highlighted in a text window

With this information, you can now decide how you plan to resolve the lookup operation.

Techniques to Resolve Lookups

I’ve said it already in this chapter, but it bears repeating, not every lookup operation requires immediate attention. Understanding how the query in question is working within the system can help you decide if you need to fix the lookup. However, you’re usually unaware of lookups in queries that are running fast enough. You’re looking at the execution plan of a query because it’s running slowly.

To fix a lookup, you have three basic approaches:
  • Create a clustered index.

  • Use a covering index.

  • Take advantage of index joins.

Let’s explore all three of these in a little more detail.

Create a Clustered Index

Since the leaf pages of a clustered index contain all the columns for a table (with a few exceptions), when a clustered index is used to retrieve the rows, no lookup operations are needed. If the index used for Listing 11-4 earlier was recreated as the clustered index for the table, no lookup operation would occur.

In most cases, this just isn’t an option. You’ve already designed the table around a good, clustered index. You can’tin fact, you shouldn’tsimply swap those indexes around. At least, not without extensive testing. However, if you’re working with a heap table, you have the opportunity to create a new clustered index, which would resolve any RID lookup operations.

Use a Covering Index

In Chapter 10, I explained that a covering index is one that has all the columns needed to satisfy a given query. This means that the index would not need to go to where the data is stored in order to retrieve all the columns.

You have two mechanisms for making a nonclustered index into a covering index. The first of these is to change the key of the index to include all the columns needed for a given query. This choice has serious implications for the index. Let’s take a look at the density graph of the existing index, AK_Employee_NationalIDNumber, using Listing 11-5.
DBCC SHOW_STATISTICS('HumanResources.Employee', 'AK_Employee_NationalIDNumber') WITH DENSITY_VECTOR;
Listing 11-5

Using DBCC SHOW_STATISTICS to see the density graph

The results are visible in Figure 11-7.

A table displays 3 columns and 2 rows. The column headers are all density, average length, and columns. The first row entry, 0.003448276, is highlighted.

Figure 11-7

Density graph for the original index

The average key length for the existing index, which includes the indexed column, NationalIDNumber, and the row identifier, in this case, the clustered index key, BusinessEntityID, is 21.66.

We can try to address the lookup by modifying the existing index using Listing 11-6.
CREATE UNIQUE NONCLUSTERED INDEX AK_Employee_NationalIDNumber
ON [HumanResources].[Employee] (
                                   NationalIDNumber ASC,
                                   JobTitle,
                                   HireDate
                               )
WITH DROP_EXISTING;
Listing 11-6

Recreating the index with a new key

If you were to rerun the original code in Listing 11-4, you’d see that the lookup operation is gone. The execution plan now looks like Figure 11-8.

A diagram depicts that the index seeks of non cluster with a cost of 100 percentage points to select.

Figure 11-8

Lookup operation eliminated

However, when we now run Listing 11-5 to get the density graph, we see the results in Figure 11-9.

A table has 3 columns and 4 rows. The column headers are all density, average length, and columns. The first column entry, 0.003448276, is highlighted.

Figure 11-9

The density graph for the new index

The average length of the key has gone from 21.66 to 74.48, more than three times bigger. That means fewer rows per page, and so page reads will go up when accessing this new index. That doesn’t even address the fact that the index behavior is now going to be different overall.

The other way to get a covering index is to leave the key alone and simply add the columns you want to the leaf level of the index, using the INCLUDE clause as shown in Listing 11-7.
CREATE UNIQUE NONCLUSTERED INDEX AK_Employee_NationalIDNumber
ON [HumanResources].[Employee] (NationalIDNumber ASC)
INCLUDE (
            JobTitle,
            HireDate
        )
WITH DROP_EXISTING;
Listing 11-7

Using the INCLUDE clause to recreate the index

This will end up with an execution plan basically identical to what you see in Listing 11-8. However, with this new index in place, the density graph looks a little different, as shown in Figure 11-10.

A table has 3 columns and 2 rows. The column headers are all density, average length, and columns. All density values are 0.003448276 and 0.003448276.

Figure 11-10

Average key length is back down

The average key length is back to what it was because while we’ve added data to the index, it’s stored at the leaf, not on with the key. Overall, this should perform better in terms of reads, although for such a small key and data set, you’re unlikely to see it here in this example.

Before we go, there is one other way to make an index into a covering index. I’m going to reset the index all the way back to the beginning using Listing 11-8.
CREATE UNIQUE NONCLUSTERED INDEX AK_Employee_NationalIDNumber
ON [HumanResources].[Employee] (NationalIDNumber ASC)
INCLUDE (
            JobTitle,
            HireDate
        )
WITH DROP_EXISTING;
Listing 11-8

Resetting the index

Now, if we actually modify the query, to change the columns retrieved, we can see a covering index again. Listing 11-9 shows the updated query.
SELECT NationalIDNumber,
       E.BusinessEntityID
FROM HumanResources.Employee AS E
WHERE E.NationalIDNumber = '693168613';
Listing 11-9

A modified query

Once more, this results in an execution plan that looks like Figure 11-8. It’s a covering index because the clustered index key, BusinessEntityID, is included with the index as the row locator. So we can modify code to make an index covering. However, to be fair, we’ve changed the result set. If you need the columns in question, JobTitle and HireDate, this is a horrible solution.

Take Advantage of Index Joins

I introduced the concept of index joins in Chapter 10. As stated there, these are somewhat rare, so counting on them could be problematic as a tuning strategy. For example, the original code we’ve been testing with throughout the chapter, Listing 11-4, cannot be tuned with an index join. This is primarily because there is no filtering on the other columns, so indexes on those columns just won’t help performance at all. To see this in action, we’ll take a new example from Listing 11-10.
SELECT poh.PurchaseOrderID,
       poh.VendorID,
       poh.OrderDate
FROM Purchasing.PurchaseOrderHeader AS poh
WHERE VendorID = 1636
      AND poh.OrderDate = '2014/6/24';
Listing 11-10

Query against the PurchaseOrderHeader table

This query runs with the following performance metrics and the execution plan in Figure 11-11:
614mcs
10 reads

A diagram depicts both index seek and key lookup with costs of 25 and 75 percent, points to nested loops. Then points to select, both with 0 percent cost.

Figure 11-11

Query resulted in a key lookup

As with the other queries in this chapter, the columns in the query are not contained in any of the nonclustered indexes on the table. This means that even though the IX_PurchaseOrderHeader_VendorID index filters the data based on the WHERE criteria, the rest of the columns in the query have to be retrieved from the clustered index.

As we’ve shown earlier, you could modify the index and add columns to make it a covering index. However, that does change the index, either the key or just the overall size with columns at the leaf level. What if modifying the existing index negatively impacted other queries?

We can experiment to see if an index join is possible by adding the index in Listing 11-11.
CREATE NONCLUSTERED INDEX IX_TEST
ON Purchasing.PurchaseOrderHeader (OrderDate);
Listing 11-11

Creating a new nonclustered index

Running Listing 11-10 for a second time, we get new performance metrics and a new execution plan in Figure 11-12:
4.66mcs
4 reads

A diagram of a flow of 2 indexes seeks 4 of 4 and 18 of 18, with a cost of 27 percent, to merge join as an inner join, with a cost of 46 percent. Then points to select, with a cost of 0 percent.

Figure 11-12

An index join works as a covering index

In the execution plan, both indexes are used to seek against. Because the data is ordered in both seek operations, a merge join is used to put together the one matching row. The reads went from 10 to 4, and you saw a similar improvement in performance.

Changing the index IX_PurchaseOrderHeader_VendorID to be covering would perform even better than this, since it eliminates the join operation as well as the reads against the second index. However, as we discussed, we’re avoiding it in this case.

Summary

As you can see from the code and examples in this chapter, there are mechanisms for dealing with lookups, whether a heap or a clustered index. Lookup operations are not free, so they can be a good choice to help improve the performance of a given query. Analyzing the lookup itself is easy since the lookup operation tells you what you need to know. Then it’s just a question of picking one of the possible solutions to fix the issue.

In the next chapter, we’re going to discuss a somewhat controversial topic: index fragmentation. We’ll talk about whether or not fragmentation is a problem and if it becomes a problem, how best to deal with it.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.119.103.30