Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

7. Execution Plans

Elizabeth Noble¹

(1)

Roswell, GA, USA

When I first started writing T-SQL, I focused on using the right syntax. I worked to try memorizing things like if the GROUP BY clause came before the ORDER BY clause. As I became more familiar with writing T-SQL, I had new challenges like making sure that my results were accurate. Over time, I got to the point that not only was I writing queries with complex logic, I also had to write several simpler queries to verify my results. I got to the point where I started debugging T-SQL code that had been written by other database developers. I also became responsible for researching various application issues.

In many cases, the result of this research came down to some T-SQL code that was running at the same time as an application crash or a spike in hardware usage. At this point in time, the pressure was on to figure out what was causing the performance issue as quickly as possible. In the beginning, I learned to rely on my understanding of the needs of the business and the database design. While this can be very helpful, it does require a significant amount of domain knowledge that can take years to acquire. There are better, faster, and easier ways to find the pain points in your queries.

In this chapter, you will learn how to view and understand SQL Server execution plans. You will also see demonstrations on how your existing and new indexes can affect your execution plans. You will also learn about how SQL Server will join data together in the execution plan. By the end of this chapter, you will have the knowledge to know how to use execution plans to improve the performance of your T-SQL.

Reading Execution Plans

My ultimate goal when writing to T-SQL is to make sure that I get accurate results. In some cases, the queries I write are simple and straightforward. As a data engineer, I find myself needing to performance tune queries related to reporting or writing new queries for reporting. Because these are queries to provide reports on transactional data, I often must incorporate more joins and more logic than I would need if I were to write queries for application. While this is not ideal, this is a very common scenario for many companies out there.

Many companies have begun to realize the importance of the data in their systems, or they want additional monitoring to ensure the applications are working as intended. In either case, it is highly likely that you will come across a query in your environment that is negatively affecting your applications. At that time, you will want to know what you can do to resolve the issue. In this section, you will learn why execution plans are important and when you need to use them. You will learn how to read execution plans including commonly used symbols. You will also learn about properties and warnings in execution plans.

My first mentor once told me that in order to write good T-SQL, I needed to understand how SQL Server worked. It is some of the wisest advice I have received, and I am still working to understand SQL Server internals better each day. Taking this advice in mind, one way that we can have a better understanding of SQL Server is to understand how SQL Server will execute any queries that we have written. While I will not cover all the internals related to executing a query, I will show you the instructions that SQL Server has determined should be used to execute your T-SQL code. These instructions can be found in the execution plan for as long as the plan is still in the plan cache.

Once you have decided that you want to start investigating an execution plan for some T-SQL code, you should confirm that you have access in SQL Server to view execution plans. You have several options. If you are going to be getting the actual execution plan when the T-SQL code is executed, you will need to make sure you have permissions to execute that code and that you have SHOWPLAN access for the databases that exist in the T-SQL code. You can also get the execution plan from the plan cache, but you will need to be able to query the dynamic management view sys.dm_exec_cached_plans. Now that you know how to verify that you have access, I want to go over the different types of execution plans available.

There are various ways to retrieve execution plans, and depending on your needs at the time will help determine which is the best model to use. When you are trying to troubleshoot performance issues, the best option is to retrieve the execution plan from sys.dm_exec_cached_plans. This is the method you will want to use if you want the best idea of the execution plan that was used previously. You may not be able to find the query or stored procedure here if the execution plan has been cleared from the plan cache. In addition, depending on when the performance issue happened, the plan that was in cache may have been cleared. This may mean you can find an execution plan, but it may not be the same one used previously.

If you find that the execution plan is not in the plan cache or that the execution plan has been changed since the performance issue, you can try to find the current execution plan. There are two options available when using this method. One option is to get a hypothetical idea of what SQL Server would use as an execution plan. This can be done without running the actual T-SQL code on the server. This is called the estimated execution plan. The largest challenge using this method is that just because SQL Server has estimated an execution does not give you any guarantee that this will be the actual execution plan that SQL Server uses when the query is run. I do not usually use estimated execution plans in my day-to-day operations. I have used estimated execution plans only a handful of times, and that was when I was hesitant to run the T-SQL code for fear of causing an outage in Production.

The next best option besides reviewing an execution plan in the plan cache is to see what the execution plan would look like right now if you ran the execution plan. Requesting an actual execution plan when you execute your T-SQL code will cause the execution plan to be available after the query execution completes. There is also a newer option available in SQL Server 2014 and higher, that is, the ability to see the flow of data through the execution plan as the query is executing. Instead of viewing the actual execution plan, you will ask SQL Server to show you the live query statistics. While this option may be helpful in slower queries where there is one specific pain point, I often find that many queries complete too quickly to use this method to effectively diagnose many issues I am trying to resolve.

How and when you retrieve an execution plan are factors that you can control when it comes to working with execution plans. There is also the option of what the execution plan should look like once you retrieve it. My usual method is looking at the graphical output for the execution plan. The process of reviewing execution plans for T-SQL code is the same regardless of whether you are writing new T-SQL or performance tuning some existing code. An example of a graphical execution plan can be found in Figure 7-1.

../images/480547_1_En_7_Chapter/480547_1_En_7_Fig1_HTML.jpg — Figure 7-1
Graphical Execution Plan

The graphical execution gives a high-level view of how a query is executed. This makes the execution plan easier to read at a glance, but it also means that there is additional information that needs to be accessed when getting into the details in the execution plan. Figure 7-2 shows some details that can be viewed when putting your mouse over various objects in the execution plan.

../images/480547_1_En_7_Chapter/480547_1_En_7_Fig2_HTML.jpg — Figure 7-2
Operator Details from a Graphical Execution Plan

In addition to the properties in this tool tip, in newer versions of SQL Server Management Studio, you also have access to even more extended properties as shown in Figure 7-3.

../images/480547_1_En_7_Chapter/480547_1_En_7_Fig3_HTML.jpg — Figure 7-3
Additional Properties for Execution Plans

There is an option to interact with the execution plan in an XML format. You can see an example of this in Listing 7-1.

<?xml version="1.0" encoding="utf-16"?>

<Batch>

</OutputList>

</DefinedValue>

</DefinedValue>

</DefinedValue>

</DefinedValue>

</DefinedValue>

</DefinedValue>

</DefinedValue>

</DefinedValue>

</DefinedValue>

</DefinedValues>

</IndexScan>

</RelOp>

</QueryPlan>

</StmtSimple>

</Statements>

</Batch>

</BatchSequence>

</ShowPlanXML>

Listing 7-1

XML Execution Plan

While this method may be more difficult to read, it does contain all information related to the execution plan. The query that generated this table was a SELECT statement on one table. As you can also see, having all the data in one place makes for quite a bit of code that needs to be reviewed.

Once you have set up SQL Server Management Studio to give you an execution plan, you will want to know what to look for when reviewing the execution plan. You will need to know how to interpret the flow in the execution plan to better understand how SQL Server will execute the query. There are also some general shapes you will want to be familiar with as they may help you find potential issues at a glance. Included in the execution plan you may find additional text that quickly shows issues that have been determined to be negatively affecting the execution of your query. Having a better understanding of all these elements will help you figure out the pain points for your queries so that you can determine where to focus your efforts.

Once you have an execution plan, you can get a better idea of exactly how SQL Server will execute that query. When we read English, we are accustomed to reading from left to right and from top to bottom. To compare this to execution plans, the uppermost left corner represents the result of the query. If we compare this to reading a book, as you move from left to right, you are drilling down into steps that SQL Server took to get the result set. The same is true when moving from top to bottom. Looking at the example in Figure 7-4, you can see an execution plan with multiple steps.

../images/480547_1_En_7_Chapter/480547_1_En_7_Fig4_HTML.jpg — Figure 7-4
Execution Plan with Multiple Steps

If I read from left to right, I can see the representation for the results returned with the SELECT. When I move to the right, I can see that SQL Server used the compute scalar immediately before getting the result set. This means that if I move to the right-most object, this is the first action SQL Server took when executing the query. In this case, the first step SQL Server performed when executing this query is a clustered Index Scan. This is important to know because you may hear references to reading execution plans left to right. In these cases, the person is referring to reading the execution plan in the same way SQL Server executes the query.

When I first started working with execution plans, I was quickly overwhelmed with the amount of information available in them. I wanted to understand everything from the beginning. However, this was far too much for me to remember. Through my years of troubleshooting any issues, I learned to start my analysis by finding differences. With the execution plans, it did not take long to quickly see difference right away. I noticed that the arrows between two objects did not always stay the same size. There are two different arrows side by side in Figure 7-5 for comparison.

../images/480547_1_En_7_Chapter/480547_1_En_7_Fig5_HTML.jpg — Figure 7-5
Arrows from Execution Plans

I learned that the difference in the thickness of the lines indicated the relative amount of records SQL Server was accessing for each step. While there are some queries that are going to return more records than others and therefore have thicker lines, you will find execution plans where one step has a very thick line and the next step will have a thin line. This usually indicates the step with the thicker line could be written more efficiently in order to pull back less data. Looking at the thickness of the lines is one of the things you can check when you look at execution plans.

There are times when you will want your data ordered in a specific way. For instance, you may want to show a list of recipes alphabetically or by creation date. The recipes can either be sorted within the application code or within SQL Server. If you choose to order the data within SQL Server, you may see a Sort operator in your execution plan. You can see an example of what the Sort operator looks like in Figure 7-6.

../images/480547_1_En_7_Chapter/480547_1_En_7_Fig6_HTML.jpg — Figure 7-6
Sort Operator

You will see Sort operators in your execution plans. There are occasions where the Sort operator is necessary. However, you will want to look for instances where the Sort operator may not be needed. This can include Sorts that are required for Merge Joins. In these cases, you will want to see if there is an index that can be used for the join that already has the data sorted. One of the largest issues to watch out for is when there is a warning related to the Sort operator.

One of the most helpful features of execution plans is when they return a warning. Ideally, you want to write T-SQL that performs well. This also means the T-SQL should not return execution plans with warnings. There will be times when you get a warning as part of your execution plan. The warning will look like Figure 7-7.

../images/480547_1_En_7_Chapter/480547_1_En_7_Fig7_HTML.jpg — Figure 7-7
Warning in an Execution Plan

The main advantage of seeing a warning is that SQL Server has identified a potential issue with your T-SQL code. It would benefit you to investigate any warnings that are returned and see if you can take steps to resolve them.

Execution plans have other elements that will alert you to potential performance issues. While warnings can give you a specific advice about issues with your T-SQL, there are other items in your execution plan that do not provide the same level of guidance. When creating an execution plan, SQL Server uses statistics to make a guess as to the number of rows that will be returned. SQL Server will also keep track of the actual number of rows returned in each step of the execution plan. If you mouse over the operators, you will be able to see something like what is in Figure 7-8.

../images/480547_1_En_7_Chapter/480547_1_En_7_Fig8_HTML.jpg — Figure 7-8
Estimated vs. Actual Number of Rows

The first arrow points to the actual number of rows returned for this step of the execution plan. The second arrow points to the estimated number of rows returned. In this case, the number of rows matches. If the number of rows matches or is close in range, you will not have performance issues related to estimates. If the two numbers are significantly different, then you may get an execution plan that does not perform as well as it could. This could be happening because the statistics are out of date, and SQL Server cannot accurately determine the number of rows that will be returned. Another possibility is that SQL Server may have created the execution plan for a different set of values. Regardless of the reason, if you see a large variation between the estimated and actual number of rows returned, this is something that you should investigate.

By reading execution plans, you can quickly identify where there are performance issues in your queries and what steps you can take to solve those issues. This includes knowing how to access execution plans and know the difference between estimated and actual execution plans. Execution plans also have several features that will help you quickly identify possible performance issues. By looking at the size of the arrows, you can tell the relative amount of data passing through each step in an execution plan. The Sort operator may provide insights into data that is sorted for more reasons than returning an ordered result set. You can also look at the estimated and actual number of rows returned to help determine if SQL Server has enough information to give you a good execution plan. Once you have gotten familiar with these aspects of execution plans, you may want to investigate how SQL Server is using your indexes.

Index Usage in Execution Plans

In terms of managing T-SQL, one of the largest differences I have seen across companies is how indexes are managed. In some cases, the development teams own writing all the T-SQL code including index creation. There are also companies where the database team handles all index creation and maintenance. Since database administrators usually have less access to development than the developers, I believe it may be most helpful to have the two teams work together. The database administrators can find indexes that are performing poorly and may have more experience designing indexes for multiple T-SQL queries. On the other hand, the developers may have a better idea of how the tables are designed and how the T-SQL code uses those tables.

There are many queries, health checks, and monitoring tools that exist that can help you identify issues with your indexes. You can either use queries to find what stored procedures use the same database objects or you may have a specific query in mind to improve performance. Once you get the execution plan from the cache or generate the current actual execution plan, you can start looking into potential performance issues. As discussed in the previous section, you know that you can check the relative amount of data records for each step, look at various operators and warnings, and verify the estimated and actual number of rows returned. You can also look at how indexes are being used as part of your execution plan.

When working with data in SQL Server, the preference is that data is found as quickly as possible. When data is saved in a clustered index, the data is sorted based on the columns of the index. This means that when SQL Server searches through a clustered index, the data is stored in order. In the best scenario, SQL Server would look through the data and find the data quickly. If this happens, you will see a Clustered Index Seek in the execution plan. The Clustered Index Seek in the execution plan will look like Figure 7-9.

../images/480547_1_En_7_Chapter/480547_1_En_7_Fig9_HTML.jpg — Figure 7-9
Clustered Index Seek

Seeing a Clustered Index Seek lets you know that your T-SQL code is written well enough to find the data quickly. If you do not see a Clustered Index Seek in your execution plan, you may want to see if there is a way to rewrite your T-SQL to use a Clustered Index Seek.

While it is ideal for your data to use a Clustered Index Seek , you may see something in your execution plan that looks similar but not the same. It is possible for SQL Server to use the clustered index but not find the data quickly. In this case, SQL Server may need to look through a significant percentage of the index. If this happens, you may see a Clustered Index Scan in the execution plan as shown in Figure 7-10.

../images/480547_1_En_7_Chapter/480547_1_En_7_Fig10_HTML.jpg — Figure 7-10
Clustered Index Scan

This does mean that SQL Server looked through the data as sorted by the clustered index but had to scan through the table to find all the records needed to satisfy the query requirements.

SQL Server uses the term seek to indicate that the requested data could be found without searching a significant portion of the reference index. When the entire table or a significant part of the table needed to be searched, SQL Server refers to this as a scan. While seeks or scans can apply to clustered indexes, they can also apply to non-clustered indexes. Non-clustered indexes are not sorted in the same order as the clustered index or the table, if the table is a heap. The non-clustered index is sorted in the order of key columns specified in the index. If there is a clustered index on the table, the non-clustered index will have a pointer back to the clustered index. If there is no clustered index, then the non-clustered index will point back to the row ID in the table.

If the execution plan uses a non-clustered index, there are a couple of different ways the SQL Server can search through the data records. If SQL Server knows where to find the data in the non-clustered index, you will see an Index Seek in the execution plan. The Index Seek will look like Figure 7-11.

../images/480547_1_En_7_Chapter/480547_1_En_7_Fig11_HTML.jpg — Figure 7-11
Index Seek

If a Clustered Index Seek cannot be used based on the T-SQL code that is written, the next best option is an Index Seek. If a seek is not possible, SQL Server ends up using an Index Scan. Like the Clustered Index Scan, this means that SQL Server needed to go through the index to find the necessary data records. If the execution plan uses an Index Scan, Figure 7-12 shows how the Index Scan will appear in your execution plan.

../images/480547_1_En_7_Chapter/480547_1_En_7_Fig12_HTML.jpg — Figure 7-12
Index Scan

Also, like Clustered Index Scans , if you see an Index Scan, you may want to see what can be done in terms of rewriting T-SQL to use an Index Seek instead.

Keep in mind what kind of data is needed when writing your T-SQL queries. While including additional data fields can have additional hardware overhead, it can also affect how SQL Server searches the data records. If SQL Server needs to use an Index Seek or an Index Scan to find the required data records, that does not mean that all the data fields need to exist in the non-clustered index. If additional data columns are required, SQL Server may need the non-clustered index to get the additional data columns from the clustered index. When this happens, you will see a Key Lookup in your execution plan like the one in Figure 7-13.

../images/480547_1_En_7_Chapter/480547_1_En_7_Fig13_HTML.jpg — Figure 7-13
Key Lookup

When you see a Key Lookup, that lets you know that not all of the information needed for your query existed in the non-clustered index. In these scenarios, it signifies that the index may need to be modified to include those columns. Before deciding to add these data fields as included columns, you should be aware that including the columns in the index may have an additional cost associated with writing or updating the index.

A table that does not have a clustered index is also known as a heap. If there are no other indexes on the table, SQL Server will have to search row by row to find the data records it needs for query results. There are many other factors that can be involved, but this can cause SQL Server to scan the entire table. This can be seen in the execution plan as Figure 7-14.

../images/480547_1_En_7_Chapter/480547_1_En_7_Fig14_HTML.jpg — Figure 7-14
Table Scan

Some tables that are very small may not need to have indexes, and using a Table Scan may be acceptable. If there are many records in the table, then a Table Scan is not ideal. When this is the case, you may want to investigate if there are any possible indexes that can be added to the table.

If there is a non-clustered index on the heap, it is possible that SQL Server can use those non-clustered indexes as part of the execution plan. If this is the case, you will see an Index Seek or an Index Scan. When either an Index Seek or Index Scan is in the execution plan, it is possible that the columns used in the query do not exist in the non-clustered index. If additional columns need to be looked up in the table, you will see a RID Lookup in the execution plan. You can see this in Figure 7-15.

../images/480547_1_En_7_Chapter/480547_1_En_7_Fig15_HTML.jpg — Figure 7-15
RID Lookup

Like the Key Lookup, you will want to see if you can alter any non-clustered indexes to include the necessary columns from the T-SQL code. It is also possible that the query can be modified to exclude the columns that are causing the RID Lookup.

As an example, I have written a query to pull back all ingredients that begin with the name tomato. In order to do this, I will need to run the query in Listing 7-2.

SELECT IngredientName, DateCreated

FROM dbo. Ingredient

WHERE IngredientName LIKE 'Tomato%';

Listing 7-2

Query to Get Ingredients

The first time I ran this query, there was only a primary clustered index on IngredientID in this table. The execution plan from the T-SQL code in Listing 7-2 is shown in Figure 7-16.

../images/480547_1_En_7_Chapter/480547_1_En_7_Fig16_HTML.jpg — Figure 7-16
Execution Plan with Only Clustered Index

There are no non-clustered index on the table. The only option SQL Server has is to search through the entire clustered index to find the requested records. You can see this by looking over the properties associated with the Clustered Index Scan. The properties for this query execution can be found in Figure 7-17.

../images/480547_1_En_7_Chapter/480547_1_En_7_Fig17_HTML.jpg — Figure 7-17
Reads with Clustered Index

Looking at the estimated and actual number of rows to be read, the rows to be read are the same for both. The total for each was 263,010 rows read. The actual number of rows is 3. SQL Server must search 263,010 rows to find the 3 rows that matched the criteria.

In this scenario, we have a query that is reading significantly more rows than the number of rows returned. I cannot rewrite the query to use a non-clustered index since none exist. My best option is to write a non-clustered index to improve performance. The new non-clustered index is Listing 7-3.

CREATE INDEX IX_Ingredient_IngredientName

ON dbo.Ingredient (IngredientName);

Listing 7-3

Create Non-clustered Index

Now that I have created the non-clustered index, I am going to run the query again to see if I get a new execution plan. The new execution plan is in Figure 7-18.

../images/480547_1_En_7_Chapter/480547_1_En_7_Fig18_HTML.jpg — Figure 7-18
Execution Plan with Non-clustered Index

In Figure 7-18, the Clustered Index Scan has been replaced with an Index Scan and a Key Lookup. While this may seem like I have decreased the performance of the execution plan, I can verify this by looking at the number of rows read. In Figure 7-19, you can see the properties associated with the Index Seek.

../images/480547_1_En_7_Chapter/480547_1_En_7_Fig19_HTML.jpg — Figure 7-19
Properties for Index Seek

Now that the non-clustered index has been created, I can see that the actual number of rows read has decreased from 263,010 to 3. I would expect that the Key Lookup would have the same number of reads, but I can look at the properties to confirm. In Figure 7-20, you can see the properties returned.

../images/480547_1_En_7_Chapter/480547_1_En_7_Fig20_HTML.jpg — Figure 7-20
Properties of the Key Lookup

Reviewing the Key Lookup, there were also three rows read in the Key Lookup. The T-SQL code without the non-clustered index had reads of 263,010. The total number of rows read with the non-clustered index is six.

I could get rid of the Key Lookup by including the DateCreated column as part of the non-clustered index or remove the column from the original query.

CREATE INDEX IX_Ingredient_IngredientName

ON dbo.Ingredient (IngredientName) INCLUDE (DateCreated);

Listing 7-4

Non-clustered Index with Included Column

After adding the index in Listing 7-4 and running the query from Listing 7-2, I can get the execution plan. The execution plan can be found in Figure 7-21.

../images/480547_1_En_7_Chapter/480547_1_En_7_Fig21_HTML.jpg — Figure 7-21
Execution Plan with Non-clustered Index and Included Column

The execution plan is showing that the query is still using the non-clustered index. Since this index has changed to include the additional column in the select statement, the execution plan has changed. Previously, with the non-clustered index, the execution plan had an Index Scan and a Key Lookup. Now the execution plan is using an Index Seek.

Once you get familiar with reading execution plan, you can start using execution plans to help you determine how to improve performance. When you look at your execution plan, you may find a Clustered Index Scan. Generally, this implies that SQL Server had to look through the entire table to find the data that meets the query criteria. As shown in this section, creating a non-clustered index that includes the join criteria can help improve the performance of your query. If you find that SQL Server is having to perform a Key Lookup, you can see if it makes sense to include the columns being selected as part of the non-clustered index. Understanding how SQL Server uses indexes can help improve your query performance. Understanding how SQL Server compares data from more than one data set can also help you improve query performance.

Logical Join Types in Execution Plans

In addition to reading execution plans and understanding indexes, your T-SQL code can also be used to determine how an execution plan will be generated. In some cases, how tables are joined together is just as important as how data is stored in the tables. There are times where how you are joining your columns together may also affect your execution plan. Some T-SQL in the WHERE clause can impact what types of logical joins are used in the query execution. If multiple queries are combined, it is also possible that it may change how the query is executed. Ultimately, there is a relationship between how the T-SQL code is written and how SQL Server decides to execute the query.

There are several different logical joins in SQL Server. Some of these logical joins are easily visible in T-SQL. This includes the INNER JOIN, LEFT OUTER JOIN, RIGHT OUTER JOIN, and FULL OUTER JOIN. These are not the only logical joins that exist in SQL Server. With some T-SQL commands, SQL Server will compare one table to another looking for records that match or not. In these scenarios, SQL Server does not perform a full join but a SEMI JOIN. This type of join is not specified as a T-SQL command, but some T-SQL commands like EXISTS or NOT EXISTS will indicate a SEMI JOIN. Available semi joins include the LEFT SEMI JOIN, LEFT ANTI-SEMI JOIN, RIGHT SEMI JOIN, and RIGHT ANTI-SEMI JOIN. Finally, there are some logical joins that are related to combining two or more query results in the same transaction. This can include CONCATENATION which is often associated with a UNION ALL or the UNION logical join which can happen with a UNION in T-SQL. Based on the logical join, SQL Server will determine what physical joins can be used. In this way, logical joins can be used to affect query performance.

There are physical join operators that can be associated with logical joins. The four physical join operators are called merge join, hash join, nested loop, and adaptive join. When using an INNER JOIN, any of these four types of physical join operators can be used as part of the execution plan. Depending on how the data is stored in the table and the relative sizes of the tables may affect which physical operator is used. If the data from both tables being compared is sorted, then the records from the two tables can be compared side by side. This would let SQL Server quickly find the records that do or do not match, depending on the query requirements. When this happens, it is called a Merge Join. A Merge Join physical operator that would appear in the execution plan as shown in Figure 7-22.

../images/480547_1_En_7_Chapter/480547_1_En_7_Fig22_HTML.jpg — Figure 7-22
Merge Join

There are times where the tables being joined together are not sorted. If that is the case, SQL Server can still compare the rows between each of the tables. However, SQL Server will need to convert the records to something that can be compared easily. This can be done by hashing. If SQL Server hashes the columns being compared in both tables and compares them, the physical operator will be a Hash Match. If you see Figure 7-23 in your execution plan, then you know how SQL Server is performing a Hash Match physical join.

../images/480547_1_En_7_Chapter/480547_1_En_7_Fig23_HTML.jpg — Figure 7-23
Hash Match

If there are two tables where one table is smaller than another, then SQL Server may decide to compare the values or one table one row at a time to all rows of the other table. When this happens, SQL Server will be using a Nested Loop as shown in Figure 7-24.

../images/480547_1_En_7_Chapter/480547_1_En_7_Fig24_HTML.jpg — Figure 7-24
Nested Loop

Starting in SQL Server 2017, there is a new physical join operator that can be used when SQL Server determines an execution plan for a query. This new physical join operator is intended to help in situations where the data stored in a given table may vary significantly based on the criteria in the WHERE clause . This type of physical join allows SQL Server to determine if a Hash Match or Nested Loop should be used depending on the data selected for the query. This is referred to as an Adaptive Join. In SQL Server 2017, Adaptive Joins were only available with columnstore indexes. This has changed for SQL Server 2019.

Going through the logical joins that are available, the next one on the list is the LEFT OUTER JOIN and the RIGHT OUTER JOIN. In some ways, these two types of logical joins are the same but also different. For instance, Nested Loops can only use a LEFT OUTER JOIN. If a query is written using a RIGHT OUTER JOIN, SQL Server will incur an additional cost to convert a RIGHT OUTER JOIN to a LEFT OUTER JOIN. Except this specific scenario, all physical join types support the use of LEFT OUTER JOIN. In previous versions of SQL Server, it was possible for SQL Server to generate two different execution plans depending on whether a LEFT OUTER JOIN or a RIGHT OUTER JOIN was used in the query. This was due to the limitations around how joins can be reordered.

The next type of logical join is the FULL OUTER JOIN. Of all the join types, this one has the most specifics about when one physical join type may be used over another. In the first example, I am going to order the table by the identifier as shown in Listing 7-5.

ALTER TABLE dbo.RecipeIngredient

ADD CONSTRAINT PK_RecipeIngredient_RecipeIngredientID

PRIMARY KEY CLUSTERED (RecipeIngredientID)

Listing 7-5

Primary Key to Sort Data by RecipeIngredientID

Once the table has been ordered by the RecipeIngredientID, I will join the RecipeIngredient table to the Recipe table. The T-SQL code for this join is shown in Listing 7-6.

SELECT rec.RecipeName

FROM dbo.Recipe rec

LEFT OUTER JOIN dbo.RecipeIngredient recing

ON rec.RecipeID = recing.RecipeID

Listing 7-6

Outer Join Between Recipe and RecipeIngredient

In this scenario, the two tables are not ordered in the same way. The RecipeIngredient table is sorted by the RecipeIngredientID, whereas the Recipe table is sorted by the RecipeID. You can see in Figure 7-25 that the physical operator is a Hash Match.

../images/480547_1_En_7_Chapter/480547_1_En_7_Fig25_HTML.jpg — Figure 7-25
Execution Plan Using Hash Match

We have seen what happens when the tables are ordered differently. If I run the T-SQL code in Listing 7-7, both tables will be ordered by the same column, RecipeID.

ALTER TABLE dbo.RecipeIngredient

ADD CONSTRAINT PK_RecipeIngredient_RecipeIDIngredientID

PRIMARY KEY CLUSTERED (RecipeID, IngredientID)

Listing 7-7

Change Primary Key to Sort Data by RecipeID

Looking at Figure 7-26, you can see that the execution plan has changed to use a Merge Join.

../images/480547_1_En_7_Chapter/480547_1_En_7_Fig26_HTML.jpg — Figure 7-26
Execution Plan Using Merge Join

Now that these two tables are ordered in the same manner, SQL Server has determined that it is more cost-effective to use a Merge Join. In the preceding example, SQL Server will take the two ordered data sets and compare the records to determine which records match. All the preceding examples assume equality between the columns being compared. As a general guide, SQL Server will always use a Nested Loop if none of the joins in the query are equal. You can see an example of query that meets this criteria in Listing 7-8.

SELECT TOP 500 rec.RecipeName

FROM dbo.Recipe rec

FULL OUTER JOIN dbo.RecipeIngredient recing

ON rec.RecipeID <> recing.RecipeID

Listing 7-8

Full Outer Join with Inequality

If I re-create the primary key referenced in Listing 7-5, the data between the two tables will be sorted differently. Due to the amount of data that will be compared between the two tables, I decided to limit the result to the first 500 records. After running this query, I get the execution plan returned in Figure 7-27.

../images/480547_1_En_7_Chapter/480547_1_En_7_Fig27_HTML.jpg — Figure 7-27
Execution Plan Using Nested Loop

Now that the only join in the T-SQL code is an inequality, SQL Server uses a Nested Loop physical join. Almost every query where there are no joins that are equal must use a Nested Loop except one specific scenario. If the data being compared is sorted and there is an inequality, SQL Server can use a Merge Join. Creating the primary key from Listing 7-7 will order the data the same between the two tables. The execution plan in Figure 7-28 is returned when executing the query from Listing 7-8.

../images/480547_1_En_7_Chapter/480547_1_En_7_Fig28_HTML.jpg — Figure 7-28
Execution Plan Using Merge Join

Figure 7-28 shows us that it is possible to have a Merge Join when there is an inequality. However, only when the tables are sorted and there is an inequality is where you can see a Merge Join with a FULL OUTER JOIN. In addition to T-SQL joins that relate to logical joins, there are other logical join types that SQL Server can use.

Another logical join type involves situations where SQL Server compares data between two tables but without doing a full join between the tables. Depending on the T-SQL code used, this can be referred to as a SEMI JOIN or an ANTI SEMI JOIN. Like the LEFT OUTER JOIN and RIGHT OUTER JOIN, SQL Server has some of the same limitations matching the SEMI JOIN and ANTI SEMI JOIN to physical join operators. When working with either a SEMI JOIN or an ANTI SEMI JOIN, there is a concept of left or right. This left or right has to do with which side is being compared. Listing 7-9 finds all ingredients that are not in the recipe with an ID of 2.

SELECT ing.IngredientName

FROM dbo.Ingredient ing

WHERE NOT EXISTS

(

SELECT *

FROM dbo.RecipeIngredient recing

WHERE ing.IngredientID = recing.IngredientID

AND recing.RecipeID = 2

)

Listing 7-9

All Ingredients Except Those in RecipeID 2

When this query was run, the data in RecipeIngredient was sorted by RecipeID, then IngredientID. In Figure 7-29, the execution plan shows a LEFT ANTI SEMI JOIN.

../images/480547_1_En_7_Chapter/480547_1_En_7_Fig29_HTML.jpg — Figure 7-29
Execution Plan with Left Semi Anti Join

The semi anti join is due to the NOT EXISTS in the query from Listing 7-9. In the preceding scenario, SQL Server has chosen a left join. Since the table RecipeIngredient is ordered by RecipeID and then IngredientID and the T-SQL code is only looking at one RecipeID, the values evaluated are IngredientIDs that are ordered in the same way as the Ingredient table. This is what allows SQL Server to use a Merge Join for this query. In the query in Listing 7-10, I am looking for all ingredients that exist in a certain subset of recipes.

SELECT ing.IngredientName

FROM dbo.Ingredient ing

WHERE EXISTS

(

SELECT *

FROM dbo.RecipeIngredient recing

WHERE ing.IngredientID = recing.IngredientID

AND recing.RecipeID < 200

)

Listing 7-10

All Ingredients in All Recipes up to RecipeID 200

As shown in execution plan in Figure 7-30, you can see a RIGHT SEMI JOIN.

../images/480547_1_En_7_Chapter/480547_1_En_7_Fig30_HTML.jpg — Figure 7-30
Execution Plan with Right Semi Join

The use of the RIGHT SEMI JOIN is due to the EXISTS in the T-SQL from Listing 7-10. The use of right is stating the SQL Server is evaluating the results returned from the EXISTS statement to the list of ingredients. While the use of EXISTS or NOT EXISTS can indicate that there will be a SEMI JOIN or an ANTI SEMI join, that may not always be what happens. Listing 7-11 is a query that will return all ingredients that exist for the recipe with an ID of 2.

SELECT ing.IngredientName

FROM dbo.Ingredient ing

WHERE EXISTS

(

SELECT *

FROM dbo.RecipeIngredient recing

WHERE ing.IngredientID = recing.IngredientID

AND recing.RecipeID = 2

)

Listing 7-11

All Ingredients in RecipeID 2

The execution plan in Figure 7-31 shows how SQL Server has determined to execute this query.

../images/480547_1_En_7_Chapter/480547_1_En_7_Fig31_HTML.jpg — Figure 7-31
Execution Plan with No Semi Joins

For this execution, SQL Server has decided to convert this T-SQL to act like an INNER JOIN instead of a SEMI JOIN. This is because SQL Server has calculated that this will give the best execution out of those evaluated.

There are ways to combine data that consist of comparing values between two data sets and selecting items that match or do not match between them. There is also the possibility of combining entire data sets as part of the logical join types. This can include the use of UNION or UNION ALL in your T-SQL code. The query shown in Listing 7-12 shows the use of a UNION between the two tables.

SELECT ing.IngredientName

FROM dbo.Ingredient ing

WHERE ing.IngredientName LIKE 'Tomato%'

UNION

SELECT ing.IngredientName

FROM dbo.Ingredient ing

WHERE ing.IngredientName = 'Tomato'

Listing 7-12

All Ingredients with Tomato

In this query, the list of ingredients that begin with tomato or the ingredient are equal to tomato will be returned. When this query is executed in the SQL Server, the execution plan in Figure 7-32 is returned.

../images/480547_1_En_7_Chapter/480547_1_En_7_Fig32_HTML.jpg — Figure 7-32
Execution Plan with Concatenation

In the case of this query, SQL Server is not joining the data together as it has in the other examples from this chapter. Instead, there is only one operator, Concatenation, that indicates how SQL Server is combining this data.

Overall, the goal is to use the physical join that is the most effective for the data that is being joined. Merge Joins can perform very well, but they are limited to data that has been ordered by indexes or ORDER BY or GROUP BY statements. As expected, Index Merge Joins will perform even better than Merge Joins that do not use an index. When data is not sorted and particularly if one of the two sides of the data being joined is small, SQL Server may use a Nested Loop. You will want to confirm that the cost of looping through the other table does not incur a significant cost. Just as a Merge Join using an index performs better than a Merge Join without an index, the same is true for a Nested Loop. For both a Merge Join and Nested Loop, see if you can alter your T-SQL code to use an index. This does not mean creating an index if one does not exist; instead, you can check the indexes on the table and see if any will apply for your specific query. If the tables are not sorted and particularly if the numbers of records being joined on either side are both large, the Hash Match can be the ideal solution.

Throughout this chapter, we have gone over various aspects of working with execution plans. We started with some of the ways you can access and view execution plans. We also discussed how estimated execution plans, actual execution plans, and executions in the plan cache differed. When looking at execution plans, there are some items such as arrow sizes, estimated number of rows, and actual number of rows that can give you some hints regarding the next steps you can take to improve the performance on your T-SQL code. After looking into these items, you may also be able to check how the execution is using indexes. It is helpful to not only focus on what indexes are being used but how SQL Server is searching through those indexes. When joining tables, you will use T-SQL code that can reference certain logical join type behavior. These logical join types can affect what types of physical join types will be used as part of the execution plan. By writing your T-SQL code differently, you may be able to impact the execution plan generated by SQL Server. Using all the information covered in this chapter should help you get more comfortable with reviewing your T-SQL code and improving the speed and hardware usage associated with your T-SQL.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for 7. Execution Plans

Create new playlist

Sign In

Sign Up

7. Execution Plans

Reading Execution Plans

Index Usage in Execution Plans

Logical Join Types in Execution Plans

Table of Contents for
7. Execution Plans