Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

G. FritcheySQL Server 2022 Query Performance Tuninghttps://doi.org/10.1007/978-1-4842-8891-7_2

2. Execution Plan Generation and the Query Optimizer

Grant Fritchey¹

(1)

Grafton, MA, USA

All performance within SQL Server starts, and frequently ends, with the query optimizer. Because the optimizer is so important to query performance, we’re diving immediately into how the optimizer behaves in order to better understand why our queries behave the way they do. The optimization process results in an execution plan, a mechanism for showing the choices the optimizer made. The optimization process is expensive in terms of resources, especially CPU, and can be somewhat time-consuming, affecting the execution time of your queries. We’ll go over what the process is, how an execution plan is generated, and how that plan goes into memory in the plan cache in the hopes of reusing the plan.

In this chapter, we’ll address the following topics:

The query optimization process in general
Execution plans and the plan cache
Factors affecting parallel plan generation

The Query Optimization Process

The query optimizer in SQL Server uses a cost-based analysis mechanism to arrive at the necessary choices to satisfy the queries that you’ve submitted. In order to process a query, a large number of choices must be made. Which index to use to access a table, how to join two tables together, and whether or not the data has to be ordered, all these and many others are all given costs by the optimizer. The optimizer is also aware of the structure of the database. From the data types in your columns to your primary key definitions, constraints, and foreign keys, these objects also affect the cost estimates that the optimizer uses to arrive at its decisions. Finally, your data is included in the cost through the use of statistics about the data (covered in more detail in Chapter 5).

As was mentioned in Chapter 1, the optimizer doesn’t attempt to make a perfect execution plan. Instead, the optimizer uses mathematical models, based in part on the objects and statistics in your database, to arrive at a plan that is good enough. So the cost-based analysis tries to arrive at as optimal a plan as it can while using as few resources as it can and finishing up as quickly as it can.

Before optimization starts though, there are some initial, preparatory phases that must be accomplished.

Optimization Preparation

Because the optimizer needs so much metadata about your system, there are a number of steps before optimization takes place to put this information together. The order in which these steps are performed is as follows:

1.
Parsing
2.
Binding
3.
Optimization

There is a complex set of inputs and outputs that don’t really need to be explored in order to understand what’s going on. However, just so we’re clear, Figure 2-1 shows the process, the inputs, and outputs:

Figure 2-1
Steps leading up to the optimization process

Let’s take a look at the two steps that take place prior to the optimization process itself.

Parsing

A query first goes through the relational engine within SQL Server. This is the first part of two processes that deal with the query. The second is the storage engine that is responsible for data access, modification, and caching. The relational engine takes care of parsing the query through a process called the algebrizer. The relational engine executes a query per the execution plan generated and requests the data from the storage engine.

The algebrizer does several steps, but the first one is query parsing. The parsing process simply validates that you have correct syntax in your query. If there’s an error in the syntax, the process is immediately terminated, and all optimization processes are interrupted. If multiple queries are submitted together as a single batch, then the batch is checked, and the batch is cancelled in the event of a syntax error anywhere within the batch. While you may have multiple syntax errors, the parser stops at the very first error and stops all other processing. The immediate interruption is just one aspect of how the optimizer attempts to reduce the overhead of the optimization process. Listing 2-1 shows a batch with an error.

CREATE TABLE dbo.Example

(

Col1 INT

);

INSERT INTO dbo.Example

(

Col1

)

VALUES

(1);

SELECT e.Col1

FORM dbo.Example AS e; -- Generates an error because of 'FORM'

Listing 2-1

A batch with a syntax error

Once the parsing process is passed, an internal-only structure called a parse tree gets created and passed on to the next step.

Binding

The parse tree is now used to identify all the objects that make up the query. This list of objects includes tables, columns, indexes, and more. This process is called binding. All the data types being processed are identified. Any kinds of aggregation through GROUP BY are identified. All these objects and processes are put together into another internal structure called a query processor tree.

The algebrizer will also handle implicit data conversions by adding steps to the processor tree. You may also see syntax optimizations occur where the code you passed to SQL Server actually gets changed. If you take the code from Listing 2-2 and capture an execution plan (covered in Chapter 4), you can actually see the changes in syntax.

SELECT soh.AccountNumber,

soh.OrderDate,

soh.PurchaseOrderNumber,

soh.SalesOrderNumber

FROM Sales.SalesOrderHeader AS soh

WHERE soh.SalesOrderID

BETWEEN 62500 AND 62550;

Listing 2-2

The original code before binding

Figure 2-2 shows the change in syntax.

Figure 2-2
Syntax-based optimization at work

Two very distinct changes have been made. First, the BETWEEN statement has been replaced with “>=” and “<=”. Second, a process called simple parameterization has been implemented through the @1 and @2 you see in the figure. We’ll discuss more about parameterization later in this chapter. You can see further evidence of the binding process if we look at the execution plan generated in Figure 2-3.

Figure 2-3
A warning indicator on an execution plan

The exclamation point visible on the very first operator on the left, labeled “SELECT”, is an indication of a warning. Looking at the properties for that operator, you’ll find that the SalesOrderID is actually going through a data type conversion:

Type conversion in expression (CONVERT(nvarchar(23),[soh].[SalesOrderID],0)) may affect "CardinalityEstimate" in query plan choice

This example illustrates several points in addition to syntax-based optimization. First, warnings can sometimes be unclear. In this case, the warning is not actually from the SalesOrderID column as referenced in the WHERE clause of the query. Instead, it’s from a calculated column, SalesOrderNumber. That calculation is converting the SalesOrderID, an integer, into a string and adding other information to it. In the way this calculation is being done, the optimizer recognizes that were this to be used as filtering criteria in a WHERE clause, or JOIN criteria, or even a HAVING clause, it would affect the ability to use statistics and indexes to get good performance. In this case though, the query doesn’t reference the column in any of the filtering criteria, so we can safely ignore the warning. I like this example because it shows just how realistic AdventureWorks really is. A lot of this kind of issue creeps into real-world databases.

Unlike Data Manipulation Language (DML) queries that we’ve seen so far, Data Definition Language (DDL) queries, such as CREATE TABLE, do not get execution plans. There is only one way to run a CREATE VIEW query as an example. Therefore, most DDL doesn’t go through the optimization process. The exception is CREATE INDEX DDL queries. The optimizer can take advantage of existing indexes to create another index more efficiently. So while you’ll never see a CREATE PROCEDURE execution plan, you may see one when creating an index.

Once all the binding operations are complete, another internal structure is created, called the query processor tree. This is where optimization starts.

Optimization

The optimization process is a pretty amazing piece of software. Without a debugger, you can’t really see it in action. However, we can see the results of the process in the execution plans that it creates. We can also use Dynamic Management Views (DMVs) and Extended Events to observe aspects of the optimization process. In this book, we’ll see a lot of the evidence of optimization, but it’s beyond the scope of the book to examine the code with a debugger. We’ll stick to the tools immediately available to us. Let’s start talking about the optimization process.

Figure 2-4 shows the processes that the optimizer uses to attempt to arrive at a “good enough” execution plan as quickly as possible.

Let’s discuss each step in order to understand how the optimizer arrives at an execution plan.

Simplification

The first step in optimization is a process called Simplification. Here the optimizer ensures that all the objects that you reference in your query are actually in use. Statistics and row counts begin to be gathered and used. This step looks at the constraints such as foreign keys. When all this is taken together, you can see the optimizer actually change what is being processed. For example, you may write a query with a four-table join. However, only two tables are referenced in the SELECT list and the WHERE clause. Using the constraints, the optimizer can tell that it doesn’t need the other two tables to satisfy the query and will therefore eliminate them from further processing, simplifying what is being queried—hence the name of this step.

Trivial Plan Match

When a query is extremely simplistic, it can match a pattern for which there are no options, but only one possible execution plan. That situation is referred to as a Trivial Plan. The simplest example is a table with no clustered index, referred to as a heap, and no other indexes. The only way this table can be accessed is through a table scan. Rather than even attempting to build an execution plan, the optimizer looks for the query pattern that matches existing Trivial Plans. When a pattern match is found, optimization, an expensive operation, is skipped entirely. Instead, the Trivial Plan is used to satisfy the query. As stated before, you’ll only see Trivial Plans in extremely simple queries. You can see that a plan is marked Trivial in the properties of the execution plan (discussed in Chapter 4).

Optimization Phases

Once it’s clear that a Trivial Plan won’t be found, the optimizer will have to go through multiple optimization phases to attempt to arrive at a good enough execution plan. It breaks this process down into multiple steps, each one attempting to do as little work as possible. These are called

Search 0 or Transaction: It’s called Transaction since the types of queries that can be satisfied here are generally simple, Online Transaction Processing (OLTP) style queries with few joins and no transformations needed.
Search 1 or Quick Plan: This is where more complex operations occur such as reordering joins and other transformations to arrive at a good enough plan.
Search 2 or Full Optimization: When queries are complex, they end up going through all three optimization steps to arrive at full optimization.

The optimizer can short-circuit these searches, heading straight to Quick Plan or Full Optimization, skipping Transaction, depending on the query, its complexity, and the supporting objects such as indexes and constraints. These phases can each decide on how to perform a particular JOIN operation or access the data through scans or seeks.

The principal driver for most of the decisions within the optimizer are row counts. Row counts generally come from the statistics of the columns in the filtering criteria such as ON, HAVING, and WHERE predicates. With the statistics in hand, the optimizer begins to make choices and calculations based on how much estimated CPU, memory, and I/O are necessary to satisfy the query. These estimates are added up to a total cost for the execution of a query. That cost is not a measure based on your system. Rather it’s an arbitrary calculation made by the optimizer and is purely a mathematical construct.

The optimizer will make a calculation on the number of iterations it needs to run through, attempting different plans, with different strategies. At the same time, it calculates what is likely to be the least cost plan. When it finds the plan that meets all the calculated requirements, it will stop optimization, even if a better plan could be found. This is known as Early Termination of Optimization. You’ll also see the termination of optimization when the optimizer has gone through the calculated number of iterations. At that point, it will end optimization through what is referred to as a Timeout (more on that in Chapter 4).

If a query reaches Search 2, or Full Optimization, it may also be evaluated for conversion from a serial plan to a parallel plan (more on that later in the chapter).

It’s possible to see a lot of the work the optimizer is doing through the use of execution plans. Let’s take the query in Listing 2-3 as an example.

SELECT soh.SalesOrderNumber,

sod.OrderQty,

sod.LineTotal,

sod.UnitPrice,

sod.UnitPriceDiscount,

p.Name AS ProductName,

p.ProductNumber,

ps.Name AS ProductSubCategoryName,

pc.Name AS ProductCategoryName

FROM Sales.SalesOrderHeader AS soh

JOIN Sales.SalesOrderDetail AS sod

ON soh.SalesOrderID = sod.SalesOrderID

JOIN Production.Product AS p

ON sod.ProductID = p.ProductID

JOIN Production.ProductModel AS pm

ON p.ProductModelID = pm.ProductModelID

JOIN Production.ProductSubcategory AS ps

ON p.ProductSubcategoryID = ps.ProductSubcategoryID

JOIN Production.ProductCategory AS pc

ON ps.ProductCategoryID = pc.ProductCategoryID

WHERE soh.CustomerID = 29658;

Listing 2-3

A slightly complex query

If we capture the execution plan for this query (again, covered in Chapter 4), we can see some of the behavior of the optimizer. This plan is hard to read within this context, but I don’t want you to try to understand it all at this point. The plan is shown in Figure 2-5.

Figure 2-5
Execution plan from the complex query

Don’t try to read that plan. It’s much too small. Instead of reading the plan now, we’re going to focus on one spot, the operator on the far left side of the plan. In this case, since it’s a SELECT query, the operator is labeled SELECT. The properties of this operator are all of the information about the execution plan itself. These include information about the optimization process. We can see a subset of those properties in Figure 2-6.

Figure 2-6
Optimizer behaviors on display in the execution plan properties

Some of the information immediately on display includes the following:

CompileCPU: The amount of CPU time taken to compile this plan
Estimated Number of Rows Per Execution: The row count that the optimizer arrived at for this query
Estimated Subtree Cost: The combined estimates on CPU, I/O, and memory arrived at through the optimization process

That’s just a sample of the amount of information gathered here for how the optimizer works to arrive at an execution plan. We can also look to this data to see things like the Optimization Level and the Reason for Early Termination, the top and bottom properties shown in Figure 2-7.

Figure 2-7
Additional properties showing optimizer behaviors

One point worth noting here, when we see the property Optimization Level and a value of “Full” here, it doesn’t necessarily mean that this plan went through Search 2. It only means that it’s not a Trivial Plan.

Another way to observe some of the work by the optimizer is to use the Dynamic Management View (DMV) sys.dm_exec_query_optimizer_info. Named appropriately, this DMV shows aggregated data about the optimization process. An example of how to use this DMV is in Listing 2-4.

SELECT deqoi.counter,

deqoi.occurrence,

deqoi.value

FROM sys.dm_exec_query_optimizer_info AS deqoi;

Listing 2-4

Querying the dm_exec_query_optimizer_info DMV

A partial set of the results from the query are visible in Figure 2-8.

Figure 2-8
Aggregated information about the optimization processes

You can see that the system I’ve run this on has done very few complex optimizations. We can tell that because there have been no Search 2 operations. You can see the other operations that have been done and the estimated time needed to accomplish them.

This information is only of limited use, but it’s good to know where to go to get it in the event that you need to understand some of the processes that the optimizer is going through and the load it’s placing on your systems.

You could also run this code before and after a query in order to see the types of optimizations that occurred on that query. However, you need to do this on an isolated system or you’re likely to see more than just the optimizations for the query you’re investigating.

Generating Parallel Execution Plans

When plans become complex enough, they may benefit from parallel execution. Parallel execution is when more than one CPU is available to SQL Server, so more than one is put to work on a single query. However, parallel execution is an expensive operation, so you really want the query to benefit from it before it goes parallel. In fact, some queries can be executed in a parallel fashion, but they actually run slower because of the process.

When the optimizer is deciding on whether or not a complex query will go parallel, it takes a number of factors into consideration:

CPUs available to SQL Server, if more than one
SQL Server edition
Available memory
Cost Threshold for Parallelism
Query being executed
Rows to be processed
Active concurrent connections
Max Degree of Parallelism

The three most important of these factors are the actual number of CPUs available to SQL Server, the Max Degree of Parallelism, and the Cost Threshold for Parallelism.

If there is only a single CPU or a single vCPU on a NUMA node available to SQL Server, then you won’t see any query go parallel. You can also restrict parallelism and the number of CPUs used by it through the Max Degree of Parallelism setting. Setting that to a value of 1 will eliminate all parallel execution, but that’s not recommended. To control the Max Degree of Parallelism, you have to use a server setting (Listing 2-5).

USE master;

EXEC sp_configure 'show advanced option', '1';

RECONFIGURE;

EXEC sp_configure 'max degree of parallelism', 2;

RECONFIGURE;

Listing 2-5

Changing the Max Degree of Parallelism

This will take effect immediately. Determining the precise setting for this value is frequently of great debate. Your systems and your workloads should drive these decisions, not some arbitrary calculation based on the number of processors on your systems. Because of this, I’m not going to make a precise determination here, other than to emphasize again, parallel execution of some queries is very desirable, so you want to leave the possibility on your systems, meaning don’t set the Max Degree of Parallelism to 1.

You can also control the degree of parallelism on a query-by-query basis through query hints. I’m going to say this multiple times in the book: the use of query hints should be very carefully regulated on your systems. Listing 2-6 shows how this works.

SELECT e.ID,

e.SomeValue

FROM dbo.Example AS e

WHERE e.ID = 42

OPTION (MAXDOP 2);

Listing 2-6

Controlling the degree of parallelism through a query hint

If you want to more directly control which plans go parallel, the best way is controlling the process by changing the Cost Threshold for Parallelism (Listing 2-7). The default value of 5 is extremely low for most systems. Microsoft generally doesn’t agree, but I would immediately make that value higher. Again, the precise value comes down to your system, queries, and load.

USE master;

EXEC sp_configure 'show advanced option', '1';

RECONFIGURE;

EXEC sp_configure 'cost threshold for parallelism', 35;

RECONFIGURE;

Listing 2-7

Changing the Cost Threshold for Parallelism

One way to determine a good value for the cost threshold would be to look at the estimated costs of the queries on your systems. Using any of the query metrics available, and the cost estimates from execution plans, you can see how your systems are behaving and make adjustments from there.

As was stated earlier, parallel execution of queries is expensive. Parallel execution doesn’t just take up more CPU but also considerably more memory. Because of this, SQL Server will take into account the amount of available memory, prior to executing a parallel plan. SQL Server will automatically reduce the amount of parallelism, or abandon it entirely, for a given query on a system under stress.

All DML queries are executed serially. However, the SELECT in an INSERT and the WHERE clauses of UPDATE or DELETE statements can be executed in parallel. All data changes are still applied in a serial fashion, never in parallel.

At execution time, SQL Server can change the behavior of parallel execution. It then sets the number of processing threads for a given query, and that number of threads is used throughout its execution. That value can be reexamined the next time the query executes and may change.

Execution Plan Caching

When the optimizer completes its processing, the result is an execution plan. The plan is saved in the memory of SQL Server in a place referred to as the plan cache. Saving the plan into cache is another optimization that SQL Server is performing in an attempt to make your queries faster. By saving the plan, it can be reused instead of going through the optimization process again. However, plans can be removed from cache through various processes. One of these processes is referred to as “aging.”

Aging of the Execution Plan

The plan cache is part of SQL Server’s buffer cache, which also holds data pages. As new execution plans are added to the plan cache, the size of the plan cache keeps growing, affecting the retention of useful data pages in memory. To avoid this, SQL Server dynamically controls the retention of the execution plans in the plan cache, retaining the frequently used execution plans and discarding plans that are not used for a certain period of time.

SQL Server keeps track of the frequency of an execution plan’s reuse by associating an age field to it. When an execution plan is generated, the age field is populated with the cost of generating the plan. A complex query requiring extensive optimization will have an age field value higher than that for a simpler query.

At regular intervals, the current cost of all the execution plans in the plan cache is examined by SQL Server’s lazy writer process (which manages most of the background processes in SQL Server). If an execution plan is not reused for a long time, then the current cost will eventually be reduced to 0. The cheaper the execution plan was to generate, the sooner its cost will be reduced to 0. Once an execution plan’s cost reaches 0, the plan becomes a candidate for removal from memory. SQL Server removes all plans with a cost of 0 from the plan cache when memory pressure increases to such an extent that there is no longer enough free memory to serve new requests. However, if a system has enough memory and free memory pages are available to serve new requests, execution plans with a cost of 0 can remain in the plan cache for a long time so that they can be reused later, if required.

As well as changing the costs downward, execution plans can also find their costs increased to the max cost of generating the plan every time the plan is reused (or to the current cost of the plan for ad hoc plans). For example, suppose you have two execution plans with generation costs equal to 100 and 10. Their starting cost values will therefore be 100 and 10, respectively. If both execution plans are reused immediately, their age fields will be set back to that maximum cost. With these cost values, the lazy writer will bring down the cost of the second plan to 0 much earlier than that of the first one, unless the second plan is reused more often. Therefore, even if a costly plan is reused less frequently than a cheaper plan, because of the effect of the initial cost, the costly plan can remain at a nonzero cost value for a longer period of time.

Summary

While you don’t need to understand every aspect of the query optimizer in order to make your queries run faster, having a degree of understanding will help. You’ve seen the basic process that the optimizer goes through from parsing and binding through to matching Trivial Plans and finally into the full optimization process. We’ve also discussed some of how parallel plans work and the ways in which execution plans end up in the plan cache.

The next chapter is going to discuss the best ways to capture query performance metrics.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for 2. Execution Plan Generation and the Query Optimizer

Create new playlist

Sign In

Sign Up

2. Execution Plan Generation and the Query Optimizer

The Query Optimization Process

Optimization Preparation

Parsing

Binding

Optimization

Simplification

Trivial Plan Match

Optimization Phases

Generating Parallel Execution Plans

Execution Plan Caching

Aging of the Execution Plan

Summary

Table of Contents for
2. Execution Plan Generation and the Query Optimizer