All performance within SQL Server starts, and frequently ends, with the query optimizer. Because the optimizer is so important to query performance, we’re diving immediately into how the optimizer behaves in order to better understand why our queries behave the way they do. The optimization process results in an execution plan, a mechanism for showing the choices the optimizer made. The optimization process is expensive in terms of resources, especially CPU, and can be somewhat time-consuming, affecting the execution time of your queries. We’ll go over what the process is, how an execution plan is generated, and how that plan goes into memory in the plan cache in the hopes of reusing the plan.
In this chapter, we'll cover the following topics:
- The query optimization process in general
- Execution plans and the plan cache
- Factors affecting parallel plan generation
The Query Optimization Process
The query optimizer in SQL Server uses a cost-based analysis to arrive at the choices necessary to satisfy the queries you've submitted. To process a query, a large number of choices must be made: which index to use to access a table, how to join two tables together, whether the data has to be ordered, and many more. The optimizer assigns each of these choices a cost. The optimizer is also aware of the structure of the database. From the data types in your columns to your primary key definitions, constraints, and foreign keys, these objects also affect the cost estimates that the optimizer uses to arrive at its decisions. Finally, your data itself factors into the cost through statistics about the data (covered in more detail in Chapter 5).
As was mentioned in Chapter 1, the optimizer doesn’t attempt to make a perfect execution plan. Instead, the optimizer uses mathematical models, based in part on the objects and statistics in your database, to arrive at a plan that is good enough. So the cost-based analysis tries to arrive at as optimal a plan as it can while using as few resources as it can and finishing up as quickly as it can.
Before optimization starts though, there are some initial, preparatory phases that must be accomplished.
Optimization Preparation
1. Parsing
2. Binding
3. Optimization
Let’s take a look at the two steps that take place prior to the optimization process itself.
Parsing
A query first goes through the relational engine within SQL Server. This is the first of two engines that deal with the query; the second is the storage engine, which is responsible for data access, modification, and caching. The relational engine handles parsing the query and then, through a component called the algebrizer, binding it. The relational engine also executes the query per the execution plan that was generated, requesting the data from the storage engine.
A batch with a syntax error
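The original listing isn't reproduced here, but a minimal sketch of such a batch (my own reconstruction against an AdventureWorks table) might look like this:

```sql
-- A deliberate syntax error: FROM is misspelled as FORM,
-- so parsing fails and no parse tree is produced
SELECT p.Name
FORM Production.Product AS p;
```

Running this batch returns a syntax error from the parser, and processing stops before binding ever occurs.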
Once parsing completes successfully, an internal-only structure called a parse tree is created and passed on to the next step.
Binding
The parse tree is now used to identify all the objects that make up the query. This list of objects includes tables, columns, indexes, and more. This process is called binding. All the data types being processed are identified, as is any kind of aggregation through GROUP BY.
The original code before binding
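The original listing isn't shown here, but a query along these lines against AdventureWorks would produce the warning described next (this is my reconstruction, so treat the exact column list and filter value as assumptions):

```sql
-- SalesOrderNumber is a computed column that converts the integer
-- SalesOrderID into a string; referencing it triggers a plan warning
SELECT soh.SalesOrderNumber,
       soh.OrderDate
FROM Sales.SalesOrderHeader AS soh
WHERE soh.SalesOrderID = 43662;
```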
The exclamation point visible on the very first operator on the left, labeled “SELECT”, is an indication of a warning. Looking at the properties for that operator, you’ll find that the SalesOrderID is actually going through a data type conversion:
Type conversion in expression (CONVERT(nvarchar(23),[soh].[SalesOrderID],0)) may affect "CardinalityEstimate" in query plan choice
This example illustrates several points beyond syntax-based optimization. First, warnings can sometimes be unclear. In this case, the warning doesn't actually come from the SalesOrderID column referenced in the WHERE clause of the query. Instead, it comes from a calculated column, SalesOrderNumber. That calculation converts the SalesOrderID, an integer, into a string and appends other information to it. Because of the way this calculation is done, the optimizer recognizes that if the column were used as filtering criteria in a WHERE clause, as JOIN criteria, or even in a HAVING clause, it would hurt the ability to use statistics and indexes to get good performance. In this case, though, the query doesn't reference the column in any of the filtering criteria, so we can safely ignore the warning. I like this example because it shows just how realistic AdventureWorks really is; this kind of issue creeps into a lot of real-world databases.
Unlike the Data Manipulation Language (DML) queries we've seen so far, Data Definition Language (DDL) queries, such as CREATE TABLE, do not get execution plans. There is, for example, only one way to run a CREATE VIEW statement, so there are no choices for the optimizer to make. Therefore, most DDL doesn't go through the optimization process. The exception is the CREATE INDEX statement: the optimizer can take advantage of existing indexes to create another index more efficiently. So while you'll never see a CREATE PROCEDURE execution plan, you may see one when creating an index.
Once all the binding operations are complete, another internal structure is created, called the query processor tree. This is where optimization starts.
Optimization
The optimization process is a pretty amazing piece of software. Without a debugger, you can’t really see it in action. However, we can see the results of the process in the execution plans that it creates. We can also use Dynamic Management Views (DMVs) and Extended Events to observe aspects of the optimization process. In this book, we’ll see a lot of the evidence of optimization, but it’s beyond the scope of the book to examine the code with a debugger. We’ll stick to the tools immediately available to us. Let’s start talking about the optimization process.
Let’s discuss each step in order to understand how the optimizer arrives at an execution plan.
Simplification
The first step in optimization is a process called Simplification. Here the optimizer ensures that all the objects that you reference in your query are actually in use. Statistics and row counts begin to be gathered and used. This step looks at the constraints such as foreign keys. When all this is taken together, you can see the optimizer actually change what is being processed. For example, you may write a query with a four-table join. However, only two tables are referenced in the SELECT list and the WHERE clause. Using the constraints, the optimizer can tell that it doesn’t need the other two tables to satisfy the query and will therefore eliminate them from further processing, simplifying what is being queried—hence the name of this step.
Trivial Plan Match
When a query is extremely simple, it can match a pattern for which there are no options, only one possible execution plan. That situation is referred to as a Trivial Plan. The simplest example is a query against a table with no clustered index, referred to as a heap, and no other indexes; the only way that table can be accessed is through a table scan. Rather than going through the full cost-based optimization process, the optimizer looks for a query pattern that matches an existing Trivial Plan. When a pattern match is found, optimization, an expensive operation, is skipped entirely, and the Trivial Plan is used to satisfy the query. As stated before, you'll only see Trivial Plans for extremely simple queries. You can see that a plan is marked Trivial in the properties of the execution plan (discussed in Chapter 4).
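As a sketch of how you might spot these plans, you can search the plan cache for plans the optimizer marked as trivial (the Showplan XML records the optimization level in the StatementOptmLevel attribute):

```sql
-- Find cached plans whose optimization level was TRIVIAL
SELECT st.text,
       qp.query_plan
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
WHERE CAST(qp.query_plan AS nvarchar(MAX)) LIKE '%StatementOptmLevel="TRIVIAL"%';
```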
Optimization Phases
Search 0 or Transaction: It’s called Transaction since the types of queries that can be satisfied here are generally simple, Online Transaction Processing (OLTP) style queries with few joins and no transformations needed.
Search 1 or Quick Plan: This is where more complex operations occur such as reordering joins and other transformations to arrive at a good enough plan.
Search 2 or Full Optimization: When queries are complex, they end up going through all three optimization steps to arrive at full optimization.
The optimizer can short-circuit these searches, heading straight to Quick Plan or Full Optimization, skipping Transaction, depending on the query, its complexity, and the supporting objects such as indexes and constraints. These phases can each decide on how to perform a particular JOIN operation or access the data through scans or seeks.
The principal driver for most of the decisions within the optimizer is row counts. Row counts generally come from the statistics on the columns in the filtering criteria, such as ON, HAVING, and WHERE predicates. With the statistics in hand, the optimizer makes choices and calculations based on how much estimated CPU, memory, and I/O are necessary to satisfy the query. These estimates are added up into a total cost for the execution of the query. That cost is not a measure based on your system; rather, it's an arbitrary calculation made by the optimizer, purely a mathematical construct.
The optimizer calculates the number of iterations it will run through, attempting different plans with different strategies. At the same time, it calculates what is likely to be the least-cost plan. When it finds a plan that meets all the calculated requirements, it stops optimization, even if a better plan could be found. This is known as Early Termination of optimization. You'll also see optimization terminated when the optimizer has gone through the calculated number of iterations; at that point, it ends optimization through what is referred to as a Timeout (more on that in Chapter 4).
If a query reaches Search 2, or Full Optimization, it may also be evaluated for conversion from a serial plan to a parallel plan (more on that later in the chapter).
A slightly complex query
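The original listing isn't reproduced here; a plausible stand-in (the particular joins and filter value are my assumptions) is a multi-table AdventureWorks query like this:

```sql
SELECT soh.SalesOrderNumber,
       sod.OrderQty,
       p.Name
FROM Sales.SalesOrderHeader AS soh
JOIN Sales.SalesOrderDetail AS sod
    ON soh.SalesOrderID = sod.SalesOrderID
JOIN Production.Product AS p
    ON sod.ProductID = p.ProductID
WHERE soh.CustomerID = 29825;
```

Capturing the execution plan for a query like this and examining the properties of the first operator exposes several values of interest.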
CompileCPU: The amount of CPU time taken to compile this plan
Estimated Number of Rows Per Execution: The row count that the optimizer arrived at for this query
Estimated Subtree Cost: The combined estimates on CPU, I/O, and memory arrived at through the optimization process
One point worth noting here: when we see the Optimization Level property with a value of "Full," it doesn't necessarily mean that the plan went through Search 2. It only means that it's not a Trivial Plan.
Querying the dm_exec_query_optimizer_info DMV
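The query itself is straightforward; the DMV exposes one row per optimizer event:

```sql
-- Each row is one optimizer event: how many times it has occurred
-- and the average value associated with it
SELECT oi.counter,
       oi.occurrence,
       oi.value
FROM sys.dm_exec_query_optimizer_info AS oi;
```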
You can see that the system I’ve run this on has done very few complex optimizations. We can tell that because there have been no Search 2 operations. You can see the other operations that have been done and the estimated time needed to accomplish them.
This information is only of limited use, but it’s good to know where to go to get it in the event that you need to understand some of the processes that the optimizer is going through and the load it’s placing on your systems.
You could also run this code before and after a query in order to see the types of optimizations that occurred on that query. However, you need to do this on an isolated system or you’re likely to see more than just the optimizations for the query you’re investigating.
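A sketch of that before-and-after technique might look like the following (the temporary table name and the placeholder for the query under investigation are mine):

```sql
-- Snapshot the optimizer counters before the query
SELECT oi.counter, oi.occurrence, oi.value
INTO #before
FROM sys.dm_exec_query_optimizer_info AS oi;

-- ... run the query under investigation here ...

-- Show only the counters that changed
SELECT a.counter,
       a.occurrence - b.occurrence AS occurrence_delta
FROM sys.dm_exec_query_optimizer_info AS a
JOIN #before AS b
    ON a.counter = b.counter
WHERE a.occurrence <> b.occurrence;

DROP TABLE #before;
```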
Generating Parallel Execution Plans
When plans become complex enough, they may benefit from parallel execution, in which, when more than one CPU is available to SQL Server, more than one is put to work on a single query. However, parallel execution is an expensive operation, so you really want a query to benefit from it before it goes parallel. In fact, some queries that can be executed in parallel actually run slower because of the overhead of the process.
Whether a plan goes parallel depends on a number of factors:
- CPUs available to SQL Server, if more than one
- SQL Server edition
- Available memory
- Cost Threshold for Parallelism
- Query being executed
- Rows to be processed
- Active concurrent connections
- Max Degree of Parallelism
The three most important of these factors are the actual number of CPUs available to SQL Server, the Max Degree of Parallelism, and the Cost Threshold for Parallelism.
Changing the Max Degree of Parallelism
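The configuration change is made through sp_configure; the value 4 below is purely illustrative, not a recommendation:

```sql
-- Max Degree of Parallelism is an advanced option,
-- so advanced options must be visible first
EXEC sys.sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sys.sp_configure 'max degree of parallelism', 4;
RECONFIGURE;
```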
This will take effect immediately. Determining the precise setting for this value is frequently a subject of great debate. Your systems and your workloads should drive these decisions, not some arbitrary calculation based on the number of processors on your systems. Because of this, I'm not going to make a precise determination here, other than to emphasize again that parallel execution of some queries is very desirable, so you want to leave that possibility open on your systems; in other words, don't set the Max Degree of Parallelism to 1.
Controlling the degree of parallelism through a query hint
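A MAXDOP query hint overrides the server-wide setting for a single query; the query itself here is just an illustrative AdventureWorks example:

```sql
SELECT sod.ProductID,
       SUM(sod.LineTotal) AS TotalSales
FROM Sales.SalesOrderDetail AS sod
GROUP BY sod.ProductID
OPTION (MAXDOP 2); -- limit this query to two processors
```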
Changing the Cost Threshold for Parallelism
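Like Max Degree of Parallelism, this is an sp_configure setting; the value 50 is a commonly suggested starting point, not a recommendation for your system:

```sql
-- Cost Threshold for Parallelism is also an advanced option
EXEC sys.sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sys.sp_configure 'cost threshold for parallelism', 50;
RECONFIGURE;
```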
One way to determine a good value for the cost threshold would be to look at the estimated costs of the queries on your systems. Using any of the query metrics available, and the cost estimates from execution plans, you can see how your systems are behaving and make adjustments from there.
As was stated earlier, parallel execution of queries is expensive. Parallel execution doesn’t just take up more CPU but also considerably more memory. Because of this, SQL Server will take into account the amount of available memory, prior to executing a parallel plan. SQL Server will automatically reduce the amount of parallelism, or abandon it entirely, for a given query on a system under stress.
The data-modification portion of DML queries is always executed serially. However, the SELECT portion of an INSERT statement and the WHERE clause processing of UPDATE or DELETE statements can be executed in parallel. The data changes themselves are still applied in a serial fashion, never in parallel.
At execution time, SQL Server can change the behavior of parallel execution. It sets the number of processing threads for a given query, and that number of threads is used throughout that execution. The value can be reexamined the next time the query executes and may change.
Execution Plan Caching
When the optimizer completes its processing, the result is an execution plan. The plan is saved in the memory of SQL Server in a place referred to as the plan cache. Saving the plan into cache is another optimization that SQL Server is performing in an attempt to make your queries faster. By saving the plan, it can be reused instead of going through the optimization process again. However, plans can be removed from cache through various processes. One of these processes is referred to as “aging.”
Aging of the Execution Plan
The plan cache is part of SQL Server’s buffer cache, which also holds data pages. As new execution plans are added to the plan cache, the size of the plan cache keeps growing, affecting the retention of useful data pages in memory. To avoid this, SQL Server dynamically controls the retention of the execution plans in the plan cache, retaining the frequently used execution plans and discarding plans that are not used for a certain period of time.
SQL Server keeps track of the frequency of an execution plan’s reuse by associating an age field to it. When an execution plan is generated, the age field is populated with the cost of generating the plan. A complex query requiring extensive optimization will have an age field value higher than that for a simpler query.
At regular intervals, the current cost of all the execution plans in the plan cache is examined by SQL Server's lazy writer process (a background process that, among other duties, manages the buffer cache). If an execution plan is not reused for a long time, its current cost will eventually be reduced to 0. The cheaper the execution plan was to generate, the sooner its cost will be reduced to 0. Once an execution plan's cost reaches 0, the plan becomes a candidate for removal from memory. SQL Server removes all plans with a cost of 0 from the plan cache when memory pressure increases to such an extent that there is no longer enough free memory to serve new requests. However, if a system has enough memory and free memory pages are available to serve new requests, execution plans with a cost of 0 can remain in the plan cache for a long time so that they can be reused later, if required.
As well as decaying downward, execution plan costs are also reset upward, back to the full cost of generating the plan, every time the plan is reused (or to the current cost of the plan, for ad hoc plans). For example, suppose you have two execution plans with generation costs of 100 and 10; their starting cost values will be 100 and 10, respectively. If both execution plans are reused immediately, their age fields will be set back to those maximum costs. With these cost values, the lazy writer will bring the cost of the second plan down to 0 much earlier than that of the first one, unless the second plan is reused far more often. Therefore, even if a costly plan is reused less frequently than a cheaper plan, the effect of the initial cost means the costly plan can remain at a nonzero cost value for a longer period of time.
Summary
While you don’t need to understand every aspect of the query optimizer in order to make your queries run faster, having a degree of understanding will help. You’ve seen the basic process that the optimizer goes through from parsing and binding through to matching Trivial Plans and finally into the full optimization process. We’ve also discussed some of how parallel plans work and the ways in which execution plans end up in the plan cache.
The next chapter is going to discuss the best ways to capture query performance metrics.