The information captured and stored within Query Store begins to open up new functionality for Microsoft and you. First, for Microsoft, having the ability to identify a query that has suffered from a regression (as we describe in Chapter 7) means that they can monitor the system and, using the information in the Query Store, automate forcing a plan to fix the regression. This is the Auto Plan Correction at work. Second, for us, the addition of wait statistics within the Query Store information opens up additional troubleshooting possibilities.
The ability to automatically mark queries with regressions and the new catalog views that support this
The capacity for SQL Server to automatically force, or unforce, a plan based on its performance regressing
The 23 categories of wait statistics that have been added to the Query Store information
Automatic Plan Correction
The concepts of Automatic Plan Correction are completely built on Query Store. Without the information collected by Query Store, the ability of SQL Server and Azure SQL Database to determine that a plan has regressed would not be possible. Remember that the core of plan regression is built around the idea that the query itself has not changed. Changes in code cause changes in behavior all the time. It’s when you don’t change the code or the structure of the database and performance degrades anyway that you have a regression. Query Store makes identifying regressions easier.
With the regression identified, the basic approach of Automatic Plan Correction is simple. The behavior for this query was better under the older plan. SQL Server can, when Automatic Plan Correction is enabled, automatically force that older plan, the last known good plan. Then, the behavior of the system is observed again for a time. If forcing the last good plan didn’t work, then it can be automatically unforced. All this occurs behind the scenes with no real input from the user, the developer, or the DBA.
This behavior makes the job of tuning SQL Server easier because instead of attempting to fix simple issues that can be readily automated, you have time to address more difficult subjects that require more knowledge and understanding. Simply picking the last well-behaved plan and applying it is simple tuning, but it can solve a large number of issues easily.
Identifying Regression
We can start by understanding how SQL Server identifies a regressed query and how it communicates why it thinks that query has regressed. Luckily, all this functionality is summarized in a new dynamic management view: sys.dm_db_tuning_recommendations.
That script does a number of things. I’ve placed markers in the comments in the code so that we can refer back to each section of the code to understand what is happening. First, at number 1, we have to establish behavior for the query. It takes a number of executions to establish that a query is behaving a certain way, capturing the data in Query Store. Next, at number 2, we remove the plan from the plan cache by getting the plan_handle and using FREEPROCCACHE to remove just the one plan from cache. This means that the next execution will have to compile a new plan. Without this step, even if we executed the query with different values, the plan would remain the same until it naturally aged out of cache. By forcibly removing it from cache, we set up the next step. In step 3, we execute the query, but we use a different value which has a very different data distribution in the statistics. This results in a change to the execution plan. Finally, in step 4, we again establish a pattern of behavior with the new execution plan.
Average query CPU time changed from 0.16ms to 4909.41ms
As we explained above, changing the execution plan to scan the index instead of seeking on it degraded performance. We executed the procedure enough times that averages were established and it went from 0.16ms to 4909.41ms on average, a huge leap.
The next column lets us know when the recommendation was last updated. Because things change within the system, you can, and will, see changes made to recommendations over time.
We also have the state column. This is JSON data showing us the current status of the recommendation as follows:
{“currentValue”:“Active”, “reason”:“AutomaticTuningOptionNotEnabled”}
The recommendation is active, but it is not implemented. The reason for this is clear, we have yet to enable Automatic Tuning. Finally, all the details of the information are shown in another JSON column:
{“planForceDetails”:{“queryId”:2, “regressedPlanId”:2, “regressedPlanExecutionCount”:15, “regressedPlanErrorCount”:0, “regressedPlanCpuTimeAverage”:4.909411600000000e+006,”regressedPlanCpuTimeStddev":1.181213221539555e+007, “recommendedPlanId”: 1,“recommendedPlanExecutionCount”:30, “recommendedPlanErrorCount”: 0,“recommendedPlanCpuTimeAverage”:1.622333333333333e+002, “recommendedPlanCpuTimeStddev”:2.380063281138177e+002}, “implementationDetails”:{“method”:“TSql”, “script”:“exec sp_query_store_force_plan @query_id = 2, @plan_id = 1”}}
Details of the tuning recommendation from the JSON data
planForceDetails | |
---|---|
queryID | 2: query_id value from the Query Store |
regressedPlanID | 2: The plan_id value from the Query Store of the problem plan |
regressedPlanExecutionCount | 5: Number of times the regressed plan was used |
regressedPlanErrorCount | 0: When there is a value, errors during execution |
regressedPlanCpuTimeAverage | 4.909411600000000e+006: Average CPU of the plan |
regressedPlanCpuTimeStddev | 1.181213221539555e+006: Standard deviation of that value |
recommendedPlanID | 1: The plan_id that the tuning recommendation is suggesting |
recommendedPlanExecutionCount | 30: Number of times the recommended plan was used |
recommendedPlanErrorCount | 0: When there is a value, errors during execution |
recommendedPlanCpuTimeAverage | 1.622333333333333e+002: Average CPU of the plan |
recommendedPlanCpuTimeStddev | 2.380063281138177e+002: Standard deviation of that value |
implementationDetails | |
Method | TSql: Value will always be T-SQL until new types of recommendations are created |
script | exec sp_query_store_force_plan @query_id = 2, @plan_id = 1 |
These recommendations are just that, recommendations. While SQL Server is very good at making these based on the behavior captured through Query Store, until you enable Automatic Tuning, you can use these recommendations on your systems manually if you want. One thing you need to know about sys.dm_db_tuning_recommendations is that the information is not persisted. In a failover, reboot, or other type of outage, this data will be reset. Any plans forced through Automatic Tuning will remain forced. However, you’ll lose a history for why that plan was forced. You may want to capture this information regularly into other locations just in case.
Enabling Automatic Tuning
There are different ways to enable Automatic Tuning depending on if you’re working within SQL Server 2019 or Azure SQL Database. If you’re working with SQL Server 2017 or greater, you can use T-SQL to enable it. If you’re working with Azure SQL Database, you can use T-SQL or the Azure portal. The T-SQL for both is the same, as is all the other queries you would run to look at the information. We’ll start with Azure so you can see what that looks like.
Automatic Tuning in Azure SQL Database
Before we go on, please note that Azure is updated extremely frequently. The GUI that you see after you get this book could be different than that shown here. The processes should remain basically the same.
To complete this process, you will click the Apply button at the top of the page. It will prompt you to be sure that’s the action you want. Once you click OK, Automatic Tuning is enabled on your database. No other actions are required.
Enable Automatic Tuning with T-SQL
You can of course substitute the appropriate database name for the default value of current that I use here. This command can only be run on one database at a time. If you wish to enable automatic tuning for all databases on your instance, you either have to enable it in the model database before those other databases are created, or you need to set it to on for each database on the server.
It’s that easy. No other actions are necessary and this doesn’t require a reboot or changes to the server itself.
Automatic Tuning at Work
The CurrentState value has been changed to Verifying. It will measure performance over a number of executions, much as it did before. If the performance degrades, it will unforce the plan. Further, if there are errors such as time outs or aborted executions, the plan will also be unforced. You’ll also see the error_prone column in sys.dm_db_tuning_recommendations changed to a value of “Yes” in this event.
If you restart the server, the information in sys.dm_db_tuning_recommendations will be removed. Also, any plans that have been forced will also be removed. As soon as a query regresses again, any plan forcing will be automatically re-enabled. If this is an issue, you can always force the plan manually.
If a query is forced and then performance degrades, it will be unforced, as already noted. If that query again suffers from degraded performance, plan forcing will be removed and the query will be marked such that, at least until a server reboot when the information is removed, it will not be forced again.
Query Store Wait Statistics
The information we’ve talked about throughout the book that Query Store captures for query performance behavior changes the way lots of people do monitoring and query tuning. The addition of wait statistics for a given query adds to those changes. Now, you can get the wait statistics for a query easily. This information is aggregated using the time interval that you’re aggregating your queries with, 60 minutes by default. Further, because there are so many waits, rather than list them all individually, the wait statistics in Query Store are grouped into categories of waits. If you need individual, detailed, wait statistics on a query, you’ll need to use other mechanisms to capture the data.
Wait Statistics Categories
There’s not much to say about the categories. You need to know how the categories are broken down in order to understand what waits they represent. Other than that, there’s no additional functionality associated with them. This is purely informational so that you can correctly interpret the information when you look at the wait statistics in Query Store.
Query Store wait statistics categories and waits
Integer value | Wait category | Wait types include in the category |
---|---|---|
0 | Unknown | Unknown |
1 | CPU | SOS_SCHEDULER_YIELD |
2 | Worker thread | THREADPOOL |
3 | Lock | LCK_M_% |
4 | Latch | LATCH_% |
5 | Buffer latch | PAGELATCH_% |
6 | Buffer I/O | PAGEIOLATCH_% |
7 | Compilation* | RESOURCE_SEMAPHORE_QUERY_COMPILE |
8 | SQL CLR | CLR%, SQLCLR% |
9 | Mirroring | DBMIRROR% |
10 | Transaction | XACT%, DTC%, TRAN_MARKLATCH_%, MSQL_XACT_%, TRANSACTION_MUTEX |
11 | Idle | SLEEP_%, LAZYWRITER_SLEEP, SQLTRACE_BUFFER_FLUSH, SQLTRACE_INCREMENTAL_FLUSH_SLEEP, SQLTRACE_WAIT_ENTRIES, FT_IFTS_SCHEDULER_IDLE_WAIT, XE_DISPATCHER_WAIT, REQUEST_FOR_DEADLOCK_SEARCH, LOGMGR_QUEUE, ONDEMAND_TASK_QUEUE, CHECKPOINT_QUEUE, XE_TIMER_EVENT |
12 | Preemptive | PREEMPTIVE_% |
13 | Service broker | BROKER_% (but not BROKER_RECEIVE_WAITFOR) |
14 | Tran log I/O | LOGMGR, LOGBUFFER, LOGMGR_RESERVE_APPEND, LOGMGR_FLUSH, LOGMGR_PMM_LOG, CHKPT, WRITELOG |
15 | Network I/O | ASYNC_NETWORK_IO, NET_WAITFOR_PACKET, PROXY_NETWORK_IO, EXTERNAL_SCRIPT_NETWORK_IOF |
16 | Parallelism | CXPACKET, EXCHANGE |
17 | Memory | RESOURCE_SEMAPHORE, CMEMTHREAD, CMEMPARTITIONED, EE_PMOLOCK, MEMORY_ALLOCATION_EXT, RESERVED_MEMORY_ALLOCATION_EXT, MEMORY_GRANT_UPDATE |
18 | User wait | WAITFOR, WAIT_FOR_RESULTS, BROKER_RECEIVE_WAITFOR |
19 | Tracing | TRACEWRITE, SQLTRACE_LOCK, SQLTRACE_FILE_BUFFER, SQLTRACE_FILE_WRITE_IO_COMPLETION, SQLTRACE_FILE_READ_IO_COMPLETION, SQLTRACE_PENDING_BUFFER_WRITERS, SQLTRACE_SHUTDOWN, QUERY_TRACEOUT, TRACE_EVTNOTIFF |
20 | Full text search | FT_RESTART_CRAWL, FULLTEXT GATHERER, MSSEARCH, FT_METADATA_MUTEX, FT_IFTSHC_MUTEX, FT_IFTSISM_MUTEX, FT_IFTS_RWLOCK, FT_COMPROWSET_RWLOCK, FT_MASTER_MERGE, FT_PROPERTYLIST_CACHE, FT_MASTER_MERGE_COORDINATOR, PWAIT_RESOURCE_SEMAPHORE_FT_PARALLEL_QUERY_SYNC |
21 | Other disk I/O | ASYNC_IO_COMPLETION, IO_COMPLETION, BACKUPIO, WRITE_COMPLETION, IO_QUEUE_LIMIT, IO_RETRY |
22 | Replication | SE_REPL_%, REPL_%, HADR_% (but not HADR_THROTTLE_LOG_RATE_GOVERNOR), PWAIT_HADR_%, REPLICA_WRITES, FCB_REPLICA_WRITE, FCB_REPLICA_READ, PWAIT_HADRSIM |
23 | Log rate governor | LOG_RATE_GOVERNOR, POOL_LOG_RATE_GOVERNOR, HADR_THROTTLE_LOG_RATE_GOVERNOR, INSTANCE_LOG_RATE_GOVERNOR |
Looking at Query Store Wait Statistics
There are two ways you can look at the wait statistics for a query within the Query Store. You can use T-SQL to query the information, or, there is a report within SQL Server Management Studio 18. We’ll start by querying the wait statistics.
Querying Wait Statistics
The waits experienced by dbo.ProductByCost
Network IO | ASYNC_NETWORK_IO, NET_WAITFOR_PACKET, PROXY_NETWORK_IO, EXTERNAL_SCRIPT_NETWORK_IOF |
CPU | SOS_SCHEDULER_YIELD |
Clearly, this makes for a quick and easy way to understand the bottleneck experience by a query. However, that will be a general set of knowledge, not detailed. Still, it makes a huge difference in our ability to easily identify problems that need our attention.
Wait Statistics Report
What is now shown in the report is a series of queries in the upper left that have experienced the wait that was selected. Then, the report functions much as other reports do. Selecting a query shows its various query plans over time on the right. Selecting any of those plans will cause the full plan to be shown in the pane at the bottom of the screen. In our instance you can see that the query with the most waits was dbo.ProductByCost. Specifically, it was the bad plan from our original example at the start of the chapter.
All this provides a way to understand not simply query performance, but the waits affecting the query as well.
Conclusion
Query Store enables a bunch of interesting scenarios and Automatic Tuning to eliminate plan regression is one of the more exciting. As with all else, you should monitor your systems to ensure that this behavior is benefiting you. However, most systems will likely benefit, thus freeing you up to do other work. That work may entail using the wait statistics that are now stored with queries to better identify the ones that need to be tuned. All this Query Store functionality is changing the way we do database monitoring and database tuning.