Query Store specific wait statistics
Extended Events for Query Store
This chapter will look at some common problems that can arise with Query Store and how you can use the tools provided in order to ensure that Query Store is working well on your systems.
Query Store Waits
By design, as we described in the first couple of chapters, Query Store is designed to be as unobtrusive as possible. In physics, the observer effect says that observation of a process or object can affect the behavior of that process or object. Similarly in SQL Server, however lightweight the data collection, the act of collecting information is adding some additional load to your SQL Server instance. The number of possible factors that can lead to issues with the Query Store is huge. It is the same list as any other process within SQL Server: the amount of memory and CPU you have available; the number, size, and volume of the transactions in your system; the number and speed of your disks and disk controllers, and, frankly, the code running in your T-SQL statements. Any or all of these and more can change the behavior of Query Store.
The first way we can understand Query Store behavior is to use the wait statistics. Generally speaking, wait statistics are always the best way to gain a general understanding of the behavior of any given system. If you know what the system is waiting on, you know where your bottlenecks are. There are two ways to look at waits in the Query Store, through the traditional dynamic management view, sy.dm_os_wait_stats, or, on Azure SQL Database, sys.dm_db_wait_stats. You can also use Extended Events wait_completed to see waits, but that’s a very granular approach and won’t generally be needed. We’ll focus only on the DMV.
When talking about querying the wait statistics within SQL Server, I recommend you start with Paul Randal’s scripts. The import of these scripts is that they eliminate the noise, the wait statistics that never mean anything. You can access those scripts here: https://bit.ly/2wsQHQE .
Wait Statistics coming from Query Store
Of course, your system will have different values in each of the columns, but you should see a similar list of waits. You would not normally look at the Query Store wait statistics in isolation. Instead, you would query the wait statistics and look for the top ones on your system. When those top waits are Query Store, you’re likely experiencing some sort of issue.
You would not simply query for the Query Store waits. Instead, as part of your normal monitoring of wait statistics on your system, you would look for the waits above because they would indicate Query Store itself, its operations, is leading to issues on your system. If you see any of the “qds_∗” waits in your top 10, you would want to understand what that wait statistics is and then drill down on why it’s causing problems for you in your system. The way we drill down when monitoring Query Store is to use Extended Events.
Extended Events and Query Store
Extended Events are the best method to do a detailed analysis of the behavior of Query Store in your systems. Extended Events are a fairly dense topic that we’re not going to try to explain in detail in this book. To get started with Extended Events, I suggest first the Microsoft documentation here: https://bit.ly/2LfWMoj . Once you’re comfortable with how Extended Events works, you can more easily explore the capabilities of monitoring Query Store using Extended Events.
List Query Data Store Extended Events
The only issue here is that not all these events are active. Many of them, maybe even most of them, are only accessible through special flags set by Microsoft. So, while you can see them, even add them to a session, no activity will ever be recorded by them.
The information in this event represents the data that the Query Store is capturing after a query completes its cleanup process. So, with this event, you could observe and monitor that part of the behavior of the Query Store. We’re going to look at a couple of different ways you can monitor Query Store behavior and some very particular events that aid in that regard.
Tracking Query Store Behaviors
Extended Event Session producing information on the standard behavior of Query Store
Clear Query Store
The first four columns are for managing the Extended Events causality tracking, showing the distinct grouping of events and the order in which they occur. We can ignore that in this instance. What we see are the clear_all setting, which is defined in the QUERY_STORE CLEAR command, which can be set to clear ALL.
Call stored procedure
I’m showing the details of the query_store_capture_policy_evaluate event. You can see that the capture_policy_result is showing the result of the evaluation, in this case, CAPTURE. You’ll also frequently see UNDECIDED for queries that have been evaluated but are not going to be captured.
You can see the current settings for Query Store, such as the plans per query limit, currently set at 200. You also get to see the plans and plan use being managed by Query Store. These numbers will obviously change over time, showing the behavior of Query Store over time.
Remove a plan from cache
You can see the database, the fact that plan was successfully removed, and the plan being removed.
All these events give you a good idea of what’s happening within Query Store, allowing you to see how Query Store behaves.
Conclusion
While you really should just see normal behavior from Query Store the vast majority of the time, it is possible for things to go wrong. Knowing how to monitor Query Store itself ensures that you have a higher degree of confidence that Query Store is supporting and helping you, not hurting you. Wait statistics are going to be the principal mechanism of assuring good Query Store behavior. You can drill down on the details of that behavior using Extended Events to see exactly how the system is working. These processes should enable you to protect your systems and better understand the behavior of Query Store.