© Tracy Boggiano and Grant Fritchey 2019
T. Boggiano, G. FritcheyQuery Store for SQL Server 2019https://doi.org/10.1007/978-1-4842-5004-4_2

2. Overview and Architecture of the Query Store

Tracy Boggiano1  and Grant Fritchey2
(1)
Cary, NC, USA
(2)
Grafton, MA, USA
 

In order for the Query Store to collect and maintain a baseline on query performance, it needs an architecture that has minimal impact on the overall performance of SQL Server. This chapter will review how Query Store operates to collect data. We’ll also cover the various data sets that make up the Query Store. This information should make it possible for you to both understand what the Query Store is doing and have faith that it’s performing these actions in the safest manner possible for your databases.

How Query Store Works

While we are going to cover the data collected in some detail later in the chapter, it makes it easier to understand how Query Store works if we first give a quick overview of the types of data being collected. There are three fundamental data sets that define the Query Store data:
  • Query and Plan Information: Data about the query itself and about the execution plans derived from that query by the query optimizer.

  • Query Runtime Information: This is the information about how fast the query ran and how many times it was called, along with other performance-related information from query execution.

  • Query Wait Statistics: This information covers the various waits that occurred on a single query during the execution on the server.

With those core data sets defined, we can now move on to determine exactly where that information comes from and how it gets collected. We will cover all three data sets in much more detail later in the chapter. We will use these data sets as our architecture for describing the ways that Query Store gathers the data because the data sets help define the approach taken by the Query Store. This is why it was important to establish what those data sets are before we talk about where they came from.

One additional detail you need to know to understand how the Query Store works is that it is a database-level setting. It’s not controlled on the server, but rather database-by-database in your system. This is an important detail because the information the Query Store captures is written to each individual database, not a central location.
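Because it is a database-level setting, you enable the Query Store on each database individually. A minimal sketch (the database name here is a placeholder; the full set of configuration options is covered in Chapter 3):

```sql
-- Enable the Query Store on a single database
ALTER DATABASE YourDatabase SET QUERY_STORE = ON;

-- Confirm the setting from within that database;
-- actual_state_desc should read READ_WRITE
SELECT actual_state_desc, desired_state_desc
FROM sys.database_query_store_options;
```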

Collecting Query and Plan Information

When a query gets submitted to SQL Server, it runs through a series of processes. These processes ensure that the query is properly written, that the objects in the database referenced by the query actually exist, and, most importantly, that the query is optimized to run quickly. Figure 2-1 shows an overview of this process:
../images/473933_1_En_2_Chapter/473933_1_En_2_Fig1_HTML.jpg
Figure 2-1

The basic process of generating an execution plan

We start with the T-SQL query of course. Only Data Manipulation Language (DML) queries are captured by the Query Store. These are queries that are used to SELECT, UPDATE, DELETE, or INSERT data. Queries that are used to manipulate the data structures, Data Definition Language (DDL), are not captured to the Query Store. When the query first comes to SQL Server, it goes through an internal process called the Algebrizer. This does a number of things, but the two we care about the most are that it validates that the syntax is correct in the parser and it ensures that the objects in the query are actually in the database, through object binding. The Algebrizer also gathers a bunch of information that, along with the query, it passes to the query optimizer.

The query optimizer is the process that determines an efficient way to retrieve or modify your data based on the query you’ve supplied it and the tables, indexes, statistics, and constraints in your database. The result of the query optimization process itself is an execution plan which gets written into a memory space called the plan cache.

With one exception, which we’ll cover in a minute, the Query Store does not interfere with this process in any way. What the Query Store does is capture the output of the processes that create an execution plan. Figure 2-2 shows how the Query Store works side by side with the query optimization process:
../images/473933_1_En_2_Chapter/473933_1_En_2_Fig2_HTML.jpg
Figure 2-2

The process of capturing the execution plan and query for the Query Store

After the query optimizer does its work and generates an execution plan, an asynchronous process captures the query and the execution plan for the Query Store. It outputs them to a temporary storage space in memory. On a regular basis, the query information stored in memory gets written to the database through another asynchronous process. Microsoft describes this as occurring with minimal latency. The goal is to ensure that the Query Store data collection process does not interfere with the normal processing of the query optimizer.

The one exception to this that I mentioned earlier is that the asynchronous process that captures queries and query plans also checks to see if there is plan forcing in place. We’ll cover that in a lot of detail in Chapter 7. Plan forcing is the one place where the Query Store does interfere with the normal optimization process. If plan forcing is in place for a query, then the plan being forced will be used instead of the plan generated by the Query Optimizer. That will also be the plan that gets written to the plan cache.

With that, the core processing of retrieving queries and query plans is complete. The information is captured by individual query, not by stored procedure or batch. If the query is part of an object, that object’s ID is captured along with the query information. We’ll cover the details of what exactly gets captured in the section titled “The Data Collected by Query Store” later in this chapter.

Collecting Query Runtime Information and Wait Statistics

The collection of the runtime data is a little more straightforward than the query and query plan data collection. All the information collected about the runtime statistics and wait statistics occurs after the query and query plan are already captured. The query executes and then the Query Store process takes place as Figure 2-3 shows:
../images/473933_1_En_2_Chapter/473933_1_En_2_Fig3_HTML.jpg
Figure 2-3

The process of capturing runtime metrics for the Query Store

There is only one fundamental difference between the processing of this data collection and that of the query and query plan data: the timed asynchronous writes to disk. Instead of a process with minimal latency, the data is stored and aggregated in-memory for a period of time. The default is 15 minutes. You can control this, and we’ll cover it in detail in Chapter 3.

The information, such as how long a query took to run and the waits experienced during the execution, is written to memory. That data is aggregated there in memory. When that information is written to disk, it gets reaggregated based on another setting, the collection interval. By default this is 60 minutes. What this aggregation gives you is the ability to compare two different time periods so that you can see if the performance of a query is changing over time. We’ll cover this in more detail in the “Data Collected” section.
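Both of these timings are configurable per database. A hedged sketch showing the two settings at their default values (the database name is a placeholder):

```sql
ALTER DATABASE YourDatabase SET QUERY_STORE
(
    DATA_FLUSH_INTERVAL_SECONDS = 900, -- flush to disk every 15 minutes (default)
    INTERVAL_LENGTH_MINUTES = 60       -- aggregate runtime stats per 60-minute interval (default)
);
```

Chapter 3 covers these options, and the trade-offs in changing them, in detail.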

General Notes on Data Collection

The Query Store works with all the other processes within SQL Server. So, for example, if a query is recompiling and it generates the same execution plan that is already in the Query Store, usage statistics are updated and nothing else happens. If, however, this recompile results in a new execution plan, then that plan will get captured to the Query Store just as the old plan did.

In the event of a cross-database query using three-part naming, the database from which the call is being made is where the query will be recorded, regardless of where the data being retrieved or modified lives. This is important so that you know where to look for Query Store information if more than one database is involved. You will not see the query in both databases.

Because all the data collection done in the Query Store gets written to disk asynchronously, it is possible to lose data because it has not yet been flushed to disk. There is a command to make this happen manually, sys.sp_query_store_flush_db. Executing that command will force anything currently in memory to be immediately written to disk. This ensures that before a controlled failover, shutdown, or any event that would result in the data being removed from memory, you can retain that data.
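The flush is a simple stored procedure call, executed in the context of the database whose Query Store data you want written out:

```sql
-- Force any Query Store data currently held in memory
-- to be written to disk immediately
EXEC sys.sp_query_store_flush_db;
```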

The Data Collected by Query Store

As we stated at the beginning of the chapter, there are essentially three sets of information that define the data collected by the Query Store: query and plan data, runtime data, and wait statistics. As you saw in the previous section, the runtime data and the wait statistics are collected the same way, so why break them apart? The reason is simple: the focus of the book is SQL Server 2019, but Query Store has been implemented since SQL Server 2016. One change between 2016 and 2017 was the addition of wait statistics to the data collection. So we’re breaking them apart so that those who will only see the runtime information still get as much out of the book as those who look at both runtime data and wait statistics. Plus, you’ll need to plan on querying the wait statistics slightly differently than you do the runtime data, but we’ll cover that in detail in Chapter 5.

One point about the system views that expose the Query Store information must be made. The information displayed in these system views combines, but does not aggregate, the information stored on disk and in memory. As you know from earlier in the chapter, data is written first to memory and then eventually written to disk. During that interval, the runtime data will be aggregated in both places, but visible as separate rows in some of the catalog views. There’s nothing you need to do to combine this data. SQL Server does it for you. However, there’s also nothing you can do to separate this data. It’s combined at the system level.

Let’s start with the data collected about the query and the query plan itself.

Query and Query Plan Information

As we’ve already stated previously, the foundation for the information gathered by Query Store is the query itself. You may pass a batch with five or ten statements, but each of those statements, assuming they’re all data manipulation statements, will be individual queries in the Query Store. The catalog views that expose the information available in Query Store are visible in Figure 2-4:
../images/473933_1_En_2_Chapter/473933_1_En_2_Fig4_HTML.png
Figure 2-4

Catalog views containing query and query plan information

There are only four catalog views that expose the information stored on the query and the query plan. At the top and center, you can see sys.query_store_query. In the diagram only a few of the 30 columns are shown as examples of what information is available. Each query has attributes that may or may not be unique to the query. These attributes are in the views to the left and right of sys.query_store_query: sys.query_store_context_settings and sys.query_store_query_text. These attributes can be created once and then reused by multiple queries. This includes the query text. If other attributes change, but the query text remains the same, then more than one query in sys.query_store_query may have the same query_text_id value (query_id is the primary key). The query plans stored in sys.query_store_plan are tied directly to the query via a foreign key on query_id. Any given query may have multiple plans. Situations arise, like parameter sniffing and others, that result in a given query with more than one valid execution plan. The table shown in Figure 2-4 for sys.query_store_plan also only contains a sampling of the 23 columns available.
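A quick sketch of how these catalog views join together, walking from the query text through the query to its plans:

```sql
-- List each captured query with its text and any plans generated for it
SELECT q.query_id,
       q.object_id,          -- non-zero when the query belongs to an object
       qt.query_sql_text,
       p.plan_id
FROM sys.query_store_query AS q
JOIN sys.query_store_query_text AS qt
    ON q.query_text_id = qt.query_text_id
JOIN sys.query_store_plan AS p
    ON q.query_id = p.query_id;
```

A query with more than one plan, from parameter sniffing for example, will simply return multiple rows here, one per plan_id.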

We’ll be covering querying these views in detail in Chapter 5. However, it’s worth pointing out that some of the information may be a little misleading. If you look at both sys.query_store_query and sys.query_store_plan as shown in Figure 2-4, both have a column labeled count_compiles. While these columns share a name, they track compile events at different levels: one for the query as a whole and one for each individual plan. Every time a query gets compiled or recompiled, the query-level value is updated. However, that compile event may, or may not, result in the same plan. If a different plan gets compiled, then only that plan gets its own distinct count_compiles value updated, not every plan associated with the query. We’ll be addressing details like this throughout the book.

The rows of data stored in sys.query_store_plan actually act as the drivers for the runtime data collected. Let’s now take a look at that information.

Runtime Information

The runtime data actually gets aggregated based on two different values, the plan_id as mentioned in the previous section and the runtime interval that we talked about earlier (that has a default of 60 minutes). Figure 2-5 represents the information captured:
../images/473933_1_En_2_Chapter/473933_1_En_2_Fig5_HTML.png
Figure 2-5

Catalog views containing runtime information

The model may look quite a bit simpler, but it masks a few complexities that are worth noting. First up, the sys.query_store_runtime_stats catalog view shows only a small part of the 58 columns that make up the view. A lot of that data is laid out similar to what you see with the duration values: an average, a minimum, a maximum, and a standard deviation. That holds true for such interesting data as the CPU time, row count, memory, and more.

The next most important thing to keep in mind when dealing with the runtime data is that it is broken up into multiple aggregations as determined by the runtime interval. Since you’re unlikely to turn Query Store on at the top of the hour, the sys.query_store_runtime_stats_interval system view will have the start_time and end_time of each interval. While these will be the appropriate amount of time apart, 60 minutes by default, the start and stop times are not going to coincide with the clock.
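You can see the actual interval boundaries for yourself; a minimal sketch:

```sql
-- Show the runtime collection intervals, most recent first;
-- start_time and end_time are 60 minutes apart by default but
-- will not line up with the top of the clock hour
SELECT runtime_stats_interval_id, start_time, end_time
FROM sys.query_store_runtime_stats_interval
ORDER BY start_time DESC;
```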

Another thing to remember is that there are multiple time values to take into account with this information. In addition to the runtime interval, there is also the time to flush to disk, with a default of 15 minutes. That means that for any given plan_id, there may be data on disk across multiple time intervals as well as data in memory.

Further, the execution_type can affect the information collected. This column has three values:
  • 0 – Successful execution

  • 3 – User aborted execution

  • 4 – An exception or error causing aborted execution

In the case of a query being aborted for any reason, the runtime data is still collected. However, it’s aggregated into a separate value for the given plan_id and for the given runtime_stats_interval_id. Because of this and the fact that there can be more than one row of runtime data between memory and disk, when you start writing your own queries against this data, you need to aggregate based on the plan_id, the execution_type, and the runtime_stats_interval_id.
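Putting that together, a hedged sketch of the aggregation you’ll need when querying the runtime data yourself (the averages in the view are per-row, so they must be weighted by count_executions when rows are combined):

```sql
-- Combine the rows that Query Store keeps separate
-- (memory vs. disk, and per execution_type) into one row
-- per plan, execution type, and interval
SELECT rs.plan_id,
       rs.execution_type,
       rs.runtime_stats_interval_id,
       SUM(rs.count_executions) AS total_executions,
       SUM(rs.avg_duration * rs.count_executions)
           / SUM(rs.count_executions) AS avg_duration_us
FROM sys.query_store_runtime_stats AS rs
GROUP BY rs.plan_id,
         rs.execution_type,
         rs.runtime_stats_interval_id;
```

Chapter 5 builds on this pattern in much more detail.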

Wait Statistics

We’ve separated the wait statistics from the runtime information only because some people using this book may be on SQL Server 2016. Figure 2-6 shows the layout of the information gathered for the wait statistics:
../images/473933_1_En_2_Chapter/473933_1_En_2_Fig6_HTML.png
Figure 2-6

Catalog views containing wait statistics

Just like with the runtime information, the wait statistics are aggregated by the runtime interval. Also, like the runtime information, wait statistics are dependent on the flush to disk interval, a default of 15 minutes, so will be aggregated in two locations. Again, like the runtime information, the execution_type also affects the aggregation of the wait statistics. Finally, the wait statistics add one more wrinkle. There is a wait_category that must be taken into account when aggregating the data across time intervals and execution_types. We’ll be covering all this in more detail in Chapter 5.

You can also see that there is a similar pattern to the data with a minimum, maximum, last, average, and standard deviation being stored. However, the wait time is the only interesting value. I left off a couple of descriptive columns in Figure 2-6 to save space. The rest of the information there is the important data for wait statistics.
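Because of that extra wrinkle, queries against the wait statistics need to group on the wait category as well. A minimal sketch:

```sql
-- Total wait time per plan and wait category,
-- summed across intervals and execution types
SELECT ws.plan_id,
       ws.wait_category_desc,
       SUM(ws.total_query_wait_time_ms) AS total_wait_ms
FROM sys.query_store_wait_stats AS ws
GROUP BY ws.plan_id,
         ws.wait_category_desc
ORDER BY total_wait_ms DESC;
```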

Information About the Query Store

There is actually one final catalog view that we should mention, although the details of what it represents will be covered in Chapter 3. That view is sys.database_query_store_options. While it does represent some of the information gathered about the Query Store, it is not the focus of this chapter.

Summary

The principal driving factor in the methodology of collecting Query Store data is to do so in as unobtrusive a way as possible. All the asynchronous calls between the storage of the information to memory and disk make that abundantly clear. However, once the data is stored to disk, it lives with the database and will be available, wherever that database goes, until it is removed by another process, such as automatic cleanup, or by you manually removing the information. In Chapter 3, we’ll finally get going with starting, stopping, and controlling the Query Store.
