© Davide Mauri, Silvano Coriani, Anna Hoffman, Sanjay Mishra, Jovan Popovic 2021
D. Mauri et al.Practical Azure SQL Database for Modern Developershttps://doi.org/10.1007/978-1-4842-6370-9_10

10. Monitoring and Debugging

Davide Mauri1, Silvano Coriani2, Anna Hoffman3, Sanjay Mishra1 and Jovan Popovic4
(1)
Redmond, WA, USA
(2)
Legnano, Milano, Italy
(3)
Sherman Oaks, CA, USA
(4)
Belgrade, Serbia
 

Azure SQL and its underlying SQL Server query engine are well known in the industry for the low administrative effort generally required for applications to get good performance and reliability under most conditions, across different data sizes and shapes.

That said, it is important for application developers to understand the foundations and internals of how query processing works in Azure SQL and what tools and capabilities are available to monitor and troubleshoot performance in both development and production phases. Generally speaking, there are two major scenarios where these capabilities are important for us to master:
  1. Understanding how the system is behaving under a given workload: this means looking at major performance and availability metrics during your load tests or production periods to make sure the system is up and running and can cope with a given workload.

  2. Investigating deeply into specific activities that may happen, or have happened, if the results of the previous point show some critical behaviors.

The former is a continuous, long-term monitoring scenario that is crucial to making sure your solution will “keep the lights on”; it usually requires specific capabilities, like the ability to store and analyze vast amounts of data points and trigger alerts when key metrics cross certain thresholds. On the Azure platform, there is a common backbone across all services to collect diagnostic information emitted by all components of a given solution, called Azure Monitor. All services can be configured to asynchronously emit logs and metrics to Azure Monitor, which can then route them both to near real-time options (visualizing metrics in the Azure Portal or pushing data into an Azure Event Hub instance for further processing and alerting) and to more long-term analytical options (like Azure Blob Storage or Log Analytics), depending on your needs.

../images/493913_1_En_10_Chapter/493913_1_En_10_Figa_HTML.jpg

You can get more details on this specific scenario by looking at official documentation at this link: https://aka.ms/asdbmto.

The other major scenario is instead related to immediate diagnostic and troubleshooting investigations that you can execute, while a specific issue is happening, thanks to the extensive instrumentation capabilities exposed by the Azure SQL engine. Every single internal subsystem of the service, from connection and session management down to query processing and the storage engine, is in fact emitting a rich and detailed set of diagnostic information that we, as developers, can take advantage of to understand how a single query is executed or how subsystems like memory management or resource governance are behaving.

As developers, we don’t necessarily need to understand every single aspect of how the internals of the service are working, but it is quite important to know the basic tools and techniques available to make sure we can investigate how our queries and workloads are performing and how we can make our applications more efficient when working with Azure SQL.

Dynamic Management Views (DMVs)

Dynamic management views (and functions) are a way to execute T-SQL queries against an Azure SQL endpoint and return state information that can be used to monitor the health of an Azure SQL server or database instance, diagnose problems, and tune performance.

As mentioned, the SQL Server engine is highly instrumented and captures tons of information on how various subsystems are working, from the integration with the underlying operating system and hardware up to how queries are executed. All these details are maintained in memory structures within the database process space (we’ll talk about how things evolved recently with Query Store later in this chapter) and are front-ended by a layer of system views and functions that can be queried by regular T-SQL commands, from a connection opened by a login that has SELECT permission on the selected DMV and VIEW SERVER STATE (for Azure SQL Managed Instance) or VIEW DATABASE STATE (for Azure SQL Database) permission. We can grant this access to a database user by executing this command:
GRANT VIEW DATABASE STATE TO database_user;
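For an Azure SQL Managed Instance, where DMVs return server-level state, the equivalent permission is granted at the server level. Here is a minimal sketch, assuming a hypothetical login named app_login:
-- A minimal sketch (hypothetical login name) for Azure SQL Managed Instance
GRANT VIEW SERVER STATE TO app_login;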

Azure SQL enables a subset of dynamic management views to diagnose performance problems, which might be caused by blocked or long-running queries, resource bottlenecks, poor query plans, and so on. In an instance of SQL Server and in Azure SQL Managed Instance, dynamic management views return server state information. In Azure SQL Database, they return information regarding your current logical database only.

While hundreds of these views and functions are available, there are really three main categories we should focus on:
  • Database-related dynamic management views (prefixed by sys.dm_db_*)

  • Execution-related dynamic management views (prefixed by sys.dm_exec_*)

  • Transaction-related dynamic management views (prefixed by sys.dm_tran_*)

Let’s start with a few basic use cases that are common while you’re investigating what’s going on with your database instance. First, you may want to understand what the current resource utilization is on the instance your application is running against. You can open your client tool of preference (e.g., Azure Data Studio or SQL Server Management Studio), point it to your database, and execute this query:
SELECT * FROM sys.dm_db_resource_stats ORDER BY end_time DESC;
The result you’ll get back will be something similar to this:
../images/493913_1_En_10_Chapter/493913_1_En_10_Figb_HTML.jpg

Basically, this view returns key metrics around resource consumption for your database instance over the last hour, with a granularity of 15 seconds. If any of these metrics is getting close to 100%, you usually have the choice of scaling up your database service or compute tier or, more likely, drilling deeper into resource utilization to understand if there are ways to make your workload more efficient.
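If you prefer a quick summary instead of scrolling through individual 15-second samples, a minimal sketch like the following aggregates average and peak utilization over the window covered by the view (using only columns documented for sys.dm_db_resource_stats):
-- A minimal sketch: average and peak resource usage over the period covered by the view
SELECT
     AVG(avg_cpu_percent) AS avg_cpu,
     MAX(avg_cpu_percent) AS max_cpu,
     AVG(avg_data_io_percent) AS avg_data_io,
     MAX(avg_data_io_percent) AS max_data_io,
     AVG(avg_log_write_percent) AS avg_log_write,
     MAX(avg_log_write_percent) AS max_log_write,
     AVG(avg_memory_usage_percent) AS avg_memory
FROM sys.dm_db_resource_stats;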

Let’s say, as an example, that you’re maxing out on CPU utilization: a good next step would be to look at what are the top CPU-consuming queries to understand if there are ways to improve them in any way.

For demonstration purposes, we will clear the procedure cache of our test instance so we are not distracted by the potentially hundreds of other queries that may have run in the past, and we will use this script to execute a simple query:
--!!! Test purposes only, don’t do it in production!!!--
ALTER DATABASE SCOPED CONFIGURATION CLEAR PROCEDURE_CACHE;
-- Execute a simple query
SELECT * FROM [Sales].[Orders] WHERE CustomerID=832
On the same connection, or on a different one, we can now run the following diagnostic query targeting a few sys.dm_exec_* DMVs:
--- Returning TOP 25 CPU consuming queries for this database
SELECT TOP 25
     qs.query_hash,
     qs.execution_count,
     REPLACE(REPLACE(LEFT(st.[text], 512), CHAR(10),''), CHAR(13),'') AS query_text,
     qs.total_worker_time,
     qs.min_worker_time,
     qs.total_worker_time/qs.execution_count AS avg_worker_time,
     qs.max_worker_time,
     qs.min_elapsed_time,
     qs.total_elapsed_time/qs.execution_count AS avg_elapsed_time,
     qs.max_elapsed_time,
     qs.min_logical_reads,
     qs.total_logical_reads/qs.execution_count AS avg_logical_reads,
     qs.max_logical_reads,
     qs.min_logical_writes,
     qs.total_logical_writes/qs.execution_count AS avg_logical_writes,
     qs.max_logical_writes,
     CASE WHEN CONVERT(nvarchar(max), qp.query_plan) LIKE N'%<MissingIndexes>%' THEN 1 ELSE 0 END AS missing_index,
     qs.creation_time,
     qp.query_plan,
     qs.*
FROM
     sys.dm_exec_query_stats AS qs
CROSS APPLY
     sys.dm_exec_sql_text(plan_handle) AS st
CROSS APPLY
     sys.dm_exec_query_plan(plan_handle) AS qp
WHERE
     st.[dbid]=db_id() and st.[text] NOT LIKE '%sys%'
ORDER BY
     qs.total_worker_time DESC
OPTION (RECOMPILE);
What this query returns is one row for each of the top 25 queries consuming the most CPU time (in our case just one), with a number of attributes, as you can see from the following picture:
../images/493913_1_En_10_Chapter/493913_1_En_10_Figc_HTML.jpg

In detail, we’re getting a binary hash value identifying all queries with the same “shape” (where only a parameter value or similar may change), an indication of how many times that query has been executed, followed by the query text itself. From there, a long list of very important metrics indicates total, min, average, and max usage of critical resources like CPU (worker) time, data page reads and writes, and much more. By changing the ORDER BY column, we can look at which queries are executed more often (by execution_count), which queries read or write more data pages, and so on.

This example is just scratching the surface of what we can do with DMVs, but there is much more. We can investigate and identify performance bottlenecks due to poorly designed table or index structures, monitor database and object sizes, track locking and blocking issues between concurrent users, and so on. For more details on what is possible, we recommend taking a look at the official Azure SQL documentation here: https://aka.ms/asdbmwd.
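As a taste of what else is possible, here is a minimal sketch that looks for sessions currently blocked by other sessions, combining sys.dm_exec_requests with sys.dm_exec_sql_text (the column selection is illustrative):
-- A minimal sketch: find sessions currently blocked by another session
SELECT
     r.session_id,
     r.blocking_session_id,
     r.wait_type,
     r.wait_time,
     t.[text] AS query_text
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.blocking_session_id <> 0;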

Execution plans

In our previous example, one of the columns retrieved through those DMVs was an XML fragment returning the query execution plan:
../images/493913_1_En_10_Chapter/493913_1_En_10_Figd_HTML.jpg
By clicking that column within SQL Server Management Studio, a new window gets opened showing something like this:
../images/493913_1_En_10_Chapter/493913_1_En_10_Fige_HTML.jpg

This is the graphical representation of the query execution plan used to resolve that particular query, and it shows all the detailed steps that the Azure SQL engine took to produce the result returned to the client. Azure SQL has a pool of memory that is used to store both execution plans and data buffers. When any Transact-SQL statement is executed, the Database Engine first looks through the plan cache to verify that an execution plan for the same Transact-SQL statement already exists. The Transact-SQL statement qualifies as existing if it literally matches a previously executed Transact-SQL statement with a cached plan, character for character. Azure SQL reuses any existing plan it finds, saving the overhead of recompiling the Transact-SQL statement. If no execution plan exists, Azure SQL generates a new execution plan for the query, trying to find the one with the lowest cost in terms of resource utilization, within a reasonable time interval.

Understanding how this execution plan is created and executed is critical to make sure that the workload generated by our application is optimized for our data model and indexing strategy or instead requires some optimizations.

Note

Many modern databases use the more technical term DAG (Directed Acyclic Graph) instead of the more user-friendly term Execution Plan. If you already have experience with Apache Spark, for example, a DAG and an Execution Plan are basically the same thing.

As a rule of thumb, execution plans visualized in Management Studio should be read from right to left and from top to bottom to determine the order in which the steps have been executed. In our preceding example, the first step is a Seek operator using a nonclustered index created on the CustomerID column of the Orders table to find one or more rows with a value of 832. As the index that the query engine is using likely does not contain all the columns in the select list (where we mistakenly used an asterisk instead of the full column list), the next step in the execution plan is to loop over all rows retrieved by the first operator and, for each of them, execute a Lookup operator that uses the clustered index key retrieved by the Seek operator to read all the other columns from the clustered index (PK_Sales_Orders) created on the Orders table.

By hovering the mouse over a given operator, we can get a lot of insights on that step, like the execution mode (Row or Batch), the underlying storage type of that object (RowStore vs. ColumnStore), and a detailed description of the CPU and IO costs associated with it.
../images/493913_1_En_10_Chapter/493913_1_En_10_Figf_HTML.jpg

For example, we can see that the Key Lookup operator accounts for 99% of the cost of the entire query and is executed 127 times (matching the number of rows returned by the first Seek operator).

As developers, we can start thinking about what is needed to improve the efficiency of our query: selecting only the columns that we really need and maybe adding those columns as included columns to the nonclustered index used by the first operator. This would completely eliminate this expensive operator, driving down overall query processing costs. Let’s give it a try! By doing precisely the steps we just described, the new query plan indeed no longer contains the Key Lookup operator and, looking at the overall subtree cost, it went down to roughly 1/100th of the original:
../images/493913_1_En_10_Chapter/493913_1_En_10_Figg_HTML.jpg
Looking at the metrics we’re extracting through DMVs, we can notice that the execution time is less than half of the original query, and logical page reads have dropped from 369 down to 5, indicating fewer I/O operations required to read from storage and also less space used in the buffer pool:
../images/493913_1_En_10_Chapter/493913_1_En_10_Figh_HTML.jpg
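For reference, the index change described above could look like the following sketch; the index name and included columns are purely illustrative and should be adapted to the columns your query actually selects:
-- A minimal sketch (illustrative index name and column list)
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID_Covering
ON [Sales].[Orders] (CustomerID)
INCLUDE (OrderDate, ExpectedDeliveryDate, SalespersonPersonID);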

A logical page is nothing more than the page concept we discussed in previous chapters. Every time Azure SQL needs to read some data to process a query or to return results to the end user, it will use one I/O operation to read an entire 8 KB page. A physical read is a read operation done on a page that was not already in memory, and is therefore quite expensive. A logical read is a read operation done on a page that was already in the buffer pool, the in-memory cache. If a page is not already in the buffer pool, it will be read from disk and added to the pool. For that reason, the number of logical reads is always greater than or equal to the number of physical reads. As a page may be read more than once, the number of logical page reads is a good indicator of how much I/O your query is doing overall. As I/O is the most expensive operation in a database, you generally want to reduce it as much as possible, finding the right balance between read and write performance.

Using DMVs to extract execution plans of the most expensive queries hitting our Azure SQL database is a great way to investigate the performance of our application. In client tools like SQL Server Management Studio or Azure Data Studio, you can also obtain the same information while running your queries by selecting the “Include Actual Execution Plan” toolbar button (or using the Ctrl+M shortcut) in Management Studio or clicking the “Explain” button in Data Studio.
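If you prefer to stay in T-SQL, you can also ask the engine to return the actual execution plan as an XML result alongside the query results; a minimal sketch (the column names are illustrative):
-- A minimal sketch: return the actual execution plan as XML next to the results
SET STATISTICS XML ON;
SELECT OrderID, OrderDate FROM [Sales].[Orders] WHERE CustomerID = 832;
SET STATISTICS XML OFF;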

What we have described here represents the foundational approach you can take to understand how your queries are executed by the Azure SQL engine and how you can optimize your workload to reduce useless resource consumption and improve overall performance.

The way the Azure SQL engine is creating and executing query plans is a complex and fascinating subject; if you want to learn more, please read the official documentation here: https://aka.ms/qpag.

Query store

We previously mentioned that all the internal data structures representing the state of our Azure SQL databases, which feed Dynamic Management Views and Functions, are kept in database process memory, so when an instance is restarted or there’s a planned or unplanned failover, all diagnostic information gets lost. With the Query Store feature, introduced back in SQL Server 2016, Azure SQL can now persist most of this diagnostic information across restarts by default, as Query Store is enabled on the entire cloud database fleet. This feature provides insight into query plan choice and performance and simplifies performance troubleshooting by helping you quickly find performance differences caused by query plan changes. Query Store automatically captures a history of queries, plans, and runtime statistics and retains these for your review. It separates data by time windows so you can see database usage patterns and understand when query plan changes happened on the server. Query Store works effectively like a flight data recorder, constantly collecting compile and runtime information related to queries and plans. Query-related data is persisted in internal tables and presented to users through a set of views.

It basically contains three main buckets:
  • Plan store, containing execution plan details

  • Runtime stats store, where execution statistics are persisted

  • Wait stats store, with historical data around wait statistics

As we discussed previously, once created, query execution plans for any specific query in Azure SQL can change over time due to statistics changes, schema changes, index changes, and so on. The procedure cache stores the latest execution plan, and plans can also get evicted from the cache under memory pressure. If a newly created execution plan, for whatever reason, turns out to be suboptimal, it is usually quite challenging to understand what caused that change. By keeping multiple versions of an execution plan for a given query, Query Store can help figure out what happened, and it is also possible to enforce a policy that directs the Query Processor to use a specific execution plan. This is referred to as plan forcing, where a mechanism similar to the USE PLAN query hint is applied without requiring any change to the query syntax in your app.

Query Store collects plans for DML Statements such as SELECT, INSERT, UPDATE, DELETE, MERGE, and BULK INSERT.

Another great source of information during performance troubleshooting sessions is the availability of statistics related to wait states of the system, basically a collection of the underlying reasons why Azure SQL is taking a given amount of time to respond to user queries. Before the Query Store, wait statistics were usually available at the database instance level, and it was not trivial to correlate them to a specific query.
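With Query Store, wait statistics are instead tracked per plan. A minimal sketch of how you could surface the heaviest wait categories, assuming the sys.query_store_wait_stats view available in Azure SQL:
-- A minimal sketch: top wait categories per plan recorded by Query Store
SELECT TOP 10
     ws.wait_category_desc,
     ws.plan_id,
     SUM(ws.total_query_wait_time_ms) AS total_wait_ms
FROM sys.query_store_wait_stats AS ws
GROUP BY ws.wait_category_desc, ws.plan_id
ORDER BY total_wait_ms DESC;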

Let’s look at how Query Store collects its data: the query text and the initial plan are sent to the Query Store when a query gets compiled for the first time, and the entry is updated if the query gets recompiled. If a new plan is created, it is added as a new entry for the query, and previous plans are kept along with their runtime execution stats. Runtime statistics are sent to the Query Store for each query execution and are aggregated at the plan level within the currently active time interval.

During the compile and check-for-recompile phases, Azure SQL detects whether there is a plan in Query Store that should be applied to the currently running query; if there is a forced plan different from the one in cache, the query gets recompiled (effectively the same as if the USE PLAN hint had been applied to that query). This happens completely transparently to the user application. The following diagram describes the interactions between the query processor and Query Store:
../images/493913_1_En_10_Chapter/493913_1_En_10_Figi_HTML.jpg

As mentioned, Query Store is enabled by default in Azure SQL and cannot be switched off. The default configuration is optimized for continuous data collection, but you can still control some of the configuration knobs, like the maximum store size (default 100MB) or the interval length used to aggregate statistics for query executions (default 60 minutes). Unless you have specific needs, like short troubleshooting sessions where you want to speed up the collection process, we recommend leaving these settings at their default values.
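If you do want to tweak these knobs, here is a minimal sketch of the corresponding ALTER DATABASE statement (the values are illustrative):
-- A minimal sketch: adjust Query Store max size and aggregation interval
ALTER DATABASE CURRENT
SET QUERY_STORE (MAX_STORAGE_SIZE_MB = 256, INTERVAL_LENGTH_MINUTES = 15);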

Query Store internal information gets surfaced through a series of views that can be used in your diagnostic queries to understand the behaviors of Azure SQL with your specific workload. The following diagram shows Query Store views and their logical relationships, with compile-time information presented as blue entities:
../images/493913_1_En_10_Chapter/493913_1_En_10_Figj_HTML.jpg
View names are straightforward, but let’s go through some of the main ones and see how they can be useful to monitor and troubleshoot our workloads:
  • sys.query_store_query_text reports the unique query texts executed against the database, where every statement in the batch generates a separate query text entry.

  • sys.query_context_settings presents unique combinations of plan-affecting settings under which queries are executed.

  • sys.query_store_query shows query entries that are tracked and forced separately in Query Store.

  • sys.query_store_plan returns the estimated plan for the query with compile-time statistics. The stored plan is equivalent to the one you get by using SET SHOWPLAN_XML ON.

  • sys.query_store_runtime_stats_interval shows runtime statistics aggregated in automatically generated time windows (intervals) for every executed plan. We can control the size of the interval using INTERVAL_LENGTH_MINUTES in the ALTER DATABASE ... SET QUERY_STORE statement.

  • sys.query_store_runtime_stats reports aggregated runtime statistics for executed plans. Captured metrics are in the form of four statistical functions: Average, Minimum, Maximum, and Standard Deviation.

By querying these views, you can quickly get detailed information on how your workload is executing; here are some examples.

This query returns the last ten queries executed on the database:
SELECT TOP 10 qt.query_sql_text, q.query_id,
    qt.query_text_id, p.plan_id, rs.last_execution_time
FROM sys.query_store_query_text AS qt
JOIN sys.query_store_query AS q
    ON qt.query_text_id = q.query_text_id
JOIN sys.query_store_plan AS p
    ON q.query_id = p.query_id
JOIN sys.query_store_runtime_stats AS rs
    ON p.plan_id = rs.plan_id
ORDER BY rs.last_execution_time DESC;
These are the ten queries with the longest average duration executed within the last hour:
SELECT TOP 10 rs.avg_duration, qt.query_sql_text, q.query_id,
    qt.query_text_id, p.plan_id, GETUTCDATE() AS CurrentUTCTime,
    rs.last_execution_time
FROM sys.query_store_query_text AS qt
JOIN sys.query_store_query AS q
    ON qt.query_text_id = q.query_text_id
JOIN sys.query_store_plan AS p
    ON q.query_id = p.query_id
JOIN sys.query_store_runtime_stats AS rs
    ON p.plan_id = rs.plan_id
WHERE rs.last_execution_time > DATEADD(hour, -1, GETUTCDATE())
ORDER BY rs.avg_duration DESC;
These are the ten queries performing the most physical I/O reads in the last 24 hours:
SELECT TOP 10 rs.avg_physical_io_reads, qt.query_sql_text,
    q.query_id, qt.query_text_id, p.plan_id, rs.runtime_stats_id,
    rsi.start_time, rsi.end_time, rs.avg_rowcount, rs.count_executions
FROM sys.query_store_query_text AS qt
JOIN sys.query_store_query AS q
    ON qt.query_text_id = q.query_text_id
JOIN sys.query_store_plan AS p
    ON q.query_id = p.query_id
JOIN sys.query_store_runtime_stats AS rs
    ON p.plan_id = rs.plan_id
JOIN sys.query_store_runtime_stats_interval AS rsi
    ON rsi.runtime_stats_interval_id = rs.runtime_stats_interval_id
WHERE rsi.start_time >= DATEADD(hour, -24, GETUTCDATE())
ORDER BY rs.avg_physical_io_reads DESC;

You’re probably starting to see what possibilities this feature opens up in terms of monitoring and troubleshooting, right? Let’s now look at some more complex scenarios where the Query Store feature can help during our performance investigations.

Query Store at work

Let’s say your application’s performance has degraded over the last week, and you want to understand if this can be related to some changes in the database.

With Query Store, you can compare query executions across different time windows, for example the recent period (last 4 hours) with what was happening last week when everything was fine in terms of performance:
--- "Recent" workload - last 4 hours
DECLARE @recent_start_time datetimeoffset;
DECLARE @recent_end_time datetimeoffset;
SET @recent_start_time = DATEADD(hour, -4, SYSUTCDATETIME());
SET @recent_end_time = SYSUTCDATETIME();
--- "History" workload – last week
DECLARE @history_start_time datetimeoffset;
DECLARE @history_end_time datetimeoffset;
SET @history_start_time = DATEADD(day, -14, SYSUTCDATETIME());
SET @history_end_time = DATEADD(day, -7, SYSUTCDATETIME());
WITH
hist AS
(
    SELECT
        p.query_id query_id,
        ROUND(ROUND(CONVERT(FLOAT, SUM(rs.avg_duration * rs.count_executions)) * 0.001, 2), 2) AS total_duration,
        SUM(rs.count_executions) AS count_executions,
        COUNT(distinct p.plan_id) AS num_plans
     FROM sys.query_store_runtime_stats AS rs
        JOIN sys.query_store_plan AS p ON p.plan_id = rs.plan_id
    WHERE (rs.first_execution_time >= @history_start_time
               AND rs.last_execution_time < @history_end_time)
        OR (rs.first_execution_time <= @history_start_time
               AND rs.last_execution_time > @history_start_time)
        OR (rs.first_execution_time <= @history_end_time
               AND rs.last_execution_time > @history_end_time)
    GROUP BY p.query_id
),
recent AS
(
    SELECT
        p.query_id query_id,
        ROUND(ROUND(CONVERT(FLOAT, SUM(rs.avg_duration * rs.count_executions)) * 0.001, 2), 2) AS total_duration,
        SUM(rs.count_executions) AS count_executions,
        COUNT(distinct p.plan_id) AS num_plans
    FROM sys.query_store_runtime_stats AS rs
        JOIN sys.query_store_plan AS p ON p.plan_id = rs.plan_id
    WHERE  (rs.first_execution_time >= @recent_start_time
               AND rs.last_execution_time < @recent_end_time)
        OR (rs.first_execution_time <= @recent_start_time
               AND rs.last_execution_time > @recent_start_time)
        OR (rs.first_execution_time <= @recent_end_time
               AND rs.last_execution_time > @recent_end_time)
    GROUP BY p.query_id
)
SELECT
    results.query_id AS query_id,
    results.query_text AS query_text,
    results.additional_duration_workload AS additional_duration_workload,
    results.total_duration_recent AS total_duration_recent,
    results.total_duration_hist AS total_duration_hist,
    ISNULL(results.count_executions_recent, 0) AS count_executions_recent,
    ISNULL(results.count_executions_hist, 0) AS count_executions_hist
FROM
(
    SELECT
        hist.query_id AS query_id,
        qt.query_sql_text AS query_text,
        ROUND(CONVERT(float, recent.total_duration/
                   recent.count_executions-hist.total_duration/hist.count_executions)
               *(recent.count_executions), 2) AS additional_duration_workload,
        ROUND(recent.total_duration, 2) AS total_duration_recent,
        ROUND(hist.total_duration, 2) AS total_duration_hist,
        recent.count_executions AS count_executions_recent,
        hist.count_executions AS count_executions_hist
    FROM hist
        JOIN recent
            ON hist.query_id = recent.query_id
        JOIN sys.query_store_query AS q
            ON q.query_id = hist.query_id
        JOIN sys.query_store_query_text AS qt
            ON q.query_text_id = qt.query_text_id
) AS results
WHERE additional_duration_workload > 0
ORDER BY additional_duration_workload DESC
OPTION (MERGE JOIN);

This query is basically calculating which queries have introduced additional duration compared to the historical period and returns information like recent and historical execution counts and total durations.

For queries that show regressions, you should check whether they recently had different query plans compared to when they were executing faster. What you can do to fix the problem is try to force a specific execution plan for that query with this procedure:
EXEC sp_query_store_force_plan @query_id = 48, @plan_id = 49;
If you want to revert this forcing and let Azure SQL calculate the execution plan again, you can unforce the plan by calling:
EXEC sp_query_store_unforce_plan @query_id = 48, @plan_id = 49;

It’s important to note that, for this kind of optimization, you won’t need to change your application code in any way.
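If you later want to check which queries currently have a forced plan, here is a minimal sketch using the Query Store views we saw earlier:
-- A minimal sketch: list queries that currently have a forced plan
SELECT q.query_id, p.plan_id, qt.query_sql_text, p.last_force_failure_reason_desc
FROM sys.query_store_plan AS p
JOIN sys.query_store_query AS q ON q.query_id = p.query_id
JOIN sys.query_store_query_text AS qt ON qt.query_text_id = q.query_text_id
WHERE p.is_forced_plan = 1;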

If you prefer to run these investigations using a visual tool instead of running T-SQL queries, SQL Server Management Studio offers a series of UI pages to interact with Query Store information. You can expand the Query Store node in your database to see which scenarios are supported:
../images/493913_1_En_10_Chapter/493913_1_En_10_Figk_HTML.jpg
If you click Regressed Queries, for example, you’ll get by default the list of the top 25 queries that have regressed in performance during the last hour, and you can slice and dice data across multiple dimensions like CPU time, IO, or memory to find what you need. From the same UI, you can also look at the available query plans for a given query and force the most optimal one by clicking the Force Plan button. It is that easy!
../images/493913_1_En_10_Chapter/493913_1_En_10_Figl_HTML.jpg

Another interesting scenario for investigating overall performance of your application is looking at Query Wait Statistics for a given database.

Clicking that option in the database tree will pop up a new window where the most important wait state categories are shown in a chart ordered by total wait time, as you can see in this picture:
../images/493913_1_En_10_Chapter/493913_1_En_10_Figm_HTML.jpg
You can then drill down into the most impactful categories and be presented with the top queries contributing to those wait states and their related query plans, so you will be able to force the plans that are better optimized for your specific workload:
../images/493913_1_En_10_Chapter/493913_1_En_10_Fign_HTML.jpg

Other similar scenarios are related to Top Resource Consuming Queries or Queries With The Highest Variation, where you can follow similar paths.

Last but not least, Azure SQL also provides Query Performance Insight, part of the Azure Portal experience, which is based on Query Store data and provides intelligent query analysis tools for single and pooled databases. It helps identify the top resource-consuming and long-running queries in your workload, so you can find the queries to optimize to improve overall workload performance and efficiently use the resources that you are paying for. While SQL Server Management Studio and Azure Data Studio can be used to get detailed resource consumption for all your queries, Query Performance Insight gives you a quick and efficient way, right from the Azure portal, to determine their impact on overall resource usage for your database.
../images/493913_1_En_10_Chapter/493913_1_En_10_Figo_HTML.jpg
You can still drill down to the individual query level to get details on things like resource consumption or how many times queries have been executed in a given time window, but you can also interact with performance recommendations provided by Database Advisor, a feature in Azure SQL that learns about your database usage and provides customized recommendations that enable you to maximize performance. These recommendations span query plan forcing, index creation and deletion, and so on. By clicking the Automate button on the following page, you can also automate the execution of these recommendations and let Azure SQL keep your databases always in the best state:
../images/493913_1_En_10_Chapter/493913_1_En_10_Figp_HTML.jpg

To get more details on these interesting capabilities of Azure SQL, we recommend taking a look at the official documentation here: https://aka.ms/daipr.

Raising and catching exceptions in SQL

Error handling in Transact-SQL is similar to exception handling in traditional programming languages. You can wrap a group of Transact-SQL statements in a TRY block, and if an error occurs, control is passed to the following CATCH block where other statements will be executed. Each error in Azure SQL is associated with a severity level, and the TRY...CATCH construct catches all execution errors with a severity higher than 10 that do not close the database connection. If no error happens in the TRY block, control passes to the statement immediately after the associated END CATCH statement. Within the CATCH block, you can use several system functions to get details about the error that caused the CATCH block to be executed: these function names are pretty self-explanatory (ERROR_NUMBER(), ERROR_SEVERITY(), ERROR_STATE(), ERROR_PROCEDURE(), ERROR_LINE(), ERROR_MESSAGE()). They return NULL if they are called outside the scope of the CATCH block.

The following example shows a script that contains error-handling functions:
BEGIN TRANSACTION;
BEGIN TRY
    -- Generate a constraint violation error.
    DELETE FROM Production.Product
    WHERE ProductID = 980;
END TRY
BEGIN CATCH
    SELECT
        ERROR_NUMBER() AS ErrorNumber
        ,ERROR_SEVERITY() AS ErrorSeverity
        ,ERROR_STATE() AS ErrorState
        ,ERROR_PROCEDURE() AS ErrorProcedure
        ,ERROR_LINE() AS ErrorLine
        ,ERROR_MESSAGE() AS ErrorMessage;
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION;
END CATCH;
IF @@TRANCOUNT > 0
    COMMIT TRANSACTION;
GO

When we trap errors within a CATCH block, they are not returned to the calling application. If instead we want to capture error details within the CATCH block but also report all or part of them back to the calling application, we can call RAISERROR or THROW to bubble the exception up to the caller, or simply return a result set through a SELECT statement.
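A minimal sketch of this pattern, reusing the same constraint violation as before: the CATCH block could log the error details and then rethrow the original error to the caller with THROW:
BEGIN TRY
    DELETE FROM Production.Product
    WHERE ProductID = 980;
END TRY
BEGIN CATCH
    -- log or inspect ERROR_NUMBER(), ERROR_MESSAGE(), etc. here if needed
    THROW; -- rethrows the original error, preserving its number, state, and message
END CATCH;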

We can create complex error management logics by nesting multiple TRY...CATCH constructs. When an error happens within a CATCH block, it is treated like any other error, so if the block contains a nested TRY...CATCH construct, any error in the nested TRY block will pass control to the nested CATCH block. If there is no nested TRY...CATCH construct, the error is passed back to the caller.

If our code is calling other Stored Procedures or Triggers, errors raised by those external modules can be trapped in their own code (if they contain TRY blocks) or can be trapped by TRY...CATCH constructs in the calling code. User-defined functions, though, cannot contain TRY...CATCH constructs.

There are some exceptions that we need to keep in mind when dealing with errors in our T-SQL code:
  • If errors have a severity equal to or lower than 10, they won’t be trapped by our TRY...CATCH block (as they are not considered errors, but just warnings).

  • Some errors with severity equal to or higher than 20 will cause Azure SQL to stop processing other tasks on that session, so TRY...CATCH won’t trap these errors either.

  • Compile errors, such as syntax errors, that prevent a batch from running won’t be trapped by our TRY...CATCH block, as such errors will happen at compile time and not at runtime.

  • Statement-level recompilation or object name resolution errors (e.g., trying to use a view that has been dropped) will likewise not be trapped.

When errors happening in a TRY block invalidate the state of the current transaction, the transaction is classified as an uncommittable transaction. An uncommittable transaction can only perform read operations or a ROLLBACK TRANSACTION. We can call the XACT_STATE function to verify whether the current transaction has been classified as uncommittable: if the function returns -1, that is the case. At the end of the batch, Azure SQL rolls back uncommittable transactions and sends an error message to the application.

Here’s a more complex example using TRY…CATCH with XACT_STATE:
-- Check to see whether this stored procedure exists.
IF OBJECT_ID (N'usp_GetErrorInfo', N'P') IS NOT NULL
    DROP PROCEDURE usp_GetErrorInfo;
GO
-- Create procedure to retrieve error information.
CREATE PROCEDURE usp_GetErrorInfo
AS
    SELECT
         ERROR_NUMBER() AS ErrorNumber
        ,ERROR_SEVERITY() AS ErrorSeverity
        ,ERROR_STATE() AS ErrorState
        ,ERROR_LINE () AS ErrorLine
        ,ERROR_PROCEDURE() AS ErrorProcedure
        ,ERROR_MESSAGE() AS ErrorMessage;
GO
-- SET XACT_ABORT ON will cause the transaction to be uncommittable
-- when the constraint violation occurs, as it automatically rolls back the transaction
SET XACT_ABORT ON;
BEGIN TRY
    BEGIN TRANSACTION;
        -- A FOREIGN KEY constraint exists on this table. This
        -- statement will generate a constraint violation error.
        DELETE FROM Production.Product
            WHERE ProductID = 980;
    -- If the DELETE statement succeeds, commit the transaction.
    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    -- Execute error retrieval routine.
    EXECUTE usp_GetErrorInfo;
    -- Test XACT_STATE:
        -- If 1, the transaction is committable.
        -- If -1, the transaction is uncommittable and should
        --     be rolled back.
        -- XACT_STATE = 0 means that there is no transaction and
        --     a commit or rollback operation would generate an error.
    -- Test whether the transaction is uncommittable.
    IF (XACT_STATE()) = -1
    BEGIN
        PRINT
            N'The transaction is in an uncommittable state.' +
            'Rolling back transaction.'
        ROLLBACK TRANSACTION;
    END;
    -- Test whether the transaction is committable.
    IF (XACT_STATE()) = 1
    BEGIN
        PRINT
            N'The transaction is committable.' +
            'Committing transaction.'
        COMMIT TRANSACTION;
    END;
END CATCH;
GO

As you have seen, Azure SQL provides very complete and powerful exception handling features which are needed in any modern application, as managing exceptions is absolutely important to provide a great user experience.

Keep it simple!

Keep in mind that besides T-SQL, .NET or Python (or any of your preferred languages) code also has great exception support, and the best user experience is usually obtained when they work together as a team.

With this in mind, and with the idea of keeping our solution as simple as possible, a very common pattern where your programming language of choice and T-SQL work very well together is the one that uses XACT_ABORT ON.

With XACT_ABORT ON, anything that is in a transaction must be correctly (exactly) executed, or Azure SQL will abort the transaction and terminate the current code execution.

If you plan to handle failure logic in the application, this command can really help you to have lean and clean code:
SET XACT_ABORT ON
BEGIN TRAN
INSERT INTO Orders VALUES (2,1,getdate());
UPDATE Inventory SET QuantityInStock=QuantityInStock-1
  WHERE ProductID=1
COMMIT TRAN

Thanks to XACT_ABORT being set to ON in the preceding code, either both the INSERT and the UPDATE will run without any errors, so that the COMMIT is executed, or, if there is any error during the execution of the INSERT or UPDATE, the entire transaction will be automatically rolled back (even if there is no ROLLBACK TRAN in the code). Execution of the code will also be interrupted, and the raised error will be returned to the caller (the application code in our sample).

As you can see, you have the full spectrum of options when deciding how to deal with exceptions and errors. You can decide that it is better to handle it inside Azure SQL or you can bubble it up to the application. In both cases, you are in control so that you can implement the best option for your solution.

Integration with application insights

As application developers creating cloud-based solutions, it is quite critical to understand that, to troubleshoot and debug our apps’ issues, in most cases we cannot just connect to a specific server as we would have done in a traditional on-premises context. This is especially true for applications leveraging Platform as a Service components and services, where you do not even have the notion of a physical or virtual server to connect to. That is why it is so important to build proper instrumentation into our codebase to emit all the diagnostic information required to remotely drill down into our application behaviors. Building all this infrastructure ourselves can be a challenging task; that’s why several native and third-party solutions that solely focus on solving the instrumentation challenge have become quite successful over the past few years.

Application Insights, a feature of Azure Monitor, is an extensible Application Performance Management (APM) service for developers and DevOps professionals who can use it to monitor their live applications deployed on the Azure platform. It will help detect performance anomalies and includes powerful analytics tools to help you diagnose issues and to understand what users do with your app. It is designed to help you continuously improve performance and usability. It works for apps on a wide variety of platforms including .NET, Node.js, Java, and Python hosted on-premises, hybrid, or any public cloud. It integrates with your DevOps process and has connection points to a variety of development tools and services.

You can install a small instrumentation package (available as an SDK) in your application or enable Application Insights using the Application Insights Agent when supported, for example in Azure Virtual Machines. The instrumentation monitors your app and directs the telemetry data to an Azure Application Insights Resource using a unique GUID that we refer to as an Instrumentation Key. You can instrument not only a web service application or a VM but also any background components and the JavaScript in the web pages themselves. The application and its components can run anywhere; they do not necessarily have to be hosted in Azure.
../images/493913_1_En_10_Chapter/493913_1_En_10_Figq_HTML.jpg

Application Insights will let you understand all sorts of insights related to how the components of your application are performing, things like request rates, response times, and failure rates. It also captures dependencies and interactions with Azure services like Azure SQL, plus a bunch of other bonuses. You can then create live dashboards on top of collected data to be used as part of regular production-level monitoring, but also drill down into specific issues or exceptions happening or export diagnostic data to other services.

From your application, once you have obtained the Instrumentation Key for your Application Insights instance from the portal, you just have to add the proper SDK version to your project, depending on your programming language and framework, and add a bit of configuration information. Let us see an example using Java Spring Boot; we need to start by adding a Maven dependency for the SDK:
<dependency>
  <groupId>com.microsoft.azure</groupId>
  <artifactId>applicationinsights-spring-boot-starter</artifactId>
  <version>2.5.1</version>
</dependency>
The next step is configuring the application properties passing the Instrumentation Key:
# Specify the instrumentation key of your Application Insights resource.
azure.application-insights.instrumentation-key=974d297f-aaaa-aaaa-bbbb-4abcdabcd050
# Specify the name of your spring boot application. This can be any logical name you would like to give to your app.
spring.application.name=SpringBootInAzureDemo
Then we can use proper classes like TelemetryClient to keep tracking metrics like Azure SQL call response times, as shown in this simple example:
import java.util.List;
import com.microsoft.applicationinsights.TelemetryClient;
import com.microsoft.applicationinsights.telemetry.MetricTelemetry;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.http.HttpStatus;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.server.ResponseStatusException;
@RestController
@RequestMapping("/")
public class Controller {
  @Autowired
  UserRepo userRepo;
  @Autowired
  TelemetryClient telemetryClient;
  @GetMapping("/greetings")
  public String greetings() {
    // send event
    telemetryClient.trackEvent("URI /greeting is triggered");
    return "Hello World!";
  }
  @GetMapping("/users")
  public List<User> users() {
    try {
      List<User> users;
      // measure DB query benchmark
      long startTime = System.nanoTime();
      users = userRepo.findAll();
      long endTime = System.nanoTime();
      MetricTelemetry benchmark = new MetricTelemetry();
      benchmark.setName("DB query");
      benchmark.setValue(endTime - startTime);
      telemetryClient.trackMetric(benchmark);
      return users;
    } catch (Exception e) {
      // send exception information
      telemetryClient.trackEvent("Error");
      telemetryClient.trackTrace("Exception: " + e.getMessage());
      throw new ResponseStatusException(HttpStatus.INTERNAL_SERVER_ERROR, e.getMessage());
    }
  }
}
You can then start seeing these metrics in your Application Insights instance through various monitoring pages and custom dashboards you can create. The following picture shows the Application Map, where you can see your application and how it interacts with other services like SQL, reporting the number of calls and response times:
../images/493913_1_En_10_Chapter/493913_1_En_10_Figr_HTML.jpg
For .NET/.NET Core, we also have deeper integration between the Application Insights SDK and Azure SQL client libraries like Microsoft.Data.SqlClient. You can automatically track dependencies between your application and other services by initializing this class, which you’ll find in the Microsoft.ApplicationInsights.DependencyCollector NuGet package:
DependencyTrackingTelemetryModule depModule = new DependencyTrackingTelemetryModule();
    depModule.Initialize(TelemetryConfiguration.Active);
Then, you can configure your ASP.NET applications to track database interactions down to the SQL commands executed by just enabling telemetry at the service level:
services.ConfigureTelemetryModule<DependencyTrackingTelemetryModule>((module, o) => { module.EnableSqlCommandTextInstrumentation = true; });
Alternatively, you can opt in to SQL command collection in the applicationInsights.config configuration file:
<Add Type="Microsoft.ApplicationInsights.DependencyCollector.DependencyTrackingTelemetryModule, Microsoft.AI.DependencyCollector">
<EnableSqlCommandTextInstrumentation>true</EnableSqlCommandTextInstrumentation>
</Add>
As a result of this configuration, you will be able to investigate performance of external services called by your application code and see database calls and response times prefixed by “SQL:” as in this screenshot:
../images/493913_1_En_10_Chapter/493913_1_En_10_Figs_HTML.jpg
Drilling down into individual samples, you will be able to see the end-to-end transaction flow for that request and see the contribution of the Azure SQL database call including the text of the T-SQL query:
../images/493913_1_En_10_Chapter/493913_1_En_10_Figt_HTML.jpg

This is a very complete and useful tool that will assist you in both the development and production phases for monitoring, debugging, and troubleshooting purposes.

You can get all details of what we discussed here and much more on the official Application Insights documentation here: https://aka.ms/amaio.

If you want to know more

Monitoring an application is a critical step and, just like debugging, is many times overlooked. Their value becomes apparent when you have issues to solve, and at that time, the more data you have, the better. Generally, the more transparent your system is, the better, as you can see through its layers down to where the problem is. Azure SQL is by far one of the most transparent databases available. DMVs provide an impressive amount of detail, down to the single execution thread if needed. This is a great value, as it is your insurance that, if something doesn’t go as expected, you or Microsoft’s customer support service (CSS) will have all the information needed to find and fix the problem. This is an incredible value that Azure SQL provides to developers natively, with great integration with end-to-end instrumentation tools like Application Insights. A dream for a developer! And if you want to know more, as usual, here’s a list of interesting resources:
Here are some resources to dig deep into Azure SQL DMVs: