Chapter 19
Analyzing DAX query plans

DAX is a functional language with an advanced query engine that can use different storage engines. As is the case with many query languages, it is usually possible to get the same result using different DAX expressions, each one performing differently. Optimizing a measure or a query requires finding the most efficient way to obtain the desired result. In order to find a more efficient implementation for an expression, the first step is to identify the bottlenecks of the existing code.

This chapter describes the components of the DAX query engine in more detail, explaining how to obtain information about query plans and performance counters related to a particular DAX expression using DAX Studio. This knowledge is fundamental to optimize any DAX formula.

Capturing DAX queries

In order to analyze a query plan, it is necessary to execute a DAX query. A report in Power BI or Excel automatically generates queries that invoke measures included in the data model. Thus, optimizing a DAX measure requires analyzing and optimizing the DAX query that invokes that measure. Collecting the queries generated for a report is the first step in the DAX optimization journey. Indeed, a single slow report is likely to generate dozens of queries. The careful developer should find the slowest query out of them all, thus focusing on the biggest bottleneck first.

DAX Studio (http://daxstudio.org/) is a free open-source tool that offers several useful features to capture and analyze DAX queries. In the following example, see how DAX Studio connects to a Power BI data model to capture the queries generated by a report page.

The Power BI report shown in Figure 19-1 contains one visual that is slower to display. The table in the bottom-left corner with two columns (Product Name and Customers) requires a few seconds to update when the page is first opened and whenever the user changes the Continent slicer selection. We know this because we created the report on purpose. But how would one uncover the slowest visual in a report? DAX Studio proves to be very helpful in this task.

The report contains several visuals with different loading times.
Figure 19-1 A Power BI report with many visuals, one of which is slower to display.

DAX Studio can connect to a Power BI model by selecting the name of a Power BI Desktop file already opened on the same computer. This is shown in Figure 19-2.

This figure shows the Connect window allowing DAX Studio to connect to a Power BI model.
Figure 19-2 DAX Studio can connect to multiple types of Tabular models, including Power BI.

Once connected, DAX Studio can start capturing all the queries sent to the Tabular engine after the user activates the All Queries button in the Traces group of the Home ribbon. This is visible in Figure 19-3.

This figure shows the Home ribbon, highlighting the All Queries button.
Figure 19-3 The All Queries feature captures all the queries sent to the Tabular engine.

At this point, every action in the client might produce one or more queries. For example, Power BI generates at least one DAX query for every visual in the page. Figure 19-4 shows the queries captured in the sample from Figure 19-1 when selecting the Asia continent in the Continent slicer.

This figure shows the All Queries pane.
Figure 19-4 The All Queries pane shows all the queries captured by DAX Studio.

Image Note

DAX Studio listens to all the queries sent to the Tabular server. By connecting DAX Studio to Power BI Desktop, the queries are always executed by the same user on the same database. Different Power BI files require different connections and a different window in DAX Studio. However, a connection to Analysis Services (which requires administrative rights) will show queries executed by different users and on different databases. The query type will be MDX for any queries generated by a client like Excel. The Duration column shows the execution time in milliseconds, and the Query column contains the complete text of the query executed on the server.

You can easily check that the first query has a duration of around three seconds. All the remaining queries are very fast, and thus not worth any further attention. In a real-world report you will likely notice more than one slow query. DAX Studio lets you quickly discover the slowest queries, focusing the attention on those and avoiding any waste of time on measures and queries that are quick enough.

When you double-click on a line in the All Queries list, the query is copied into the editor window. For example, Figure 19-5 shows the complete text of the first query in the previous list. When you press the highlighted Format Query button on the Home tab, the query is also formatted using the DAX Formatter web service.

The figure highlights the presence of DAX Formatter on the Home tab.
Figure 19-5 The Format Query button invokes DAX Formatter to format the DAX code in the editor.

Once a slow query is identified following these steps, it can be executed in DAX Studio multiple times. One would analyze its query plan and other metrics to evaluate the bottlenecks and to try changes that could improve performance. The following sections analyze very simple queries created from scratch for educational reasons, although the end goal is to also analyze queries captured from a real workload.

Introducing DAX query plans

The DAX engine provides several details about how it executes a query in the query plan. However, “query plan” is a generic definition for a set of information including two different types of query plans (logical and physical) and a list of storage engine queries used by the physical query plan. Unless otherwise specified, the generic term “query plan” references the whole set of details available. These are introduced in this section and explained in more detail in the following part of the chapter.

In Chapter 17, “The DAX engines,” we explained that there are two layers in the DAX query engine: the formula engine (FE) and the storage engine (SE). Every query result is produced by executing the following steps:

  1. Building an Expression Tree. The engine transforms the query from a string to an expression tree, a data structure that is easier to manipulate for further optimization.

  2. Building a Logical Query Plan. The engine produces a list of the logical operations required to execute the query. This tree of logical operators resembles the original query syntax. It is easy to find a correspondence between a DAX function and a similar operation in the logical query plan.

  3. Building a Physical Query Plan. The engine transforms the logical query plan into a set of physical operations. A physical query plan is still a tree of operators, but the resulting tree can be different from the logical query plan.

  4. Executing the Physical Query Plan. The engine finally executes the physical query plan, retrieving data from the SE and computing the query calculations.

The first step is not interesting for performance analysis. Steps 2 and 3 involve the formula engine, whereas step 4 also involves the storage engine (SE). Technically, step 3 is the most important for determining how the query works, even though the physical query plan is only available after the actual execution of the query (step 4). Therefore, it is necessary to wait for the execution of a query before being able to see its physical query plan. However, during the execution of step 4, other interesting pieces of information (the SE requests) become available that are easier to read than the physical query plan. For this reason, we will see that the analysis of a query often starts from the analysis of the SE requests generated at step 4.

Image Note

Tabular can be queried in both MDX and DAX, even though its natural language is DAX. Nevertheless, the engine does not translate MDX into DAX. MDX queries generate both a logical and a physical query plan just as DAX queries do. Keep in mind that the same query written in DAX or in MDX typically produces different query plans despite returning similar results. Here the focus is on the DAX language; however, the information provided in this chapter is useful to analyze how Tabular handles MDX queries as well.

Collecting query plans

As explained in the previous section, a DAX query generates both a logical and a physical query plan. These plans describe the operations performed by the query engine in detail. Unfortunately, the query plan is only available as a textual representation, not as a graphical visualization. Because of the complexity and length of a typical query plan, other tools and techniques should be used to optimize a DAX expression before starting to analyze the query plan in detail. However, it is important to understand the basics of a DAX query plan in order to both understand the behavior of the engine and quickly spot potential bottlenecks in longer and more complex query plans. We will now describe the different parts of a query plan in greater detail using a simple query. As you will see, even the simplest query produces rather complex plans.

As an example, consider this query executed in DAX Studio:

EVALUATE
{ SUM ( Sales[Quantity] ) }

The result of the table constructor is a table with one row and one column (Value), filled with the sum of the Quantity column for all the rows of the Sales table, as shown in Figure 19-6.

The figure shows the result as one row, one column.
Figure 19-6 The result of a query with a simple table constructor with one row and one column.

The next sections describe the query plans generated and executed by this DAX query. Later on we will see how to obtain this information for any query. At this stage, just focus your attention on the role of the query plans, how they are structured, and the information they provide.

Introducing logical query plans

The logical query plan is a close representation of the DAX query expression tree. Figure 19-7 shows the logical query plan of the previous query.

The figure is a screenshot of the logical query plan.
Figure 19-7 The logical query plan of a simple query.

Each line is an operator, and the following lines, indented, are the parameters of the operator. By ignoring the parameters for each operator for a moment, it is possible to envision a simpler structure:

AddColumns:
        Sum_Vertipaq:
                 Scan_Vertipaq:
                 'Sales'[Quantity]:

The outermost operator is AddColumns. It creates the one-row table with the Value column containing the value returned by the DAX query. The Sum_Vertipaq operator scans the Sales table and sums the Sales[Quantity] column. The two operators included within Sum_Vertipaq are Scan_Vertipaq and a reference to the scanned column.

This query plan in plain English would be: “Create a table with a column named Value, filled with the content of a SUM operation, performed by the storage engine by scanning the Quantity column in the Sales table.”

The logical query plan shows what the DAX query engine plans to do in order to compute the results. Not surprisingly, it scans Sales summarizing Quantity using SUM. Clearly, more complex query plans will be harder to decode.

Introducing physical query plans

The physical query plan has a similar format to the logical query plan. Each line is an operator and its parameters are in subsequent lines, indented with one tab. Apart from this aesthetic similarity, the two query plans use completely different operators. Figure 19-8 shows the physical query plan generated by the previous DAX query.

This figure shows a physical query plan.
Figure 19-8 The physical query plan of a simple query.

Again, a simplified version of the query plan is possible by removing the parameters of each operator:

AddColumns:
         SingletonTable:
         SpoolLookup: LookupPhyOp
                 ProjectionSpool<ProjectFusion<Copy>>: SpoolPhyOp
                          Cache: IterPhyOp

The first operator, AddColumns, builds the result table. Its first parameter is a SingletonTable, which is an operator returning a single-row table generated by the table constructor. The second parameter, SpoolLookup, searches for a value in the datacache obtained by a query sent to the storage engine. This is the most intricate part of DAX query plans. The physical query plan shows that it uses some data that was previously spooled by other SE queries, but it does not show exactly from which one. In other words, the code of an SE query cannot be obtained by reading the DAX query plan. It is possible to retrieve the queries sent to the storage engine, but matching them with the exact point in the query plan is only possible in simple DAX queries. In more complex—yet realistic—DAX operations, this association might require a longer analysis.

Before moving forward, it is important to highlight some important information included in the query plan:

ProjectionSpool<ProjectFusion<Copy>>: SpoolPhyOp #Records=1
        Cache: IterPhyOp #FieldCols=0 #ValueCols=1

Image Note

In former versions of the Tabular engine that did not support composite models, the ProjectionSpool and Cache operators were called AggregationSpool and VertiPaqResult, respectively. Besides some differences in operator names, the structure of the physical query plan did not change much, and the same logic described in this chapter can be applied to older Tabular engines.

The ProjectionSpool operator represents a query sent to the storage engine; the next section will describe storage engine requests. The ProjectionSpool operator iterates the result of the query, showing the total number of rows iterated in the #Records=1 parameter. The number of records also represents the number of rows returned by the nested Cache operator.

The number of records is important for two reasons:

  • It provides the size (in rows) of the datacache created by VertiPaq or DirectQuery. A large datacache consumes more memory at query time and takes more time to scan.

  • The iteration performed by ProjectionSpool in the formula engine runs in a single thread. When a query is slow and this number is large, it could indicate a bottleneck in the query execution.

Because of the importance of the number of records, DAX Studio reports it in the Records column of the query plan. We sometimes refer to the number of records as the cardinality of the operator.
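
For example, consider the following query, a sketch assuming the Customer and Sales tables of the sample model. The SE typically returns a datacache with one row for each customer that has sales, and the ProjectionSpool iterated by the FE reports a matching number of records:

EVALUATE
{ AVERAGEX ( Customer, CALCULATE ( SUM ( Sales[Quantity] ) ) ) }

On a model with millions of customers, this single-threaded iteration in the FE can easily become the bottleneck of the query.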

Introducing storage engine queries

The previous physical query plan includes a ProjectionSpool operator that represents an internal query sent to the storage engine (SE). Because the model is in Import mode, DAX uses the VertiPaq SE, which receives queries in xmSQL. The following is the xmSQL query generated during the execution of the DAX query analyzed in the previous sections:

SET DC_KIND="AUTO";
SELECT
SUM ( 'DaxBook Sales'[Quantity] )
FROM 'DaxBook Sales';

'Estimated size ( volume, marshalling bytes ) : 1, 16'

The preceding code is a simplified version shown in DAX Studio, which removes a few internal details that are not relevant in performance analysis. The original xmSQL visible in SQL Server Profiler is the following:

SET DC_KIND="AUTO";
SELECT
SUM([DaxBook Sales (905)].[Quantity (923)]) AS [$Measure0]
FROM [DaxBook Sales (905)];

[Estimated size (volume, marshalling bytes): 1, 16]

This query aggregates all the rows of the Sales table, returning a single column with the sum of Quantity. The SE executes the entire aggregation operation, returning a small datacache (one row, one column) regardless of the size of the Sales table. The materialization required for this datacache is minimal. Moreover, the only data structures read by this query are those storing the Quantity column in the Sales table. A Sales table with hundreds of other columns would not affect the performance of this xmSQL query. The VertiPaq SE only scans columns included in the xmSQL query. If the model had been using DirectQuery, the query generated would have been a SQL query like the following one:

SELECT
SUM ( [Quantity] )
FROM Sales

Image Note

From here on out, we will not cover the details of query plans using DirectQuery. As discussed in Chapter 17, optimizing DirectQuery requires an optimization of the data source. However, changes to the DAX query can improve the SQL code sent to the DirectQuery data source, so the same techniques for analyzing a query plan described for VertiPaq can also be applied to DirectQuery, even though the assumptions on the speed of the storage engine are no longer valid for DirectQuery.

Later in the chapter we will explain why measuring the execution time of each SE query is an important part of the optimization process. Keep in mind that VertiPaq performance is related to the size of the columns involved in a query, and not only to the number of rows of the table. Different columns can have different compression rates and different sizes in memory, resulting in different scan times.

Capturing profiling information

The previous section introduced the DAX query plans. This section describes the tools to capture these events and how to measure their duration, which are the first steps in DAX optimization.

The DAX engine has grown as part of Microsoft SQL Server Analysis Services. Analysis Services provides trace events that can be captured with the SQL Server Profiler tool or by intercepting extended events (xEvents). Other products such as Power Pivot and Power BI use the same engine, although these products do not offer the same tooling available for Analysis Services to capture trace or extended events. For example, Power Pivot for Excel and Power BI Desktop have diagnostic options that save trace events to a file, which can be opened later with the same SQL Server Profiler tool.

However, the events generated by the engine require some massaging to be useful for performance analysis; SQL Server Profiler is a general-purpose tool that is not designed specifically for this task. On the other hand, DAX Studio reads and interprets Analysis Services events, summarizing the relevant information in an easier way. This is why we strongly suggest using DAX Studio as the primary tool to edit, test, and optimize DAX queries and expressions. A later section includes a description of SQL Server Profiler, providing more details to readers interested in understanding the internals. DAX Studio collects the same events as SQL Server Profiler, processing them and displaying summarized information in a very efficient way.

Using DAX Studio

As explained at the beginning of this chapter, DAX Studio can also capture DAX queries sent to the Tabular engine. Indeed, DAX Studio can execute any valid DAX query, including those captured by DAX Studio itself. The DAX query syntax is explained in Chapter 13, “Authoring queries.” DAX Studio collects trace events generated by one or more queries executed from within DAX Studio and displays the relevant information about the query plans and storage engine. DAX Studio can connect to Power BI, Analysis Services, and Power Pivot for Excel.

Before analyzing a query in DAX Studio, we must enable the Query Plan and Server Timings options in the Traces group of the Home ribbon, as shown in Figure 19-9.

This figure shows the Home tab with the Traces tab.
Figure 19-9 The Query Plan and Server Timings options enable the tracing features in DAX Studio.

When the user enables these options, DAX Studio shows the Query Plan and Server Timings panes next to the Output and Results panes, which are visible by default. DAX Studio connects to the DAX engine as if it were a profiler, and it captures the trace events described in the next section. It automatically filters only the events related to the executed query, so we do not have to worry about other concurrent users being active on the same server.

The Query Plan pane displays the two query plans generated by the query, as shown in Figure 19-10. The physical query plan is in the upper half of the pane, and the logical query plan is in the lower half. The physical query plan is usually the most important to analyze when looking for a performance bottleneck in the formula engine. For this reason, this list also provides a column containing the number of records iterated by a spool operation (which is an iteration performed by the formula engine, usually over a datacache). This way, we can easily recognize which operations iterate over a large number of records in a complex query plan. We will describe how to use this information later in Chapter 20, “Optimizing DAX.”

This figure shows the Query Plan pane.
Figure 19-10 The Query Plan pane displays the Physical Query Plan and the Logical Query Plan.

The Server Timings pane in Figure 19-11 shows information related to SE queries and how the execution time splits between FE and SE.

The figure shows the Server Timings pane.
Figure 19-11 The Server Timings pane displays a summary of timings information and the details of the storage engine queries.

Image Note

The SE query displayed in Figure 19-11 is applied to a model with 4 billion rows to show high CPU consumption. The model used for this example is not included in the companion files for the book.

The following metrics are found on the left side of the Server Timings pane:

  • Total: Elapsed time for the complete DAX query. It corresponds to the Duration of the Query End event.

  • SE CPU: Sum of the CPU Time value for all the VertiPaq scan events. It also reports the degree of parallelism of VertiPaq operations (number of cores used in parallel).

  • FE: Time elapsed in the formula engine, in milliseconds and as a percentage of the Total time.

  • SE: Time elapsed in the storage engine, in milliseconds and as a percentage of the Total time.

  • SE Queries: Number of queries sent to the storage engine.

  • SE Cache: Number of storage engine queries resolved by the storage engine cache, displayed as an absolute number and as a percentage of the SE Queries value.

The list in the center shows the SE queries executed, and the panel on the right side displays the complete code of the SE query selected in the center list. By default, the list includes only one row for each query, hiding the VertiPaq Scan Internal and other cache events that are always visible in SQL Server Profiler. We can show/hide these more detailed events by enabling the Cache, Internal, and Batch buttons of the Server Timings group on the Home tab from Figure 19-9. However, these events are usually not necessary in the performance analysis and are thus hidden by default.

A DAX performance analysis usually starts from the results displayed in the Server Timings pane. If the query spent more than 50% of the execution time in FE, then we might analyze the query plans first, looking for the most expensive operations in the FE. Otherwise, when most of the execution time is spent in SE, then we will look for the most expensive SE queries in the center list of the Server Timings pane.

Information provided in the Duration and CPU columns is helpful to identify performance bottlenecks in a query. Both values are in milliseconds. The Duration is the time elapsed between the start and the end of the request made to the SE. The CPU column shows the total amount of time consumed by one core. If the CPU number is larger than Duration, it means that more cores have been used in parallel to complete the operation.

The parallelism of an operation is obtained by dividing CPU by Duration. When this number is close to the total number of cores in the server, we cannot improve performance by increasing the parallelism. In this example, we used a system with eight cores. Thus, with a parallelism of 7.5, the query has reached the limits of the hardware. A concurrent user would not be able to get optimal performance executing a long-running query and would also slow down other users. In this condition, more cores would improve the speed of the query. In case the parallelism of a query is much smaller than the number of cores available, there would not be any benefit from providing more cores to the Tabular engine. The parallelism is computed only for SE operations because the FE runs in a single thread. Formula engine operations cannot benefit from parallel execution.
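
For example, with hypothetical numbers consistent with this scenario:

Parallelism = SE CPU / SE Duration = 30,000 ms / 4,000 ms = 7.5

On an eight-core system, such a query already uses almost all the available cores, so only faster or additional cores could reduce the SE duration.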

The Rows and KB columns show the estimated number of rows and size of the result (datacache) provided by each SE query. Because every datacache must be consumed by the FE in a single thread, a datacache with a large cardinality might be responsible for a slow FE operation. Moreover, the size of a datacache represents the memory cost required by the materialization of a set of data in an uncompressed format; indeed, the FE only consumes uncompressed data. The SE cost to create a large datacache is usually caused by the need to allocate and write uncompressed data in memory. Therefore, reducing the need for the materialization of a datacache is important to lower the volume of data exchanged between SE and FE, reducing memory pressure and improving both query performance and scalability.

Image Note

The Rows and KB columns show an estimated value that can sometimes be wrong. The exact number of rows returned by an SE query is available in the physical query plan. It is reported in the Records column of the ProjectionSpool event consuming a Cache element. The exact size of a datacache is not available, but it can be approximated proportionally to the ratio between Records in the query plan and the estimated Rows of the SE query.
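
For example, with hypothetical numbers: if an SE query reports an estimate of 200,000 Rows and 3,200 KB, but the corresponding ProjectionSpool shows 50,000 Records, the actual datacache size can be approximated as:

3,200 KB × ( 50,000 / 200,000 ) = 800 KB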

DAX Studio allows sorting the queries by any column, making it easy to find the most expensive queries by sorting them by CPU, Duration, Rows, or KB, depending on the ongoing investigation. DAX Studio makes finding the bottlenecks in a DAX query more productive. It does not optimize DAX by itself, but it simplifies the optimization task. In the remaining part of the book we will use DAX Studio as a reference. However, the same information could also be obtained by using SQL Server Profiler, at the cost of considerably more manual work.

Using the SQL Server Profiler

The SQL Server Profiler tool is installed as part of SQL Server Management Studio, which can be freely downloaded from https://docs.microsoft.com/en-us/sql/ssms/download-sql-server-management-studio-ssms. SQL Server Profiler can connect to an Analysis Services instance and collect all the events related to a DAX query execution. SQL Server Profiler can also load a file containing a trace session produced by SQL Server Profiler itself, or by other services such as Power Pivot for Excel and Power BI Desktop. This section explains how to use SQL Server Profiler in case DAX Studio cannot be used for any reason. However, you can skip this section if DAX Studio is available. We provide it as a reference because it can be interesting to understand the underlying behavior of the events involved in performance analysis.

In order to catch DAX query plans and storage engine queries, it is necessary to configure a new trace session selecting the interesting events for a DAX query. This is shown in Figure 19-12.

The figure shows the settings to use.
Figure 19-12 SQL Server Profiler settings to capture DAX query plans and SE queries.

There are five classes of events required to collect the same information used by DAX Studio:

  • Query End: Event fired at the end of a query. One might include the Query Begin event too, but we suggest only catching Query End because it contains the execution time.

  • DAX Query Plan: Event fired after the query engine has computed the query plan. It contains a textual representation of the query plan. This event class includes two different subclasses, Logical Plan and Physical Plan. For each query, the engine generates both classes: one logical query plan and one physical query plan.

  • DirectQuery End: Event fired when the DirectQuery engine answers a query. As with the Query End event, to gather timing information we suggest including the end event of the queries executed by the DirectQuery engine.

  • VertiPaq SE Query Cache Match: Event fired when a VertiPaq query is resolved by looking at the cache data. It is useful in order to see how much of your query performs real computations and how much of it just does cache lookups.

  • VertiPaq SE Query End: Event fired when the VertiPaq engine answers a query. As with the Query End event, to gather timing information, we suggest including the end event of the queries executed by the VertiPaq storage engine.

Image Tip

Once you select the events needed, it is a good idea to organize columns (clicking the Organize Columns button you see in Figure 19-12), and to save a template of the selections made, so you do not have to repeat the same selection every time you start a new session. You can save a trace template by using the File / Templates / New Template menu in SQL Server Profiler.

Image Note

In a production environment, one should filter the events of a single user session. Otherwise, all the events of different queries executed at the same time would be visible, which makes it harder to analyze events related to a single query. By running the Profiler in a development or test environment where there are no other active users, only the events related to the query executed for the performance tests would be visible without any background noise. DAX Studio automatically filters the events related to a single query analyzed, removing any background noise without requiring any further actions.

In order to see the sequence of events fired, we analyze what happens by running from DAX Studio the same query used to generate the SE query displayed in Figure 19-11, over a large table (over 4 billion rows):

EVALUATE
ROW ( "Result", SUM ( Audience[Weight] ) )

The log window of the SQL Server Profiler shows the result, visible in Figure 19-13.

The figure shows the log window of SQL Server Profiler.
Figure 19-13 Trace events captured in a SQL Server Profiler session for a simple DAX query.

Even for such a simple query, the DAX engine fires five different events:

  1. A DAX VertiPaq Logical Plan event, which is the logical query plan.

  2. An Internal VertiPaq Scan event, which corresponds to an SE query. There could be more than one internal event (subclass 10) for each VertiPaq Scan event (subclass 0).

  3. A VertiPaq Scan event, which describes a single SE query received by the FE.

  4. A DAX VertiPaq Physical Plan event, which is the physical query plan.

  5. A final Query End event, which returns the query duration of the complete DAX query. The CPU time reported by this event should be ignored. It should be close to the time spent in the FE but is not as accurate as the calculation explained later.

All the events show both CPU time and duration, expressed in milliseconds. CPU Time is the amount of CPU time consumed to answer the query, whereas Duration is the time the user has had to wait for their result. When Duration is lower than CPU Time, the operation has been executed in parallel on many cores. When Duration is greater than CPU Time, the operation had to wait for other operations (usually logged in different events) to be completed.

Image Note

The accuracy of the CPU Time and Duration columns is not very reliable for values lower than 16 milliseconds, and CPU Time can be less accurate than that in conditions of high parallelism. Moreover, these timings might depend on other operations in progress on the same server. It is a common practice to run the same test multiple times in order to create an average of the execution time of single operations, especially when one needs accurate numbers. However, if only looking for an order of magnitude, one might just ignore differences under 100 milliseconds.

Considering the sequence of events, the logical query plan precedes all the SE queries (VertiPaq scans), and only after their execution is the physical query plan raised. In other words, the physical query plan is an actual query plan and not an estimated one. Indeed, it contains the number of rows processed by any iteration in the FE, though it does not provide information about the CPU time and duration of each step in the query plan.

Logical and physical query plans do not provide any timing information, which is only available in the other events gathered by the Profiler. The information provided in the CPU Time and Duration columns is the same shown in the CPU and Duration columns by DAX Studio for SE queries. However, calculating the time spent in the FE requires some more work when using SQL Server Profiler.

The Query End event only provides the total elapsed time for a DAX query in the Duration column, summing both the FE and SE durations. The VertiPaq scan events provide the time spent in the SE. The elapsed time in FE is obtained by subtracting the duration of all the SE queries from the duration of the entire DAX query provided in the Query End event.

As shown in Figure 19-13, the Query End event had a Duration of 844 milliseconds. The time spent in the SE was 838 milliseconds. There was only one SE query, which lasted 838 milliseconds; only consider the VertiPaq Scan event, ignoring internal ones. The difference is 6 milliseconds, which is the amount of time spent in the FE. In case of multiple SE queries, their execution time must be aggregated to calculate the total amount of time spent in the SE, which must be subtracted from the total duration to get the amount of time spent in the FE.
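
Expressed as a formula, using the numbers of this example:

FE Duration = Query End Duration - Sum of SE Durations
            = 844 ms - 838 ms
            = 6 ms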

Finally, SQL Server Profiler can save and load a trace session. SQL Server Profiler cannot connect to Power Pivot for Excel, but it can open a trace file saved by Power Pivot for Excel or Power BI Desktop. Power Pivot for Excel has an Enable Power Pivot Tracing check box in the Settings dialog box that generates a TRC file (TRC is the extension for trace files). The events captured in a profiler session saved this way cannot be customized; they also usually include more event types than those required to analyze DAX query plans. DAX Studio cannot load a trace session, but it can connect directly to all these tools, including Power Pivot for Excel, without any limitation.

Reading VertiPaq storage engine queries

In the previous sections, we described some details of the physical and logical query plans. Although these plans are useful in some scenarios, the most interesting part of a query plan is the set of VertiPaq SE queries.

In this section we describe how to read the VertiPaq SE queries and understand what happens in VertiPaq when it executes an xmSQL query. This information is useful to solve a bottleneck in the VertiPaq storage engine. However, reading these queries is also useful to understand what happens in the FE: If a calculation is not performed by the SE, it must be computed in the FE. Because the number of SE queries is usually smaller than the number of rows in the query plan, it is more productive to always start analyzing the SE queries, regardless of the detected bottleneck type.

Introducing xmSQL syntax

In the previous section, we introduced a simple SE query described in a simplified xmSQL syntax, which is the same as displayed by DAX Studio:

SELECT
SUM ( Sales[Quantity] )
FROM Sales;

This syntax would be quite similar in standard ANSI SQL:

SELECT
SUM ( Quantity )
FROM Sales;

Every xmSQL query involves a GROUP BY condition, even if this is not explicitly stated as part of its syntax. For example, the following DAX query returns the list of unique values of the Color column in the Product table:

EVALUATE VALUES ( 'Product'[Color] )

It results in this xmSQL query; note that no GROUP BY appears in the query:

SELECT Product[Color]
FROM Product;

The corresponding query in ANSI SQL would have a GROUP BY condition:

SELECT Color
FROM Product
GROUP BY Color

The reason we compare the xmSQL to an ANSI SQL query with GROUP BY instead of DISTINCT—which would be possible for the previous example—is that most of the time xmSQL queries also include aggregated calculations. For example, consider the following DAX query:

EVALUATE
SUMMARIZECOLUMNS (
    Sales[Order Date],
    "Revenues", CALCULATE ( SUM ( Sales[Quantity] ) )
)

This is the corresponding xmSQL query sent to the SE:

SELECT Sales[Order Date], SUM ( Sales[Quantity] )
FROM Sales;

In ANSI SQL there would be a GROUP BY condition for the Order Date column:

SELECT [Order Date], SUM ( Quantity )
FROM Sales
GROUP BY [Order Date]

An xmSQL query never returns duplicated rows. When a DAX query runs over a table that does not have a unique key, the corresponding xmSQL query includes a special RowNumber column that keeps the rows unique. However, the RowNumber column is not accessible in DAX. For example, consider this DAX query:

EVALUATE Sales

It generates the following xmSQL code:

SELECT Sales[RowNumber], Sales[column1], Sales[column2], ... , Sales[columnN]
FROM Sales;

Aggregation functions

xmSQL includes the following aggregation operations:

  • SUM sums the values of a column.

  • MIN returns the minimum value of a column.

  • MAX returns the maximum value of a column.

  • COUNT counts the number of rows in the current GROUP BY.

  • DCOUNT counts the number of distinct values of a column.

The behavior of SUM, MIN, MAX, and DCOUNT is similar. For example, the following DAX query returns the number of unique customers for each order date:

EVALUATE
SUMMARIZECOLUMNS (
    Sales[Order Date],
    "Customers",  DISTINCTCOUNT ( Sales[CustomerKey] )
)

It generates the following xmSQL code:

SELECT Sales[Order Date], DCOUNT ( Sales[CustomerKey] )
FROM Sales;

Which corresponds to this ANSI SQL query:

SELECT [Order Date], COUNT ( DISTINCT CustomerKey )
FROM Sales
GROUP BY [Order Date]

The COUNT function does not have an argument. Indeed, it computes the number of rows for the current group. For example, consider the following DAX query that counts the number of products for each color:

EVALUATE
SUMMARIZECOLUMNS (
    'Product'[Color],
    "Products", COUNTROWS ( 'Product' )
)

This is the xmSQL code sent to the SE:

SELECT Product[Color], COUNT ( )
FROM Product;

A corresponding ANSI SQL query could be the following:

SELECT Color, COUNT ( * )
FROM Product
GROUP BY Color

Other aggregation functions in DAX do not have a corresponding xmSQL aggregation function. For example, consider the following DAX query using AVERAGE:

EVALUATE
SUMMARIZECOLUMNS (
    'Product'[Color],
    "Average Unit Price", AVERAGE ( 'Product'[Unit Price] )
)

The corresponding xmSQL code includes two aggregations: one for the numerator and one for the denominator of the division that will compute a simple average in the FE:

SELECT Product[Color], SUM ( Product[Unit Price] ), COUNT ( )
FROM Product
WHERE Product[Unit Price] IS NOT NULL;

Converting the xmSQL query into ANSI SQL, we would write:

SELECT Color, SUM ( [Unit Price] ), COUNT ( * )
FROM Product
WHERE [Unit Price] IS NOT NULL
GROUP BY Color
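
The division computed by the FE corresponds conceptually to the following DAX rewrite. This is only a sketch illustrating how the work splits between the SE (the two aggregations) and the FE (the division); it is not the plan actually generated for AVERAGE:

EVALUATE
SUMMARIZECOLUMNS (
    'Product'[Color],
    "Average Unit Price",
        DIVIDE (
            SUM ( 'Product'[Unit Price] ),
            COUNTROWS (
                FILTER ( 'Product', NOT ISBLANK ( 'Product'[Unit Price] ) )
            )
        )
)
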
Arithmetical operations

xmSQL includes simple arithmetical operations: +, −, *, / (sum, subtraction, multiplication, division). These operations work on single rows, whereas the FE usually performs arithmetical operations between the results of aggregations. It is common to see arithmetical operations in the expression used by an aggregation function. For example, the following DAX query returns the sum of the product of Quantity by Unit Price calculated row-by-row for the Sales table:

EVALUATE
{ SUMX ( Sales, Sales[Quantity] * Sales[Unit Price] ) }

It generates the following xmSQL code:

WITH
    $Expr0 := ( Sales[Quantity] * Sales[Unit Price] )
SELECT
SUM ( @$Expr0 )
FROM Sales;

The WITH statement introduces expressions associated with symbolic names (starting with the $Expr prefix) that are referenced later in the remaining part of the query. For example, in the previous code the $Expr0 expression corresponds to the multiplication between Quantity and Unit Price that is later evaluated for each row of the Sales table, summing the result in the aggregated value.

The previous xmSQL code corresponds to this ANSI SQL query:

SELECT SUM ( [Quantity] * [Unit Price] )
FROM Sales

xmSQL can also execute casts between data types to perform arithmetical operations. It is important to remember that these operations only happen within a row context, from the point of view of a DAX expression.

Filter operations

An xmSQL query can include filters in a WHERE condition. The performance of a filter depends on the cardinality of the conditions applied (this will be discussed in more detail later in the section “Understanding scan time”).

For example, consider the following query that returns the sum of the Quantity column for all sales with a unit price equal to 42:

EVALUATE
CALCULATETABLE (
    ROW ( "Result", SUM ( Sales[Quantity] ) ),
    Sales[Unit Price] = 42
)

The resulting xmSQL query is the following:

SELECT SUM ( Sales[Quantity] )
FROM Sales
WHERE Sales[Unit Price] = 420000;

Image Note

The reason why the value in the WHERE condition is multiplied by 10,000 is because the Unit Price column is stored as a Currency data type (also known as Fixed Decimal Number in Power BI). That number is stored as an Integer in VertiPaq, so the FE performs the conversion to a decimal number by dividing the result by 10,000. Such division is not visible, neither in the query plan nor in the xmSQL code.

The WHERE condition might include a test with more than one value. For example, consider a small variation of the previous query that sums the quantity of the sales with a unit price equal to either 16 or 42. You see this in the following DAX query:

EVALUATE
CALCULATETABLE (
    ROW ( "Result", SUM ( Sales[Quantity] ) ),
    OR ( Sales[Unit Price] = 16, Sales[Unit Price] = 42 )
)

The xmSQL uses the IN operator to include a list of values:

SELECT SUM ( Sales[Quantity] )
FROM Sales
WHERE Sales[Unit Price] IN ( 160000, 420000 );

Any filter condition in xmSQL only includes existing values of the column. If a DAX condition references a value that does not exist in the column, the resulting xmSQL code includes a condition that filters out all the rows. For example, if neither 16 nor 42 existed in the Sales table, the previous xmSQL query might not be sent to the SE at all by the FE, or it would become something like:

SELECT SUM ( Sales[Quantity] )
FROM Sales
WHERE Sales[Unit Price] IN ( );

The result of such an xmSQL query will always be empty.

It is important to remember that xmSQL is a textual representation of an SE query. The actual structure is more optimized. For example, when the list of values allowed for a column is very long, the xmSQL reports a few values, highlighting the total number of values passed internally to the query. This happens quite often for time intelligence functions. For example, consider the following DAX query that returns the sum of the quantity for one year of sales:

EVALUATE
CALCULATETABLE (
    ROW ( "Result", SUM ( Sales[Quantity] ) ),
    Sales[Order Date] >= DATE ( 2006, 1, 1 ) && Sales[Order Date] <= DATE ( 2006, 12, 31 )
)

Using a recent version of the DAX engine, it generates the following xmSQL query:

SELECT SUM ( Sales[Quantity] )
FROM Sales
WHERE Sales[Order Date] >= 38718.000000
  VAND Sales[Order Date] <= 39082.000000

DAX represents date and time values as floating-point numbers. For this reason, the comparison of the Order Date column happens with two numbers corresponding to the two dates used in the filter argument of the DAX expression.
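
Indeed, a DAX datetime value is stored as the number of days elapsed since December 30, 1899, following the same serial-date convention used by Excel. The two boundaries of the filter are easy to verify:

DATE ( 2006, 1, 1 )   = 38718.000000
DATE ( 2006, 12, 31 ) = 39082.000000   -- 38718 + 364 days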

However, older versions of the DAX engine might produce the following xmSQL query instead:

SELECT SUM ( Sales[Quantity] )
FROM Sales
WHERE Sales[Order Date] IN ( 38732.000000, 38883.000000, 38846.000000, 38997.000000,
38809.000000, 38960.000000, 38789.000000, 38923.000000, 39074.000000, 38752.000000..[365
total values, not all displayed] ) ;

In this case, instead of a range condition, the xmSQL query uses a bitmap index that identifies all the values included in the filter. The WHERE / IN condition represents such a bitmap index, reporting in the xmSQL code only a sample of the values, followed by the total number of values in the column. In order to obtain the list of values for a range, another xmSQL query might be executed beforehand:

SELECT Sales[Order Date]
FROM Sales
WHERE Sales[Order Date] >= 38718.000000
    VAND Sales[Order Date] <= 39082.000000

The actual xmSQL query generated in this last example might be more complex, including a callback to the FE to transform the result of the DATE function into the corresponding floating-point value. More information about these callbacks is included in the section “Understanding CallbackDataID” later in this chapter.

Join operators

The xmSQL code can execute JOIN conditions when a DAX query involves multiple tables connected by relationships in the data model. For example, consider the following DAX query returning the sum of the Quantity column in the Sales table for each Color name in the Product table:

EVALUATE
SUMMARIZECOLUMNS (
    'Product'[Color],
    "Sales",  SUM ( Sales[Quantity] )
)

If there is a one-to-many relationship between the Product and Sales tables in the data model, the corresponding xmSQL code includes a LEFT OUTER JOIN between the two tables, as shown in the following SE query:

SELECT Product[Color], SUM ( Sales[Quantity] )
FROM Sales
    LEFT OUTER JOIN Product ON Sales[ProductKey] = Product[ProductKey];

The ON condition of the JOIN automatically includes the columns that define the relationship in the data model. For each relationship involved in the query, there is one join in xmSQL.
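
For example, in a snowflake schema where the Product table relates to a Product Category table through an intermediate Product Subcategory table, grouping sales by category generates one join per relationship. The following xmSQL is only a sketch; the table and column names are assumptions for illustration:

SELECT 'Product Category'[Category], SUM ( Sales[Quantity] )
FROM Sales
    LEFT OUTER JOIN Product
        ON Sales[ProductKey] = Product[ProductKey]
    LEFT OUTER JOIN 'Product Subcategory'
        ON Product[ProductSubcategoryKey] = 'Product Subcategory'[ProductSubcategoryKey]
    LEFT OUTER JOIN 'Product Category'
        ON 'Product Subcategory'[ProductCategoryKey] = 'Product Category'[ProductCategoryKey];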

Temporary tables and shallow relationships in batch events

VertiPaq can execute xmSQL queries whose result is kept in memory for another xmSQL query without being consumed by the FE. This improves query performance because the temporary result is not materialized for the FE. When a temporary table is used by a different xmSQL operation, the different SE queries executed are grouped in a Batch operation in the VertiPaq storage engine. For example, consider the following DAX query computing the average yearly income of customers that made at least one purchase in the corresponding year:

EVALUATE
CALCULATETABLE (
    SUMMARIZECOLUMNS (
        'Date'[Calendar Year],
        "Yearly Income", AVERAGE ( Customer[Yearly Income] )
    ),
    CROSSFILTER ( Sales[CustomerKey], Customer[CustomerKey], BOTH )
)

The presence of the bidirectional filter between the Sales and Customer tables activates a special behavior of the SE, which generates a query executed in different steps of a Batch statement. In DAX Studio, the Batch event is hidden by default; once enabled, it appears after one or more Scan events. This is shown in Figure 19-14.

The figure shows SE events.
Figure 19-14 SE events captured in DAX Studio enabling the Batch filter in Server Timings.

The Batch reported at line 7 includes all the Scan events reported in lines 2, 4, and 6. The SE query of each Scan event is separated by a comma, but the Batch event could have additional statements like the one highlighted in the complete code of the Batch event that follows. The CREATE SHALLOW RELATION statement implements the behavior of the bidirectional filter at the SE level, optimizing the execution of a DAX query involving one or more bidirectional filters:

--
-- This query is also the first Scan event processed
--
DEFINE TABLE '$TTable3' :=
SELECT
    Customer[CustomerKey], Date[Calendar Year]
FROM Sales
    LEFT OUTER JOIN Customer
        ON Sales[CustomerKey]=Customer[CustomerKey]
    LEFT OUTER JOIN Date
        ON Sales[OrderDateKey]=Date[DateKey],

--
-- This directive does not generate any Scan event
--
CREATE SHALLOW RELATION '$TRelation1' MANYTOMANY
    FROM Customer[CustomerKey] TO '$TTable3'[Customer$CustomerKey],

--
-- This query is the second Scan event processed
--
DEFINE TABLE '$TTable4' :=
SELECT
    SIMPLEINDEXN ( '$TTable3'[Customer$CustomerKey] )
FROM '$TTable3',

--
-- This query is the third and last Scan event processed for this batch
--
DEFINE TABLE '$TTable1' :=
SELECT
    '$TTable3'[Date$Calendar Year],
    SUM ( '$TTable2'[$Measure0] ), SUM ( '$TTable2'[$Measure1] )
FROM '$TTable2'
    INNER JOIN '$TTable3'
        ON '$TTable2'[Customer$CustomerKey]='$TTable3'[Customer$CustomerKey]

REDUCED BY

'$TTable2' :=
SELECT
    Customer[CustomerKey],
    SUM ( Customer[Yearly Income] ),
    SUM (  ( PFDATAID ( Customer[Yearly Income] ) <> 2 )  )
FROM Customer
WHERE
Customer[CustomerKey] ININDEX '$TTable4'[$Index1];

Only the last DEFINE TABLE statement in a batch generates a result returned to the FE, corresponding to the $TTable1 query in this example. All the previous DEFINE TABLE statements generate temporary tables used later within the same batch. It is worth noting that the last query starts from DEFINE TABLE '$TTable1' and continues to the end of the batch, including the REDUCED BY clause. REDUCED BY is a syntax defining a subquery within the same SE request, rather than a separate SE query executed within the same batch like '$TTable3' and '$TTable4'. The temporary tables defined before the last one in the batch can contain binary information that is never returned as a DAX result. For example, the SIMPLEINDEXN function generates an index structure, so that a following query can use that index to apply a filter to a column through the ININDEX operator. These temporary tables are not returned to the FE; they are kept in the SE with an efficient structure used only to improve the internal evaluation of other SE queries.

Understanding scan time

After having described the syntax of xmSQL queries, it is time to consider the work performed by the storage engine to execute such statements.

VertiPaq performs a complete scan of each column involved in an SE query. There could be more iterations for a column, depending on the request. Because there are no indexes, the time required to complete a scan depends on the memory footprint of the column, which depends on the number of unique values in the column, on their distribution across the rows, and on the number of rows in the table. The importance of these factors depends on the aggregation function used in the xmSQL query. For example, consider a large table with four columns: Date, Time, Age, and Score. The table has 4 billion rows, so that we can observe relevant differences in execution time. We executed the following DAX queries for each column:

EVALUATE
ROW ( "Sum", SUM ( Example[<column name>] ) )

EVALUATE
ROW (
    "Distinct Count",
    CALCULATE (
        DISTINCTCOUNT ( Example[<column name>] ),
        NOT ISBLANK ( Example[<column name>] )
    )
)

Image Note

The second query includes a NOT ISBLANK condition that is required to obtain an SE query to execute the query. If the query did not have a filter, the number of distinct values in a column would have been retrieved from the metadata of the model, without actually executing any SE request.

We are not interested in the values returned by these queries. We are only interested in the time spent in the SE, which for these simple queries is always close to the entire execution time of the DAX queries. Table 19-1 shows the results, reporting for each column:

  • Memory (MB): The memory footprint of the column for the entire table (4 billion rows).

  • Distinct Values: The number of unique values in the column, obtained by executing the DISTINCTCOUNT aggregation function in DAX.

  • SUM (ms): The execution time of the query that applies the SUM aggregation to the column.

  • DISTINCTCOUNT (ms): The execution time of the query that applies the DISTINCTCOUNT aggregation to the column.

Table 19-1 Column size, cardinality, and execution time of aggregation functions

Column    Memory (MB)    Distinct Values    SUM (ms)    DISTINCTCOUNT (ms)
Date      0.03           1,588              9           20
Age       165.26         96                 146         333
Score     2,648.40       9,766,664          837         4,288
Time      6,493.57       1,439              1,330       4,102

At first sight, a few results might appear counterintuitive. Usually, the larger the number of unique values in a column, the slower the query. In this case, Date is faster than Age, which has a smaller number of unique values. Moreover, the Time column, which has a cardinality similar to Date, shows a difference in performance of at least one order of magnitude compared to Date. The reasons for these differences are the different compression rates, derived from the different sort orders of the columns.

The Date column always has the fastest execution time. This is because the 4 billion rows were read sorted by date during processing. Even without partitioning, this created segments with only one or two unique values each. Thus, all the rows in each segment achieved a very high compression rate, as the small memory footprint of the Date column makes clear.

The Age column has the second-best performance for both SUM and DISTINCTCOUNT. This column has a larger memory footprint than Date because there are different Age values for each Date, and rows are sorted by Date first.

The Score and Time columns have a slower performance. The performance of SUM depends mainly on the memory footprint, whereas DISTINCTCOUNT is also sensitive to the number of distinct values in the column. The reason for that is the different calculation algorithm used for these two aggregations.

The important concept here is that we can obtain a different performance for an SE query depending on the memory footprint of a column. We can optimize a VertiPaq SE query by reducing the memory footprint of the columns used. We can obtain that by using columns with a smaller number of unique values, or with a different sort order of the data source, or by reducing the number of rows in the table, or by applying other techniques that we will describe in the remaining part of this book.
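
For example, a common technique to reduce the memory footprint of a high-precision datetime column is to split it into separate date and time columns, each with far fewer unique values. The following calculated columns are only a sketch of the idea, assuming a hypothetical Example[DateTime] column; in a real model the split should happen before import, so that the original column is not stored at all:

Date = INT ( Example[DateTime] )                       -- one value per day
Time = Example[DateTime] - INT ( Example[DateTime] )   -- one value per time of day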

Understanding DISTINCTCOUNT internals

The use of the DISTINCTCOUNT function in a DAX expression generates multiple VertiPaq Scan Internal events for a single VertiPaq Scan event. We can see internal events by enabling the Internal button in the Server Timings group of DAX Studio.

Consider the following DAX query:

EVALUATE
ROW (
    "Distinct Count",
    CALCULATE (
        DISTINCTCOUNT ( Example[Score] ),
        Example[Score] <> 0
    )
)

Table 19-2 shows the complete list of VertiPaq Scan events generated by the preceding query.

Table 19-2 VertiPaq Scan events for a DAX query with a DISTINCTCOUNT measure

Line    Subclass    Duration    CPU       Query
1       Internal    4,269       31,641    SELECT Example[Score] FROM Example;
2       Internal    4,269       31,641    SELECT Example[Score] FROM Example;
3       Internal    19          31,766    SELECT COUNT ( ) FROM $DCOUNT_DATACACHE;
4       Scan        4,288       31,766    SELECT DCOUNT ( Example[Score] ) FROM Example;

The last line shows the SE query requested by the FE. However, internally the query is split into two subqueries. The first internal subquery appears duplicated in two identical rows (note the identical Duration and CPU values). The following is the xmSQL code of the first internal subquery, which retrieves the list of unique values in the Score column of the Example table:

SELECT Example[Score]
FROM Example
WHERE Example[Score] <> 0;

The result of this SE query is a list of the unique values in the Score column of the Example table. The next step is to count how many rows are in this list. In other words, counting the rows returned by the internal query provides the correct result for the original query. This second xmSQL query references a special table named $DCOUNT_DATACACHE, corresponding to the result of the previous SE query:

SELECT COUNT ( )
FROM $DCOUNT_DATACACHE;

Table 19-2 also shows that the duration of the Scan event corresponds to the sum of the durations of the two internal events, counting the duplicated event only once (4,269 + 19 = 4,288). The CPU Time is always the same in all the events of the same query. The parallelism ratio, obtained by dividing CPU Time by Duration (31,766 / 4,288 ≈ 7.4), is around seven, which means that up to eight threads were executed in parallel. The next section presents a deeper discussion about parallelism within an SE query.

Understanding parallelism and datacache

Every SE query described by an xmSQL statement returns a result called a datacache, which is a single uncompressed table in memory. The result of an SE query can be completely materialized in memory, or its rows can be consumed during the iteration without them persisting. Usually, we refer to a datacache when this result is materialized, which is the case most of the time in complex queries.

The execution of the SE query can be parallelized among many cores, using different execution threads. The number of threads used depends on the hardware and on the physical structure of the columns involved in the query. The VertiPaq engine assigns one thread to each segment involved in a single scan operation, as described in the section "Understanding segmentation and partitioning" in Chapter 17. When the operation runs on multiple threads, every thread creates a partial result. Only when all the threads complete their execution does VertiPaq consolidate these results into a single final datacache. The FE will then consume the datacache in a single thread. It is also for this reason that the result of an SE query requires such a consolidation. You can see the parallel processing and consolidation behavior described in a schema in Figure 19-15.

The diagram shows that each VertiPaq query generates a datacache by scanning in-memory data from column storage. After that, a consolidation task creates a single final datacache from the multiple datacaches.
Figure 19-15 The final datacache is a consolidation of different datacaches created by concurrent VertiPaq queries when the engine parallelizes execution.

A segment should not be too small because the consolidation process takes time. The efficiency gained by running scan operations on multiple threads must outweigh the overhead of the consolidation, which is not possible if the segments are too small. As a side effect, VertiPaq operations on small tables cannot benefit from multiple cores: the consolidation process would cost more than the gain provided by parallelizing the scan of a small table.

It is useful to remember that the SE query only provides data to the FE. In a simple scenario, we have the following steps:

  1. The SE receives an xmSQL query.

  2. The SE executes the scan operations potentially on many threads, creating one datacache per thread.

  3. The SE consolidates the different datacaches into a single, final datacache.

  4. The FE consumes the datacache in a single thread.

  5. The FE can use the same datacache in different steps of the query plan.

In the Profiler, you will always see the SE events before the physical query plan, which appears at the end of the events related to a query. The logical query plan can be preceded by a few SE queries. When this is the case, it is because the DAX engine itself sends queries to retrieve information about the size and density of columns, which it uses to create a better query plan. This behavior is not visible in DAX Studio because the tool shows query plans and SE queries in different parts of the user interface.

Understanding the VertiPaq cache

The DAX formula engine does not have a cache, whereas the VertiPaq storage engine has one: the VertiPaq cache. Its primary goal is to improve the performance of multiple requests of the same datacache within the same query. Its secondary goal is to improve the performance of different DAX queries requesting the same datacache. It is important to understand the goals of the VertiPaq cache in order to analyze its behavior and evaluate its efficiency.

For example, consider the following DAX query:

EVALUATE
ADDCOLUMNS (
    VALUES ( Example[Date] ),
    "A", CALCULATE ( SUM ( Example[Amt] ) ),
    "Q", CALCULATE ( SUM ( Example[Qty] ) )
)

The result of the query includes two columns, A and Q, summing the Amt and Qty columns of the Example table for each Date. We are going to run the query twice, analyzing the different execution times of the two runs. Table 19-3 shows the sequence of Scan events for the first execution, with both Cache and Internal events enabled in DAX Studio.

Table 19-3 VertiPaq events for the first execution of a DAX query with two aggregations

Line  Subclass  Duration (ms)  CPU (ms)  Query
1     Internal          1,796    13,516  SELECT Example[Date], SUM ( Example[Amt] ),
                                         SUM ( Example[Qty] ), COUNT ( ) FROM Example;
2     Scan              1,796    13,516  SELECT Example[Date], SUM ( Example[Amt] ),
                                         SUM ( Example[Qty] ), COUNT ( ) FROM Example;
3     Internal              6        31  SELECT Example[Date], COUNT ( ) FROM Example;
4     Scan                  6        31  SELECT Example[Date] FROM Example;

The second execution of the same query produces a different result because the second execution can benefit from the VertiPaq cache of the first run. This result is visible in Table 19-4.

Table 19-4 VertiPaq events for the second execution of a DAX query with two aggregations

Line  Subclass  Duration (ms)  CPU (ms)  Query
1     Cache                 0         0  SELECT Example[Date], SUM ( Example[Amt] ),
                                         SUM ( Example[Qty] ), COUNT ( ) FROM Example;
2     Scan                  0         0  SELECT Example[Date], SUM ( Example[Amt] ),
                                         SUM ( Example[Qty] ), COUNT ( ) FROM Example;
3     Cache                 0         0  SELECT Example[Date], COUNT ( ) FROM Example;
4     Scan                  0         0  SELECT Example[Date] FROM Example;

The duration of the second execution is zero milliseconds. The reason is that the second time the query was run, a datacache containing the required data was already available in the VertiPaq cache. Therefore, the engine did not execute any VertiPaq query; instead it simply retrieved the result from the cache.

The Cache and Internal events are disabled by default in DAX Studio, so the typical result visible when hitting the cache for SE queries is shown in Table 19-5. The only visible events are the Scan events, with a duration of 0 milliseconds.

Table 19-5 VertiPaq Scan events visible for a DAX query with two aggregations

Line  Subclass  Duration (ms)  CPU (ms)  Query
2     Scan                  0         0  SELECT Example[Date], SUM ( Example[Amt] ),
                                         SUM ( Example[Qty] ), COUNT ( ) FROM Example;
4     Scan                  0         0  SELECT Example[Date] FROM Example;

The VertiPaq engine only reuses data in the cache when a new request has the same cardinality and a subset of the columns of a previous query. This matching algorithm is deliberately simple: the lookup in the VertiPaq cache must not cost more than the memory scan it is trying to avoid. For the same reason, the VertiPaq cache only keeps a limited number of datacaches in memory. Therefore, there is no guarantee that a request hits the cache, even when the query plan repeats the same storage query multiple times within the same DAX query. Nevertheless, in most conditions the VertiPaq cache satisfies several of the requests that occur within a short period.
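
For example, assuming the query analyzed earlier in this section has just populated the cache with a datacache grouped by Example[Date], the following sketch requests the same granularity but only a subset of the cached columns. Such a request might be satisfied by the existing datacache without a new scan, although a cache hit is never guaranteed:

EVALUATE
ADDCOLUMNS (
    VALUES ( Example[Date] ),                -- same cardinality as the cached datacache
    "A", CALCULATE ( SUM ( Example[Amt] ) )  -- subset of the cached columns
)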

Image Note

VertiPaq ignores row-level security settings. The DAX formula engine manages the role-based security and generates different VertiPaq storage engine queries depending on security settings and user credentials. For this reason, the VertiPaq cache is a global resource and shares the results between different users and sessions. The FE guarantees the correctness of the result, generating different SE queries depending on the requirements.

When analyzing performance, it is important to clear the cache before running a query. In order to find bottlenecks and areas of improvement for a query plan, it is better to observe the time required to complete a scan in memory, simulating the worst-case scenario (empty cache). Because of the reduced size of the VertiPaq cache, missing the cache is a frequent event on a busy server with many concurrent users running queries.

DAX Studio provides two techniques to clear the cache before executing a query:

  • Clicking the Clear Cache button on the Home tab to clear the cache on the DAX engine before executing a query with the Run Query button.

  • Selecting the Clear Cache then Run button on the Home tab so that the cache gets cleared before each Run execution.

The Run and Clear Cache then Run buttons are shown in Figure 19-16.

This figure is a part of the Home tab.
Figure 19-16 The Home tab in DAX Studio has several options to clear the cache of the DAX engine.

DAX Studio internally sends a clear cache command to the DAX engine using the following XMLA command, which removes the cache of results related to the specified database. This example clears the cache of the Contoso database:

<ClearCache xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
    <Object>
        <DatabaseID>Contoso</DatabaseID>
    </Object>
</ClearCache>

Understanding CallbackDataID

The VertiPaq SE only supports a limited set of operators and functions in xmSQL. Thus, it is up to the FE to execute any operation not directly supported by the SE. However, when a complex calculation is required within a VertiPaq iterator, the SE may call the FE using a special xmSQL function called CallbackDataID.

The operators supported in xmSQL include the basic mathematical operations (sum, subtraction, multiplication, and division) but do not include mathematical functions such as square root (SQRT in DAX), or conditional logic such as the IF function. If you include an expression that is not supported by xmSQL in an iterator, then the query plan generates an xmSQL query containing a special function: CallbackDataID. During the iteration, the SE calls the FE for every row, passing the DAX expression and the values of its members as arguments.
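
As a minimal illustration, the following sketch (reusing the Sales table from the next example) would trigger a CallbackDataID because SQRT has no equivalent operator in xmSQL:

EVALUATE
ROW (
    -- SQRT is not supported by the SE; during the iteration,
    -- the SE calls back the FE for every row of Sales
    "Result", SUMX ( Sales, SQRT ( Sales[Line Amount] ) )
)

The rest of this section analyzes a similar callback in detail.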

For example, consider the sum of rounded values in this DAX query:

EVALUATE
ROW (
    "Result", SUMX ( Sales, ROUND ( Sales[Line Amount], 0 ) )
)

In this expression, the SE cannot evaluate the ROUND function. Therefore, the query plan generates the following xmSQL statement:

WITH
    $Expr0 := [CallbackDataID ( ROUND ( Sales[Line Amount]] ), 0 ) ]
              ( PFDATAID ( Sales[Line Amount] ) )
SELECT
    SUM ( @$Expr0 )
FROM Sales;

The CallbackDataID function contains the DAX expression that rounds a value to the closest integer. This expression is evaluated for the Line Amount column in the Sales table for the current row. The PFDATAID syntax is not relevant for analyzing the logic we are describing now. The SE calls the CallbackDataID function for each row of the Sales table. The result of the xmSQL query is a datacache with only one row, corresponding to the aggregated result. Even though the FE is single threaded, when the SE calls the FE through a CallbackDataID, the parallelism of the SE is not affected. Indeed, there could be multiple instances of the FE executed in parallel, one for each thread of the SE.

From a performance point of view, CallbackDataID has three other implications:

  • Expressions solved through CallbackDataID calls are more expensive than expressions solved by internal operators of the SE. There is an overhead associated with each call to CallbackDataID.

  • In a trace session, a VertiPaq SE event includes the time spent in the FE by a CallbackDataID call. Consider that optimizing an SE query that has a long execution time might require you to reduce or to remove the calls to CallbackDataID made by xmSQL queries.

  • The SE cache does not save datacaches produced by an xmSQL query containing CallbackDataID calls. Therefore, the presence of CallbackDataID in an xmSQL query should be carefully evaluated, especially when the SE executes it in a large iteration.

Image Important

The FE is single-threaded, but when the SE calls the FE through CallbackDataID, the execution of the code in the FE is parallelized through the several threads created by the SE. The parallelism provided by this technique reduces overall Duration, but CPU Time might increase because of the CallbackDataID calls overhead.
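
When the expression only depends on columns of the iterated row, one way to remove the callback entirely is to compute the expression at refresh time in a calculated column. The following sketch applies this idea to the earlier ROUND example; the Line Amount Rounded column is a hypothetical addition to the model, trading extra memory at process time for faster queries:

-- Hypothetical calculated column, evaluated once at process time
Sales[Line Amount Rounded] = ROUND ( Sales[Line Amount], 0 )

-- The measure becomes a plain SUM, resolved by the SE
-- without any CallbackDataID
Result := SUM ( Sales[Line Amount Rounded] )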

In order to understand the performance impact of CallbackDataID, consider the following DAX query that sums the result of a division made row by row:

EVALUATE
{
    SUMX (
        Example,
        IF (
            Example[Denominator] <> 0,
            Example[Numerator] / Example[Denominator]
        )
    )
}

The IF function avoids a calculation error in case a row contains a zero value in the Denominator column. The xmSQL query sent to the SE is similar to the following one:

WITH
    $Expr0 := [CallbackDataID (
        IF (
            Example[Denominator] <> 0,
            Example[Numerator] / Example[Denominator]
        ) ]
        ( PFDATAID ( Example[Numerator] ), PFDATAID ( Example[Denominator] ) )
SELECT
    SUM ( @$Expr0 )
FROM Example;

We executed this DAX query on our Example table with 4 billion rows, obtaining the SE events shown in Table 19-6.

Table 19-6 VertiPaq Scan events with a CallbackDataID including an IF function in DAX

Line  Subclass  Duration (ms)  CPU (ms)  Rows  Query
1     Internal          8,379    64,234     1  WITH $Expr0 := [CallbackDataID ( IF ( Example[Denominator] <> 0, ...
2     Scan              8,379    64,234     1  WITH $Expr0 := [CallbackDataID ( IF ( Example[Denominator] <> 0, ...

The parallelism ratio (CPU Time divided by Duration, 64,234 ÷ 8,379 ≈ 7.7) is close to eight because we used a server with eight cores. The important point is that different threads executed parallel calls to the FE. In previous chapters, we have seen that the DIVIDE function can replace the specific IF condition used to check whether the denominator of a division is equal to zero. We can see what happens if we use DIVIDE instead of IF in this example. The DAX query is the following:

EVALUATE
{
    SUMX (
        Example,
        DIVIDE ( Example[Numerator], Example[Denominator] )
    )
}

The DIVIDE function does not have a corresponding syntax in xmSQL, so in this case too there is a CallbackDataID in the xmSQL query sent to the SE:

WITH
    $Expr0 := [CallbackDataID (
        DIVIDE ( Example[Numerator], Example[Denominator] ) ]
        ( PFDATAID ( Example[Numerator] ), PFDATAID ( Example[Denominator] ) )
SELECT
    SUM ( @$Expr0 )
FROM Example;

Table 19-7 shows the SE events obtained from running the query over the same 4-billion-row table used in the previous example.

Table 19-7 VertiPaq Scan events with a CallbackDataID including a DIVIDE function in DAX

Line  Subclass  Duration (ms)  CPU (ms)  Rows  Query
1     Internal          6,790    51,984     1  WITH $Expr0 := [CallbackDataID ( DIVIDE ( Example[Numerator], ...
2     Scan              6,790    51,984     1  WITH $Expr0 := [CallbackDataID ( DIVIDE ( Example[Numerator], ...

Using DIVIDE instead of IF, we obtained a 19% performance improvement in both Duration and CPU Time. However, despite the parallelism achieved with this technique, the overhead of CallbackDataID is still high because the SE calls a function in the FE for every row. Removing the CallbackDataID completely eliminates this overhead. In this case, it is enough to apply a filter so that the iteration ignores rows containing zero in the Denominator column, as in the following DAX query:

EVALUATE
{
    CALCULATE (
        SUMX (
            Example,
            Example[Numerator] / Example[Denominator]
        ),
        Example[Denominator] <> 0
    )
}

The corresponding syntax in xmSQL for this entire DAX expression does not use CallbackDataID:

WITH
    $Expr0 := Example[Numerator] / Example[Denominator]
SELECT
    SUM ( @$Expr0 )
FROM Example
WHERE Example[Denominator] <> 0;

The resulting SE events shown in Table 19-8 demonstrate an improvement of more than 50% compared to the performance of the DIVIDE version.

Table 19-8 VertiPaq Scan events without CallbackDataID to execute a safe division in DAX

Line  Subclass  Duration (ms)  CPU (ms)  Rows  Query
1     Internal          3,108    23,859     1  WITH $Expr0 := Example[Numerator] / Example[Denominator], ...
2     Scan              3,108    23,859     1  WITH $Expr0 := Example[Numerator] / Example[Denominator], ...

This last version offers an additional advantage: because the xmSQL query includes no CallbackDataID, the VertiPaq cache keeps the datacache for future executions. If we execute the last DAX query twice, the second execution produces the events shown in Table 19-9.

Table 19-9 VertiPaq Scan events without CallbackDataID hitting the SE cache

Line  Subclass  Duration (ms)  CPU (ms)  Rows  Query
1     Cache                 0         0     1  WITH $Expr0 := Example[Numerator] / Example[Denominator], ...
2     Scan                  0         0     1  WITH $Expr0 := Example[Numerator] / Example[Denominator], ...

In general, a careful developer should avoid or at least reduce to a minimum the number of calls to CallbackDataID made by the SE. We will show some examples of this optimization in Chapter 20.
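
The pattern shown in this section generalizes: whenever a row-level test inside an iterator can be expressed as a filter argument of CALCULATE, the callback disappears. The following sketch summarizes the idea using a hypothetical table T with columns n and d:

-- Slower: the IF triggers a CallbackDataID on every row
SlowDivision := SUMX ( T, IF ( T[d] <> 0, T[n] / T[d] ) )

-- Faster: the condition becomes a WHERE clause in xmSQL,
-- so the SE resolves the expression without any callback
FastDivision := CALCULATE (
    SUMX ( T, T[n] / T[d] ),
    T[d] <> 0
)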

Reading DirectQuery storage engine queries

This section describes how to read the DirectQuery SE queries. These queries are expressed in the SQL language accepted by the data source. It is advisable to read the previous section about VertiPaq storage engine queries before reading this section to understand the similarities and differences between the two.

For example, consider the following DAX query:

EVALUATE
SUMMARIZECOLUMNS (
    Sales[Order Date],
    "Total Quantity", SUM ( Sales[Quantity] )
)

When executed in a DirectQuery model, the DAX engine generates a single SE query sent to the data source in SQL language, like the following one:

SELECT
    TOP (1000001) [t4].[Order Date],
    SUM ( CAST ( [t4].[Quantity] as BIGINT ) ) AS [a0]
FROM  (
    select [StoreKey],
           [ProductKey],
           ... // other columns of the tables omitted here
    from [dbo].[Sales] as [$Table]
) AS [t4]
GROUP BY [t4].[Order Date]

The presence of a TOP condition limits the number of rows transferred from the data source to the DAX engine. If the number of rows returned is identical to the argument of the TOP condition, then the DAX query fails because the engine cannot retrieve the full set of data from the data source. For this reason, the argument of TOP is 1,000,001 when the limit of rows accepted by DirectQuery is 1,000,000. This limit avoids excessive memory consumption: the entire result of the SE query must be loaded in memory, uncompressed, after being transferred from the data source to the DAX engine.
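
For example, assuming the Sales table of this DirectQuery model contains more than one million rows (an assumption made for this sketch), the following query would fail: it asks the SE to materialize every row of the table, so the SE request returns as many rows as the TOP condition allows and the engine detects the incomplete result:

-- Fails in DirectQuery when the result exceeds the one-million-row limit
EVALUATE Sales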

Image Note

The limit of rows accepted in a storage engine request using DirectQuery is 1,000,000 by default. This number can be modified through the MaxIntermediateRowsetSize configuration setting available in Analysis Services but not in Power BI. More details about this behavior are available in the article at https://www.sqlbi.com/articles/tuning-query-limits-for-directquery/.

Figure 19-17 shows an example of the information retrieved for SQL SE queries sent to a DirectQuery data source. The Duration column shows the time in milliseconds spent waiting for the data source to provide the result of the SQL query. The CPU column usually reports a low number, if not 0, because it only measures the cost incurred by the DAX engine to retrieve the result; it ignores the actual cost on the data source. In order to evaluate the actual CPU consumption on the data source, it is necessary to analyze the query running on the data source engine—for example, by using SQL Server Profiler for Microsoft SQL Server databases.

This figure is an example of the information retrieved.
Figure 19-17 DirectQuery SE queries are displayed as SQL queries.

The SQL event in Figure 19-17 does not have any information in the Rows and KB columns; indeed, SQL events do not provide an estimate of the result in terms of rows and memory, as xmSQL queries sent to VertiPaq do.

Finally, the result of a DirectQuery SE query is never persisted in the storage engine cache; therefore, the SE Cache counter is always zero for a DirectQuery data model.

Analyzing composite models

In a composite model the same DAX query can generate a mix of VertiPaq and DirectQuery SE queries. For example, consider the following DAX query executed in a model where the Sales table has a DirectQuery storage mode and all the other tables have a Dual storage mode:

EVALUATE
ADDCOLUMNS (
    VALUES ( 'Date'[Calendar Year] ),
    "Quantity", CALCULATE ( SUM ( Sales[Quantity] ) )
)

The ADDCOLUMNS function usually generates at least two SE queries: one for the VALUES function and one to compute the sum of the sales quantity by calendar year. Indeed, the screenshot in Figure 19-18 shows two storage queries of different types.

The figure shows two storage queries of different types.
Figure 19-18 DirectQuery SE queries are displayed as SQL queries.

The sum of quantity by calendar year requires that a SQL query (displayed at line 1) be sent to the DirectQuery data source. The list of Calendar Year names requested by VALUES is provided by the xmSQL VertiPaq SE query at line 3.

When analyzing a composite model, pay attention to the Subclass column that identifies the type of SE used. SQL always corresponds to a DirectQuery data source, which is usually slower than VertiPaq and can be optimized by using aggregations. This is described in the next section.

Using aggregations in the data model

As described in Chapter 18, “Optimizing VertiPaq,” the presence of aggregations in a data model can improve the performance of SE queries. Aggregations can be defined in both VertiPaq and DirectQuery, providing alternative ways to execute an SE query. When aggregations are available, the engine tries to rewrite an original SE query into a different one using an aggregation. The rewriting attempt succeeds when a compatible aggregation exists; when it fails because no compatible aggregation is available, the engine executes the original SE query.

DAX Studio can show the rewriting attempts to match an aggregation. These details might be useful to understand why an existing aggregation is not used when this was expected. For example, consider the following query executed in a composite model:

EVALUATE
SUMMARIZECOLUMNS (
    'Date'[Calendar Year],
    "Qty", SUM ( Sales[Quantity] ),
    "Qty Red", CALCULATE (
        SUM ( Sales[Quantity] ),
        'Product'[Color] = "Red"
    )
)

The model has an aggregation for the Sales table with the granularity of Date and Customer. The query computes two expressions for each calendar year: Qty is the sum of the quantity for all the orders made in the reported year, and Qty Red is the quantity for the orders of red products made in the same year. The screenshot in Figure 19-19 shows the SE queries reported by DAX Studio executing the preceding DAX query.

The figure shows the SE queries reported by DAX Studio.
Figure 19-19 Use of aggregations reported by RewriteAttempted events in DAX Studio.

There are two RewriteAttempted subclass events describing the evaluation made by the DAX engine before generating the SE query. The Qty calculation requires a filter by year; this request is compatible with the existing aggregation, which groups by Date and Customer. This is reported at line 1, and the details about the match are shown in Figure 19-20.

This figure shows the details of the event.
Figure 19-20 Details of a matching aggregation reported by a RewriteAttempted event in DAX Studio.

Because the aggregation is a table imported in memory, the engine generates the following VertiPaq SE query reported at Line 3 in Figure 19-19:

SELECT
    'Date'[Calendar Year],
    SUM ( 'Sales_Agg'[Quantity] )
FROM 'Sales_Agg'
    LEFT OUTER JOIN 'Date' ON 'Sales_Agg'[Order Date]='Date'[Date];

The RewriteAttempted event at line 4 in Figure 19-19 does not find a matching aggregation for the Qty Red calculation, which requires a filter by Date and Product. In this case, the original Sales table (whose storage mode is DirectQuery) must be queried directly without using any aggregation, as shown in the details in Figure 19-21.

This figure shows the details of the attempt.
Figure 19-21 Failed matching of aggregations reported by RewriteAttempted events in DAX Studio.

Because the Sales table uses DirectQuery storage, the engine generates the SQL query reported at line 5. The longer duration (more than two seconds) is normal and expected. Aggregations should be considered to improve the performance of DAX queries whose bottleneck is the SE; they are usually not useful for bottlenecks in the FE.

Reading query plans

At the beginning of this chapter, we described the two types of query plans available in DAX: logical and physical. In reality, we do not use these query plans often because we focus our attention on the SE queries first. We can analyze the performance of the SE queries to find issues caused by the SE and/or by the materialization of large datacaches in memory. SE queries are much easier to read than DAX query plans.

In this section, we describe some of the important behaviors to check in a query plan in order to identify performance bottlenecks. A complete and detailed coverage of all the operators used in logical and physical query plans is beyond the scope of this book. The goal here is to understand the relationships between a query plan and the SE queries, thus improving one’s ability to find bottlenecks and to improve query performance.

A query plan usually generates more than one SE query. The FE combines the results of different datacaches, doing operations like joins between temporary tables. Consider the following DAX query; it returns a table with the quantity sold for each product color, only for transactions with a Net Price greater than 1,000:

EVALUATE
CALCULATETABLE (
    ADDCOLUMNS (
        ALL ( Product[Color] ),
        "Units", CALCULATE (
            SUM ( Sales[Quantity] )
        )
    ),
    Sales[Net Price] > 1000
)
ORDER BY Product[Color]

The result visible in Figure 19-22 includes all the unique values of Color, including those without any unit sold. In order to do that, the approach of the DAX engine is different from the one we would expect in plain SQL language; this is because of the different technique used to join tables in the SE. We will highlight this difference later; pay attention to the process for now.

The figure shows the result of the query, displaying units per color.
Figure 19-22 The result of ADDCOLUMNS includes rows with a blank value in the Units column.

The logical query plan shown in Figure 19-23 includes three Scan_Vertipaq operations, two of which correspond to two datacaches provided by SE queries.

This figure is the query plan of the query.
Figure 19-23 Logical query plan of a simple DAX query.

The two Scan_Vertipaq operations at lines 4 and 6 require different sets of columns. The third Scan_Vertipaq operation at line 9 is used for a filter, and it does not generate a separate datacache. Its logic is included in one of the other two SE queries generated.

The Scan_Vertipaq at line 4 only uses the product color, whereas the Scan_Vertipaq at line 6 includes product color and sales quantity, which are two columns in two different tables. When this happens, a join between two or more tables is required.

After the logical query plan, the profiler receives the events from the SE. The corresponding xmSQL queries are the following:

SELECT
    Product[Color],
    SUM ( Sales[Quantity] )
FROM Sales
    LEFT OUTER JOIN Product ON Sales[ProductKey] = Product[ProductKey]
WHERE Sales[Net Price] > 1000;

SELECT Product[Color] FROM Product;

The first SE query retrieves a table containing one row for each color that has at least one unit sold at a price greater than 1,000 in the Sales table. In order to do that, the query joins Sales and Product using the ProductKey column. The second xmSQL statement returns the list of all the product colors, independent of the Sales table. These two queries generate two different datacaches, one with two columns (product color and sum of quantity) and another with only one column (the product color).

At this point, we might wonder why a second query is required. Why is the first xmSQL not enough? The reason is that the LEFT JOIN in xmSQL has Sales on the left side and Product on the right side. In plain SQL code, we would have written another query:

SELECT
    Product.Color,
    SUM ( Sales.Quantity )
FROM Product
LEFT OUTER JOIN Sales
    ON Sales.ProductKey = Product.ProductKey
WHERE Sales.NetPrice > 1000
GROUP BY Product.Color
ORDER BY Product.Color;

Having the Product table on the left side of a LEFT JOIN would produce a result that includes all the product colors. However, the SE can only generate queries between tables with a relationship in the data model, and the resulting join in xmSQL always puts the table on the many-side of the relationship on the left side of the join condition. This guarantees that even when the Sales table contains product keys missing from the Product table, the result still includes those sales; they are grouped in a row with a blank value for all the product attributes, in this case the product color.

Now that we have seen why the DAX engine produces two SE queries for the initial DAX query, we can analyze the physical query plan shown in Figure 19-24, where we can find more information about the query execution.

This figure shows the physical query plan.
Figure 19-24 Physical query plan of a simple DAX query.

The physical query plan uses the Cache operator (line 6 and 9) to indicate where it is consuming a datacache provided by the SE. Unfortunately, it is not possible to see the corresponding SE query for each operation. Nevertheless, at least in simple cases like the one considered, we can figure out this association by looking at other pieces of information. For example, one Cache only has one column obtained with a group operation, whereas the other Cache has two columns: one that is the result of a group operation and the other that is the result of an aggregation (the sum of the quantity). In the physical query plan, #ValueCols reports the number of columns that are the result of an aggregation, whereas #FieldCols reports the number of other columns used to group the result. By looking at the columns consumed by each Cache node, it is often possible to identify the corresponding xmSQL query even though it is a time-consuming process in complex query plans. In this example, the Cache node at line 6 returns a column with 16 product color names; on the other hand, the Cache node at line 9 only returns 10 rows and two columns, only with the product color names that have at least one transaction in Sales within the condition specified for Net Price (which must be greater than 1,000).

The ProjectionSpool<> operation consumes the datacaches corresponding to Cache nodes in the physical query plan. Here we can find an important piece of information: the number of records iterated, which corresponds to the number of rows in the datacache used. This number follows the #Records attribute, which is also reported in the Records column in DAX Studio. We can find the same #Records attribute in parent nodes of the query plan—a place where the type of aggregation performed by the engine is also available if there is one. In this example, the Cache at line 9 has two columns: one is Product[Color] and the other is the result of a sum aggregation. This information is available in the LogOp argument of the Spool_Iterator and SpoolLookup nodes at lines 4 and 7, respectively.

At this point, we can recap what we are reading in the query plans and the SE queries:

  1. The FE consumes two datacaches, corresponding to Cache nodes in the physical query plan.

  2. The FE iterates over the list of product colors, which is a table containing 16 rows and one column. This is the datacache obtained by the second SE query. Do not make assumptions about the order of the SE queries in the profiler.

  3. For each row of this datacache (a product color), the FE executes a lookup in the other datacache containing the product colors and the quantity sold for each color; this is a table with two columns and 10 rows.

The entire process executed by the FE is sequential and single-threaded. The FE sends one request at a time to the SE. The SE might parallelize the query, but the FE does not send multiple requests in parallel to the SE.

Image Note

The FE and the SE are subject to optimizations and improvements made in new releases. The behavior described might be different in newer versions of the DAX engine.

The FE can combine different results by using the lookup operation described in the previous query plan or other set operators. In any case, the FE executes this operation sequentially. For this reason, we might expect longer execution times when combining large datacaches or when performing a lookup for millions of rows in a large lookup datacache. A simple and effective way to identify these potential bottlenecks is to look for the operators with the highest number of records in the physical query plan. DAX Studio extracts that number from the query plan, making it easy to sort query plan operators by the number of records iterated: it is possible to sort the rows by clicking the Records column shown in Figure 19-24. We will show a more detailed example of this approach in Chapter 20.

The presence of relationships in the data model is important in order to obtain better performance. We can examine the behavior of a join between two tables when a relationship is not available. For example, consider a query returning the same result as the previous example, but operating in a data model that does not have a relationship between the Product and Sales tables. We need a DAX query such as the following; it uses the virtual relationship pattern shown in Chapter 15, “Advanced relationships,” in the section “Transferring a filter using INTERSECT”:

DEFINE
    MEASURE Sales[Units] =
        CALCULATE (
            SUM ( Sales[Quantity] ),
            INTERSECT (
                ALL ( Sales[ProductKey] ),
                VALUES ( 'Product'[ProductKey] )
            ),
            -- Disable the existing relationship between Sales and Product
            CROSSFILTER ( Sales[ProductKey], 'Product'[ProductKey], NONE )
        )
EVALUATE
ADDCOLUMNS (
    ALL ( 'Product'[Color] ),
    "Units", [Units]
)
ORDER BY 'Product'[Color]

The INTERSECT filter in the Units measure definition emulates the relationship between Sales and Product. The resulting query plan is more complex than the previous one because there are many more operations in both the logical and the physical query plans. Without dumping the complete query plan, which would be too long for a book, we can summarize its behavior in these logical steps:

  1. Retrieves the list of ProductKey values for each product color.

  2. Sums the Quantity value for each ProductKey.

  3. For each color, aggregates the Quantity of the related ProductKey values.

The FE executes four SE queries, as shown in Figure 19-25.

The figure shows four SE queries.
Figure 19-25 SE queries executed for a DAX calculation using a virtual relationship with INTERSECT.

The following are the complete xmSQL statements of the four SE queries:

SELECT
Sales[ProductKey]
FROM Sales;

SELECT
Product[Color]
FROM Product;

SELECT
Product[ProductKey], Product[Color]
FROM Product;

SELECT
Sales[ProductKey], SUM ( Sales[Quantity] )
FROM Sales
WHERE     Sales[ProductKey] IN ( 490, 479, 528, 379, 359, 332, 374, 597, 387,
                                 484..[158 total values, not all displayed] );

The WHERE condition highlighted in the last SE query might seem useless because the DAX query does not apply a filter over products. However, usually in the real world there are other filters active on products or other tables. The query plan tries to only extract the quantities sold of products that are relevant to the query, lowering the size of the datacache returned to the FE. When there are similar WHERE conditions in the SE, the only concern is the size of the corresponding bitmap index moved back and forth between the FE and the SE.

The FE has to group all the products belonging to each color. The performance of this join performed at the FE level mainly depends on the number of products and secondarily on the number of colors. Once again, the size of a datacache is the first and most important element to consider when we look for a performance bottleneck in the FE.

We considered the virtual relationship using INTERSECT for educational purposes. We wanted to display the SE queries required for a join condition resolved mainly by the FE. However, whenever possible, if a physical relationship is not available, TREATAS should be considered as a more optimized alternative. Consider this alternative implementation of the previous DAX query:

DEFINE
    MEASURE Sales[Units] =
        CALCULATE (
            SUM ( Sales[Quantity] ),
            TREATAS (
                VALUES ( 'Product'[ProductKey] ),
                Sales[ProductKey]
            ),
            -- Disable the existing relationship between Sales and Product
            CROSSFILTER ( Sales[ProductKey], 'Product'[ProductKey], NONE )
        )
EVALUATE
ADDCOLUMNS (
    ALL ( 'Product'[Color] ),
    "Units", [Units]
)
ORDER BY 'Product'[Color]

As shown in Figure 19-26, only three SE queries are generated instead of four. Remember that the Batch event is just a recap of the previous Scan events. Moreover, the datacaches are smaller: only one result has as many as 2,517 rows, corresponding to the number of products in the Product table. In the previous implementation using INTERSECT, more queries returned thousands of rows, and all of those datacaches had to be consumed by the FE.

The figure shows three SE queries.
Figure 19-26 SE queries executed for a DAX calculation using a virtual relationship with TREATAS.

The following is the content of the Batch event at line 5, which includes the first two Scan events (lines 2 and 4):

DEFINE TABLE '$TTable3' := SELECT
'Product'[ProductKey], 'Product'[Color]
FROM 'Product',

CREATE SHALLOW RELATION '$TRelation1' MANYTOMANY
FROM 'Sales'[ProductKey] TO '$TTable3'[Product$ProductKey],

DEFINE TABLE '$TTable1' := SELECT
    '$TTable3'[Product$Color],
    SUM ( '$TTable2'[$Measure0] )
FROM '$TTable2'
    INNER JOIN '$TTable3' ON '$TTable2'[Sales$ProductKey]='$TTable3'[Product$ProductKey]
REDUCED BY
'$TTable2' := SELECT
    'Sales'[ProductKey],
    SUM ( 'Sales'[Quantity] ) AS [$Measure0]
FROM 'Sales';

The performance advantage of TREATAS is that it moves the execution of the join to the SE, thanks to the CREATE SHALLOW RELATION statement highlighted in the previous code. This way, there is no need to materialize additional data for the FE. Indeed, because the join is executed within the SE, the physical query plan shrinks from the 37 lines required by INTERSECT (not displayed in the book for brevity) to the 10 required by TREATAS. This results in a query plan very similar to the one shown in Figure 19-24.

Analyzing complex and longer query plans would require another book, considering the length of the query plans involved. More details about the internals of the query plans are available in the white papers “Understanding DAX Query Plans” (http://www.sqlbi.com/articles/understanding-dax-query-plans/) and “Understanding Distinct Count in DAX Query Plans” (http://www.sqlbi.com/articles/understanding-distinct-count-in-dax-query-plans/).

Conclusions

As you have seen, diving into the complexity of query plans opens up a whole new world. In this chapter we barely scratched the surface of query plans, and a deeper analysis would require twice the size of this book. The good news is that in most—if not all—scenarios, going into more detail turns out to be useless.

An experienced DAX developer who aims to write optimal code should be able to focus their attention on the low-hanging fruit that can be discovered very quickly by looking at the most relevant parts of the query plan:

  • In the physical query plan, the presence of a large number of rows scanned indicates the materialization of large datasets. This suggests that the query is memory-hungry and potentially slow.

  • Most of the time, the VertiPaq queries include enough information to figure out the overall algorithm of the calculation. Whatever is not computed in a VertiPaq query must be computed by the formula engine. Knowing this enables you to get a clear idea of the whole query process.

  • The presence of CallbackDataID indicates row-level iterations where the code requires calculations that are too complex for the VertiPaq storage engine. CallbackDataID calls by themselves are not necessarily bad. Nevertheless, removing them almost always results in better performance.

  • VertiPaq and DirectQuery models are different. When using DirectQuery, the performance of DAX is strongly connected to the performance of the data source. It makes sense to use DirectQuery if and only if the underlying data source is specifically optimized for the kind of queries generated by the DirectQuery storage engine.

In the next chapter, we are going to use the knowledge gained in this and previous chapters to provide a few guided optimization processes.
