Chapter 20
Optimizing DAX

This is the last chapter of the book, and it is time to use all the knowledge you have gained so far to explore the most fascinating DAX topic: optimizing formulas. You have learned how the DAX engines work, how to read a query plan, and the internals of the formula engine and of the storage engine. Now all the pieces are in place and you are ready to learn how to use that information to write faster code.

There is one very important warning before approaching this chapter. Do not expect to learn best practices or a simple way to write fast code. Simply stated: There is no way in DAX to write code that is always the fastest. The speed of a DAX formula depends on many factors, the most important of which unfortunately is not in the DAX code itself: It is data distribution. You have already learned that VertiPaq compression strongly depends on data distribution. The size of a column (hence, the speed to scan it) depends on its cardinality: the larger, the slower. Thus, the very same formula might behave differently when executed on one column or another.

You will learn how to measure the speed of a formula, and we will provide you with several examples where rewriting the expression differently leads to a faster execution time. Learn all these examples for what they are—examples that might help you in finding new ideas for your code. Do not take them as golden rules, because they are not.

We are not teaching you rules; we are trying to teach you how to find the best rules in the very specific scenario that is your data model. Be prepared to change them when the data model changes or when you approach a new scenario. Flexibility is key when optimizing DAX code: flexibility, a deep technical knowledge of the engine, and a good amount of creativity, so you are prepared to test formulas and expressions that might not seem intuitive at first.

Finally, all the information we provide in this book is valid at the time of printing. New versions of the engine come to market every month, and the development team is always working on improving the DAX engine. So be prepared to measure different numbers for the examples of the book in the version of the engine you will be running, and be prepared to use different optimization methods if necessary. If one day you measure your code and reach the educated conclusion that “Marco and Alberto are wrong; this code runs much faster than their suggested code,” that will be our brightest day, because we will have taught you all that we know, and you will be writing better DAX code than ours.

Defining optimization strategies

The optimization process for a DAX query, expression, or measure requires a strategy to reproduce a performance issue, identify the bottleneck, and remove it. Initially, you typically observe slowness in a complex query; however, optimizing a complicated expression that includes several DAX measures is more involved than optimizing one measure at a time. For this reason, the approach we suggest is to isolate the slowest measure or expression first, and then optimize it in a simpler query that reproduces the issue with a shorter query plan.

This is a simple to-do list you should follow every time you want to optimize DAX:

  1. Identify a single DAX expression to optimize.

  2. Create a query that reproduces the issue.

  3. Analyze server timings and query plan information.

  4. Identify bottlenecks in the storage engine or formula engine.

  5. Implement changes and rerun the test query.

You can see a more complete description of each of these steps in the following sections.

Identifying a single DAX expression to optimize

If you have already found the slowest measure in your model, you probably can skip this section and move to the following one. However, it is common to get a performance issue in a report that might generate several queries. Each of these queries might include several measures. The first step is to identify a single DAX expression to optimize. Doing this, you reduce the reproduction steps to a single query and possibly to a single measure returned in the result.

A complete refresh of a report in Power BI or Reporting Services, or of a Microsoft Excel workbook, typically generates several queries in either DAX or MDX (PivotTables and charts in Excel always generate the latter). When a report generates several queries, you have to identify the slowest query first. In Chapter 19, “Analyzing DAX query plans,” you saw how DAX Studio can intercept all the queries sent to the DAX engine and identify the slowest one by looking at the largest Duration amount.

If you are using Excel, you can also use a different technique to isolate a query. You can extract the MDX query it generates by using OLAP PivotTable Extensions, a free Excel add-in available at https://olappivottableextensions.github.io/.

Once you extract the slowest DAX or MDX query, you have to further restrict your focus and isolate the DAX expression that is causing the slowness. This way, you will concentrate your efforts on the right area. You can reduce the measures included in a query by modifying and executing the query interactively in DAX Studio.

For example, consider a table in Power BI with four expressions (two distinct counts and two measures) grouped by product brand, as shown in Figure 20-1.

Figure 20-1 Simple visualization in Power BI generated by a DAX query with four expressions.

The report generates the following DAX query, captured by using DAX Studio:

EVALUATE
TOPN (
    502,
    SUMMARIZECOLUMNS (
        ROLLUPADDISSUBTOTAL ( 'Product'[Brand], "IsGrandTotalRowTotal" ),
        "DistinctCountProductKey", CALCULATE (
            DISTINCTCOUNT ( 'Product'[ProductKey] )
        ),
        "Sales_Amount", 'Sales'[Sales Amount],
        "Margin__", 'Sales'[Margin %],
        "DistinctCountOrder_Number", CALCULATE (
            DISTINCTCOUNT ( 'Sales'[Order Number] )
        )
    ),
    [IsGrandTotalRowTotal], 0,
    'Product'[Brand], 1
)
ORDER BY
    [IsGrandTotalRowTotal] DESC,
    'Product'[Brand]

You should reduce the query by trying one calculation at a time, to locate the slowest one. If you can manipulate the report, you might just include one calculation at a time. If you can access the DAX code, it is enough to comment out or remove three of the four columns calculated in the SUMMARIZECOLUMNS function (DistinctCountProductKey, Sales_Amount, Margin__, and DistinctCountOrder_Number), finding the slowest one before proceeding. In this case, the most expensive calculation is the last one. The following query takes up 80% of the time required to compute the original query, meaning that the distinct count over Sales[Order Number] is the most expensive operation in the entire report:

EVALUATE
TOPN (
    502,
    SUMMARIZECOLUMNS (
        ROLLUPADDISSUBTOTAL ( 'Product'[Brand], "IsGrandTotalRowTotal" ),
//        "DistinctCountProductKey", CALCULATE (
//            DISTINCTCOUNT ( 'Product'[ProductKey] )
//        ),
//        "Sales_Amount", 'Sales'[Sales Amount],
//        "Margin__", 'Sales'[Margin %],
        "DistinctCountOrder_Number", CALCULATE (
            DISTINCTCOUNT ( 'Sales'[Order Number] )
        )
    ),
    [IsGrandTotalRowTotal], 0,
    'Product'[Brand], 1
)
ORDER BY
    [IsGrandTotalRowTotal] DESC,
    'Product'[Brand]

Another example is the following MDX query, generated by the Excel pivot table shown in Figure 20-2:

SELECT {
    [Measures].[Sales Amount],
    [Measures].[Total Cost],
    [Measures].[Margin],
    [Measures].[Margin %]
  } DIMENSION PROPERTIES PARENT_UNIQUE_NAME, HIERARCHY_UNIQUE_NAME ON COLUMNS,
NON EMPTY HIERARCHIZE(
    DRILLDOWNMEMBER(
        { { DRILLDOWNMEMBER(
                { { DRILLDOWNLEVEL(
                        { [Date].[Calendar].[All] },,, include_calc_members )
                } },
            { [Date].[Calendar].[Year].&[CY 2008] },,, include_calc_members )
        } },
        { [Date].[Calendar].[Quarter].&[Q4-2008] },,, include_calc_members
    )
)
DIMENSION PROPERTIES PARENT_UNIQUE_NAME,HIERARCHY_UNIQUE_NAME ON ROWS
FROM [Model]
CELL PROPERTIES VALUE, FORMAT_STRING, LANGUAGE, BACK_COLOR, FORE_COLOR, FONT_FLAGS
Figure 20-2 Simple pivot table in Excel that generates an MDX query with four measures.

You can reduce the measures either in the pivot table or directly in the MDX code, where you manipulate the list of measures in braces. For example, you reduce the query to only the Sales Amount measure by modifying the list, as in the following initial part of the query:

SELECT
{ [Measures].[Sales Amount] }
DIMENSION PROPERTIES PARENT_UNIQUE_NAME, HIERARCHY_UNIQUE_NAME ON COLUMNS,
...

Regardless of the technique you use, once you identify the DAX expression (or measure) that is responsible for a performance issue, you need a reproduction query to use in DAX Studio.

Creating a reproduction query

The optimization process requires a query that you can execute several times, possibly changing the definition of the measure in order to evaluate different levels of performance.

If you captured a query in DAX or MDX, you already have a good starting point for the reproduction (repro) query. You should try to simplify the query as much as you can, so that it becomes easier to find the bottleneck. You should only keep a complex query structure when it is fundamental in order to observe the performance issue.

Creating a reproduction query in DAX

When a measure is constantly slow, you should be able to create a repro query producing a single value as a result. Using CALCULATE or CALCULATETABLE, you can apply all the filters you need. For example, you can execute the Sales Amount measure for November 2008 using the following code, obtaining the same result ($96,777,975.30) you see in Figure 20-2 for that month:

EVALUATE
{
    CALCULATE (
        [Sales Amount],
        'Date'[Calendar Year] = "CY 2008",
        'Date'[Calendar Year Quarter] = "Q4-2008",
        'Date'[Calendar Year Month] = "November 2008"
    )
}

You can also write the previous query using CALCULATETABLE instead of CALCULATE:

EVALUATE
CALCULATETABLE (
    { [Sales Amount] },
    'Date'[Calendar Year] = "CY 2008",
    'Date'[Calendar Year Quarter] = "Q4-2008",
    'Date'[Calendar Year Month] = "November 2008"
)

The two approaches produce the same result. You should consider CALCULATETABLE when the query you use to test the measure is more complex than a simple table constructor.
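
For example, a repro query that groups the measure by a column cannot use a simple table constructor. The following sketch, based on the same model used throughout this chapter, applies the filters with CALCULATETABLE around a SUMMARIZECOLUMNS query:

EVALUATE
CALCULATETABLE (
    SUMMARIZECOLUMNS (
        'Product'[Brand],
        "Sales Amount", [Sales Amount]
    ),
    'Date'[Calendar Year] = "CY 2008",
    'Date'[Calendar Year Month] = "November 2008"
)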

Once you have a repro query for a specific measure defined in the data model, you should consider writing the DAX expression of the measure locally in the query, using the MEASURE syntax. For example, you can transform the previous repro query into the following one:

DEFINE
    MEASURE Sales[Sales Amount] =
        SUMX ( Sales, Sales[Quantity] * Sales[Net Price] )
EVALUATE
CALCULATETABLE (
    { [Sales Amount] },
    'Date'[Calendar Year] = "CY 2008",
    'Date'[Calendar Year Quarter] = "Q4-2008",
    'Date'[Calendar Year Month] = "November 2008"
)

At this point, you can apply changes to the DAX expression assigned to the measure directly in the query statement. This way, you do not have to deploy a change to the data model before executing the query again. You can change the query, clear the cache, and run the query in DAX Studio, immediately measuring the performance of the modified expression.
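
For example, to test whether a precomputed column would be faster than the iterator, you only need to edit the local definition and run the query again. In the following sketch, the Sales[Line Amount] column is hypothetical and does not exist in the sample model:

DEFINE
    // Hypothetical alternative under test: it would require a
    // precomputed Sales[Line Amount] column in the model
    MEASURE Sales[Sales Amount] =
        SUM ( Sales[Line Amount] )
EVALUATE
CALCULATETABLE (
    { [Sales Amount] },
    'Date'[Calendar Year] = "CY 2008",
    'Date'[Calendar Year Quarter] = "Q4-2008",
    'Date'[Calendar Year Month] = "November 2008"
)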

Creating query measures with DAX Studio

DAX Studio can generate the MEASURE syntax for a measure defined in the model by using the Define Measure context menu item, available by right-clicking a measure in the Metadata pane, as shown in Figure 20-3.

Figure 20-3 The Define Measure context menu item in the Metadata pane.

If a measure references other measures, all of them should be included as query measures so that any of them can be modified in the repro query. The Define Dependent Measures feature includes the definition of all the measures that are referenced by the selected measure, whereas Define and Expand Measure replaces any measure reference with the corresponding measure expression. For example, consider the following query that just evaluates the Margin % measure:

EVALUATE
{ [Margin %] }

By clicking Define Measure on Margin %, you get the following code, which still contains references to two other measures, Margin and Sales Amount:

DEFINE
    MEASURE Sales[Margin %] =
        DIVIDE ( [Margin], [Sales Amount] )
EVALUATE
{ [Margin %] }

Instead of repeating the Define Measure action on all the other measures, you can click on Define Dependent Measures on Margin %, obtaining the definition of all the other measures required; this includes Total Cost, which is used in the Margin definition:

DEFINE
    MEASURE Sales[Margin] = [Sales Amount] - [Total Cost]
    MEASURE Sales[Sales Amount] =
        SUMX ( Sales, Sales[Quantity] * Sales[Net Price] )
    MEASURE Sales[Total Cost] =
        SUMX ( Sales, Sales[Quantity] * Sales[Unit Cost] )
    MEASURE Sales[Margin %] =
        DIVIDE ( [Margin], [Sales Amount] )
EVALUATE
{ [Margin %] }

You can also obtain a single DAX expression without measure references by clicking Define and Expand Measure on Margin %:

DEFINE
    MEASURE Sales[Margin %] =
        DIVIDE (
            CALCULATE (
                CALCULATE ( SUMX ( Sales, Sales[Quantity] * Sales[Net Price] ) )
                    - CALCULATE ( SUMX ( Sales, Sales[Quantity] * Sales[Unit Cost] ) )
            ),
            CALCULATE ( SUMX ( Sales, Sales[Quantity] * Sales[Net Price] ) )
        )
EVALUATE
{ [Margin %] }

This latter technique can be useful to quickly evaluate whether a measure includes nested iterators or not, though it could generate very verbose results.

Creating a reproduction query in MDX

In certain conditions, you have to use an MDX query to reproduce a problem that only happens in MDX and not in DAX. The same DAX measure generates different query plans when executed in a DAX or in an MDX query, so it might display a different behavior depending on the language of the query. In this case too, however, you can define the DAX measure locally in the query, making it more efficient to edit and run again. For instance, you can define the Sales Amount measure local to the MDX query using the WITH MEASURE syntax:

WITH
     MEASURE Sales[Sales Amount] = SUMX ( Sales, Sales[Quantity] * Sales[Unit Price] )
SELECT {
    [Measures].[Sales Amount],
    [Measures].[Total Cost],
    [Measures].[Margin],
    [Measures].[Margin %]
  } DIMENSION PROPERTIES PARENT_UNIQUE_NAME, HIERARCHY_UNIQUE_NAME ON COLUMNS,
NON EMPTY HIERARCHIZE(
    DRILLDOWNMEMBER(
        { { DRILLDOWNMEMBER(
                { { DRILLDOWNLEVEL(
                        { [Date].[Calendar].[All] },,, include_calc_members )
                } },
            { [Date].[Calendar].[Year].&[CY 2008] },,, include_calc_members )
        } },
        { [Date].[Calendar].[Quarter].&[Q4-2008] },,, include_calc_members
    )
)
DIMENSION PROPERTIES PARENT_UNIQUE_NAME,HIERARCHY_UNIQUE_NAME ON ROWS
FROM [Model]
CELL PROPERTIES VALUE, FORMAT_STRING, LANGUAGE, BACK_COLOR, FORE_COLOR, FONT_FLAGS

As you see, in MDX you must use WITH instead of DEFINE; when optimizing an MDX query, you simply rename the DEFINE keyword generated by DAX Studio to WITH. The syntax after MEASURE is always DAX code, so you follow the same optimization process for an MDX query. Regardless of the repro query language (either DAX or MDX), you always have a DAX expression to optimize, which you can define within a local MEASURE definition.

Analyzing server timings and query plan information

Once you have a repro query, you run it and collect information about execution time and query plan. You saw in Chapter 19 how to read the information provided by DAX Studio or SQL Server Profiler. In this section, we recap the steps required to analyze a simple query in DAX Studio.

For example, consider the following DAX query:

DEFINE
    MEASURE Sales[Sales Amount] =
        SUMX ( Sales, Sales[Quantity] * Sales[Unit Price] )
EVALUATE
ADDCOLUMNS (
    VALUES ( 'Date'[Calendar Year] ),
    "Result", [Sales Amount]
)

If you execute this query in DAX Studio after clearing the cache and enabling Query Plan and Server Timings, you obtain a result with one row for each year in the Date table, and the total of Sales Amount for sales made in that year. The starting point for an analysis is always the Server Timings pane, which displays information about the entire query, as shown in Figure 20-4.

Figure 20-4 Server Timings pane after a simple query execution.

Our query returned the result in 25 ms (Total), and it spent 72 percent of this time in the storage engine (SE), whereas the formula engine (FE) only used 7 ms of the total time. This pane does not provide much information about the formula engine internals, but it is rich in details on storage engine activity. For example, there were two storage engine queries (SE Queries) that consumed a total of 94 ms of processing time (SE CPU). The CPU time can be larger than Duration thanks to the parallelism of the storage engine: the engine consumed 94 ms of CPU time across logical processors working in parallel, so the duration is only a fraction of that number. The hardware used in this test had 8 logical processors, and the parallelism degree of this query (the ratio between SE CPU and SE, or 94 / 18) is 5.2. The parallelism degree cannot be higher than the number of logical processors you have.

The storage engine queries are available in the list, and you can see that a single storage engine operation (the first one) consumes the entire duration and CPU time. By enabling the display of Internal and Cache subclass events, you can see in Figure 20-5 the internal VertiPaq scan events, confirming that the two storage engine queries were actually executed by the storage engine rather than resolved from the cache.

Figure 20-5 Server Timings pane with internal subclass events visible.

If you execute the same query again without clearing the cache, you see the results in Figure 20-6. Both storage engine queries retrieved the values from the cache (SE cache), and the storage engine queries resolved in the cache are visible in the Subclass column.

Figure 20-6 Server Timings pane with cache subclass events visible, after second execution of the same DAX query.

Usually, we run the repro query with a cold cache (clearing the cache before the execution), but in some cases it is important to evaluate whether a given DAX expression can leverage the cache in an upcoming request. For this reason, the Cache visualization in DAX Studio is disabled by default, and you enable it on demand.

At this point, you can start looking at the query plans. In Figure 20-7 you see the physical and logical query plans of the query used in the previous example.

Figure 20-7 Query Plan pane showing physical and logical query plans.

The physical query plan is the one you will use more often. In the query of the previous example, there are two datacaches, one for each storage engine query. Every Cache row in the physical query plan consumes one of the datacaches available. However, there is no simple way to match a query plan operation with the corresponding datacache. You can infer the datacache by looking at the columns used in the operations requiring a Cache result (the Spool_Iterator and SpoolLookup rows in Figure 20-7).

An important piece of information available in the physical query plan is the column showing the number of records processed. As you will see, when optimizing bottlenecks in the formula engine, it might be useful to identify the slowest operation by searching for the line with the largest number of records. You can sort the rows by clicking the Records column header, as you see in Figure 20-8. You restore the original sort order by clicking the Line column header.

Figure 20-8 Steps in physical query plan sorted by Records column.

Identifying bottlenecks in the storage engine or formula engine

There are many possible optimizations usually available for any query. The first and most important step is to identify whether a query spends most of its time in the formula engine or in the storage engine. A first indication is available in the percentages provided by DAX Studio for FE and SE. Usually, this is a good starting point, but you also have to identify the distribution of the workload between the formula engine and the storage engine. In complex queries, a large amount of time spent in the storage engine might correspond to a large number of small storage engine queries, or to a small number of storage engine queries that concentrate most of the workload. As you will see, these differences require different approaches in your optimization strategy.

When you identify the execution bottleneck of a query, you should also prioritize the optimization areas. For example, there might be different inefficiencies in the query plan resulting in a large formula engine execution time. You should identify the most important inefficiency and concentrate on that first. If you do not follow this approach, you might end up spending time optimizing an expression that only marginally affects the execution time. Sometimes the most efficient optimizations are simple but hidden in counterintuitive context transitions or in other details of the DAX syntax. You should always measure the execution time before and after each optimization attempt, making sure that you obtain a real advantage and that you are not just applying some optimization pattern found on the web or in this book without any real benefit.

Finally, remember that even if you have an issue in the formula engine, you should always start your analysis by looking at the storage engine queries. They provide valuable information about the content and size of the datacaches used by the formula engine. Reading the query plan that describes the operations performed by the formula engine is a very complex process. It is easier to consider that the formula engine consumes the content of the datacaches, performing every operation required to produce the result of a DAX query that the storage engine has not already carried out. This approach is especially efficient for large and complex DAX queries. Indeed, these might generate thousands of lines in a query plan, but only a relatively small number of datacaches produced by storage engine queries.

Implementing changes and rerunning the test query

Once the bottlenecks have been identified, the next step is to change the DAX expressions and/or the data model so that the query plan is more efficient. By running the test query again, you can verify that the improvement is effective; then the search for the next bottleneck starts over, repeating the loop from the step “Analyzing server timings and query plan information.” This process continues until the performance is optimal or there are no further improvements worth the effort.

Optimizing bottlenecks in DAX expressions

A longer execution time in the storage engine is usually the consequence of one or more of the following causes (explained in more detail in Chapter 19):

  • Longer scan time. Even for a simple aggregation, a DAX query must scan one or more columns. The cost for this scan depends on the size of the column, which depends on the number of unique values and on the data distribution. Different columns in the same table can have very different execution times.

  • Large cardinality. A large number of unique values in a column affects the DISTINCTCOUNT calculation and the filter arguments of the CALCULATE and CALCULATETABLE functions. A large cardinality can also affect the scan time of a column, but it could be an issue by itself regardless of the column data size.

  • High frequency of CallbackDataID. A large number of calls made by the storage engine to the formula engine can affect the overall performance of a query.

  • Large materialization. If a storage engine query produces a large datacache, its generation requires time (allocating and writing RAM). Moreover, its consumption by the formula engine is another potential bottleneck.

In the following sections, you will see several examples of optimization. Starting with the concepts you learned in previous chapters, you will see a typical problem reproduced in a simpler query and optimized.

Optimizing filter conditions

Whenever possible, a filter argument of a CALCULATE/CALCULATETABLE function should always filter columns rather than tables. The DAX engine has improved over the years, and several simple table filters are relatively well optimized in 2019 or newer engine versions. However, expressing a filter condition by columns rather than by tables is always a best practice.

For example, consider the report in Figure 20-9 that compares the total of Sales Amount with the sum of the sales transactions larger than $1,000 (Big Sales Amount) for each product brand.

Figure 20-9 Sales Amount and Big Sales Amount reported by product brand.

Because the filter condition in the Big Sales Amount measure requires two columns, a trivial way to define the filter is by using a filter over the Sales table. The following query computes just the Big Sales Amount measure in the previous report, generating the server timings results visible in Figure 20-10:

DEFINE
    MEASURE Sales[Big Sales Amount (slow)] =
        CALCULATE (
            [Sales Amount],
            FILTER (
                Sales,
                Sales[Quantity] * Sales[Net Price] > 1000
            )
        )
EVALUATE
SUMMARIZECOLUMNS (
    ROLLUPADDISSUBTOTAL ( 'Product'[Brand], "IsGrandTotalRowTotal" ),
    "Big_Sales_Amount", 'Sales'[Big Sales Amount (slow)]
)
Figure 20-10 Server Timings running the query for the Big Sales Amount (slow) measure.

Because FILTER is iterating a table, this query is generating a larger datacache than necessary. The result in Figure 20-9 only displays 11 brands and one additional row for the grand total. Nevertheless, the query plan estimates that the first two datacaches return 3,937 rows, which is the same number reported in the Query Plan pane visible in Figure 20-11.

Figure 20-11 Query Plan pane running the query for Big Sales Amount (slow) measure.

The formula engine receives a much larger datacache than the one required for the query result because there are two additional columns. Indeed, the xmSQL query at line 2 is the following:

WITH
    $Expr0 := ( CAST ( PFCAST ( 'DaxBook Sales'[Quantity] AS  INT ) AS  REAL )
                  * PFCAST ( 'DaxBook Sales'[Net Price] AS  REAL )  )
SELECT
    'DaxBook Product'[Brand],
    'DaxBook Sales'[Quantity],
    'DaxBook Sales'[Net Price],
    SUM ( @$Expr0 )
FROM 'DaxBook Sales'
    LEFT OUTER JOIN 'DaxBook Product'
        ON 'DaxBook Sales'[ProductKey]='DaxBook Product'[ProductKey]
WHERE
  ( COALESCE (  ( CAST ( PFCAST ( 'DaxBook Sales'[Quantity] AS  INT ) AS  REAL )
    * PFCAST ( 'DaxBook Sales'[Net Price] AS  REAL )  )  ) > COALESCE ( 1000.000000 ) );

The structure of the xmSQL query at line 4 in Figure 20-10 is similar to the previous one, just without the SUM aggregation. The presence of a table filter in CALCULATE results in this side effect in the query plan because the semantics of the filter include all the columns of the Sales expanded table (expanded tables are described in Chapter 14, “Advanced DAX concepts”).

Optimizing the measure only requires a column filter. Because the filter expression uses two columns, the row context only needs a table with those two columns, producing a corresponding and more efficient filter argument for CALCULATE. The following query implements the column filter, adding KEEPFILTERS to keep the same semantics as the previous version; it generates the server timings results visible in Figure 20-12:

DEFINE
    MEASURE Sales[Big Sales Amount (fast)] =
        CALCULATE (
            [Sales Amount],
            KEEPFILTERS (
                FILTER (
                    ALL (
                        Sales[Quantity],
                        Sales[Net Price]
                    ),
                    Sales[Quantity] * Sales[Net Price] > 1000
                )
            )
        )
EVALUATE
SUMMARIZECOLUMNS (
    ROLLUPADDISSUBTOTAL ( 'Product'[Brand], "IsGrandTotalRowTotal" ),
    "Big_Sales_Amount", 'Sales'[Big Sales Amount (fast)]
)
Figure 20-12 Server Timings when running the query for the Big Sales Amount (fast) measure.

The DAX query runs faster, but what is more important is that there is only one datacache for the rows of the result, excluding the grand total, which still has a separate xmSQL query. The datacache at line 2 in Figure 20-12 only returns 14 estimated rows, whereas the actual count visible in the Query Plan pane in Figure 20-13 is only 11.

Figure 20-13 Query Plan pane running the query for Big Sales Amount (fast) measure.

The reason for this optimization is that the query plan can perform a much more efficient calculation in the storage engine, without returning to the formula engine the additional data that the semantics of a table filter would require. The following is the xmSQL query at line 2 in Figure 20-12:

WITH
    $Expr0 := ( CAST ( PFCAST ( 'DaxBook Sales'[Quantity] AS  INT ) AS  REAL )
                  * PFCAST ( 'DaxBook Sales'[Net Price] AS  REAL )  )
SELECT
    'DaxBook Product'[Brand],
    SUM ( @$Expr0 )
FROM 'DaxBook Sales'
    LEFT OUTER JOIN 'DaxBook Product'
        ON 'DaxBook Sales'[ProductKey]='DaxBook Product'[ProductKey]
WHERE
  ( COALESCE (  ( CAST ( PFCAST ( 'DaxBook Sales'[Quantity] AS  INT ) AS  REAL )
    * PFCAST ( 'DaxBook Sales'[Net Price] AS  REAL )  )  ) > COALESCE ( 1000.000000 ) );

The datacache no longer includes the Quantity and Net Price columns, and its cardinality corresponds to the cardinality of the DAX result. This is an ideal condition for minimal materialization. Expressing the filter conditions over columns rather than tables is an important step toward this goal.

The important takeaway of this section is that you should always pay attention to the rows returned by storage engine queries. When their number is much bigger than the rows included in the result of a DAX query, there might be some overhead caused by the additional work performed by the storage engine to materialize datacaches and by the formula engine to consume such datacaches. Table filters are one of the most common reasons for excessive materialization, though they are not always responsible for bad performance.

Note

When you write a DAX filter, consider the cardinality of the resulting filter. If the cardinality of a table filter is identical to that of a column filter, and the table filter does not expand to other tables, then the table filter can be used safely. For example, there is not usually much difference between filtering a Date table and filtering the Date[Date] column.
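
For example, assuming the Date table used in this chapter, the two filter arguments in the following sketch have the same cardinality and typically produce the same result, because the expanded Date table does not include columns of other tables:

// Table filter: acceptable here, because the expanded Date table
// only contains Date columns
CALCULATE (
    [Sales Amount],
    FILTER ( 'Date', 'Date'[Date] >= DATE ( 2008, 1, 1 ) )
)

// Column filter with the same cardinality
CALCULATE (
    [Sales Amount],
    FILTER ( VALUES ( 'Date'[Date] ), 'Date'[Date] >= DATE ( 2008, 1, 1 ) )
)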

Optimizing context transitions

The storage engine can only compute simple aggregations and simple grouping over columns of the model. Anything else must be computed by the formula engine. Every time there is an iteration and a corresponding context transition, the storage engine materializes a datacache at the granularity level of the iterated table. If the expression computed during the iteration is simple enough to be solved by the storage engine, the performance is typically good. Otherwise, if the expression is too complex, a large materialization and/or a CallbackDataID might occur, as demonstrated in the following example. In these scenarios, simplifying the code by reducing the number of context transitions and by reducing the granularity of the iterated table greatly helps in improving performance.

For example, consider a Cashback measure that multiplies the Sales Amount by the Cashback % attribute assigned to each Customer, based on an algorithm defined by the marketing department. The report in Figure 20-14 displays the Cashback amount for each country.

Figure 20-14 Cashback reported by customer country.

The easiest and most intuitive way to create the Cashback measure is also the slowest: multiplying Cashback % by the Sales Amount for each customer and summing the results. The following query computes this slow version of the Cashback measure in the previous report, generating the server timings results visible in Figure 20-15:

DEFINE
    MEASURE Sales[Cashback (slow)] =
        SUMX (
            Customer,
            [Sales Amount] * Customer[Cashback %]
        )
EVALUATE
SUMMARIZECOLUMNS (
    ROLLUPADDISSUBTOTAL ( 'Customer'[Country], "IsGrandTotalRowTotal" ),
    "Cashback", 'Sales'[Cashback (slow)]
)
Figure 20-15 Server Timings running the query for the Cashback (slow) measure reported by country.

The queries at lines 2 and 4 of Figure 20-15 compute the result at the Country level, whereas the queries at lines 6 and 8 perform the same task for the grand total. We focus exclusively on the first two storage engine queries. To check whether the estimate of the rows materialized is correct, you can look at the query plan in Figure 20-16. The query plan could be surprising, because it seems that a few storage engine queries are not used at all.

Figure 20-16 Query Plan pane running the query for the Cashback (slow) measure reported by country.

The query plan in Figure 20-16 only reports two Cache nodes, which correspond to lines 4 and 8 of the Server Timings pane in Figure 20-15. This is another example of why looking at the query plan could be confusing. The formula engine is actually doing some other work, but the execution within a CallbackDataID is not always reported in the query plan, and this is one of those cases. This is the xmSQL query at line 4 of Figure 20-15, which returns 29 effective rows instead of the estimated 32:

WITH
    $Expr0 := ( [CallbackDataID ( SUMX ( Sales, Sales[Quantity]] * Sales[Net Price]] ) )
                   ] ( PFDATAID ( 'DaxBook Customer'[CustomerKey] ) )
                * PFCAST ( 'DaxBook Customer'[Cashback %] AS  REAL )  )
SELECT
    'DaxBook Customer'[Country],
    SUM ( @$Expr0 )
FROM 'DaxBook Customer';

The DAX code passed to CallbackDataID must be computed for each customer by the formula engine, which receives the CustomerKey as argument. You can see the additional storage engine queries, but the corresponding query plan is not visible in this case. Therefore, we can only imagine what the query plan does by looking at the other storage engine query at line 2 of Figure 20-15:

WITH
    $Expr0 := ( CAST ( PFCAST ( 'DaxBook Sales'[Quantity] AS  INT ) AS  REAL )
              * PFCAST ( 'DaxBook Sales'[Net Price] AS  REAL )  )
SELECT
    'DaxBook Customer'[CustomerKey],
    SUM ( @$Expr0 )
FROM 'DaxBook Sales'
    LEFT OUTER JOIN 'DaxBook Customer'
        ON 'DaxBook Sales'[CustomerKey]='DaxBook Customer'[CustomerKey];

The result of this xmSQL query only contains two columns: the CustomerKey and the result of the Sales Amount measure for that customer. Thus, the formula engine uses the result of this query to provide a result to the CallbackDataID request of the former query.

Once again, instead of trying to describe the exact sequence of operations performed by the engine, it is easier to analyze the result of the storage engine queries, checking whether the materialization is larger than what is required for the query result. In this case the answer is yes: the DAX query returns only 6 visible countries, whereas the formula engine computed a total of 29 countries. In any case, there is a huge difference between these numbers and the materialization of 18,872 customers produced by the last xmSQL query analyzed.

Is it possible to push more workload to the storage engine, aggregating the data by country instead of by customer? The answer is yes, by reducing the number of context transitions. Consider the original Cashback measure: the expression executed in the row context depends on a single column of the Customer table (Cashback %):

Sales[Cashback (slow)] :=
SUMX (
    Customer,
    [Sales Amount] * Customer[Cashback %]
)

Because the Sales Amount measure can be computed for a group of customers that have the same Cashback %, the optimal cardinality for the SUMX iterator is defined by the unique values of the Cashback % column. The following optimized version just replaces the first argument of SUMX using the unique values of Cashback % visible in the filter context:

DEFINE
    MEASURE Sales[Cashback (fast)] =
        SUMX (
            VALUES ( Customer[Cashback %] ),
            [Sales Amount] * Customer[Cashback %]
        )
EVALUATE
SUMMARIZECOLUMNS (
    ROLLUPADDISSUBTOTAL ( 'Customer'[Country], "IsGrandTotalRowTotal" ),
    "Cashback", 'Sales'[Cashback (fast)]
)

This way, the materialization is much smaller, as visible in Figure 20-17. However, even though the number of rows materialized is significantly smaller, the overall execution time is similar if not larger; remember that a difference of a few milliseconds should not be considered relevant.

Figure 20-17 Server Timings running the query for Cashback (fast) reported by country.

This time there is a single xmSQL query to compute the amount by country. This is the xmSQL query at line 2 of Figure 20-17:

WITH
    $Expr0 := ( CAST ( PFCAST ( 'DaxBook Sales'[Quantity] AS  INT ) AS  REAL )
              * PFCAST ( 'DaxBook Sales'[Net Price] AS  REAL )  )
SELECT
    'DaxBook Customer'[Country],
    'DaxBook Customer'[Cashback %],
    SUM ( @$Expr0 )
FROM 'DaxBook Sales'
    LEFT OUTER JOIN 'DaxBook Customer'
        ON 'DaxBook Sales'[CustomerKey]='DaxBook Customer'[CustomerKey];

The result of this query contains three columns: Country, Cashback %, and the corresponding Sales Amount value. Thus, the formula engine multiplies Cashback % by Sales Amount for each row, aggregating the rows belonging to the same country. The result presents an estimated count of 288 rows, whereas there are only 65 rows consumed by the formula engine. This is visible in the query plan in Figure 20-18.

Figure 20-18 Query Plan pane running the query for Cashback (fast) reported by country.

Even though it is not evident, this measure is faster than the original measure. Having a smaller footprint in memory, it performs better in more complex reports. This is immediately visible by using a slightly different report like the one in Figure 20-19, grouping the Cashback measure by product brand instead of by customer country.

Figure 20-19 Cashback reported by product brand.

The following query computes the slowest Cashback measure in the report shown in Figure 20-19, generating the server timings results visible in Figure 20-20:

DEFINE
    MEASURE Sales[Cashback (slow)] =
        SUMX (
            Customer,
            [Sales Amount] * Customer[Cashback %]
        )
EVALUATE
SUMMARIZECOLUMNS (
    ROLLUPADDISSUBTOTAL ( Product[Brand], "IsGrandTotalRowTotal" ),
    "Cashback", 'Sales'[Cashback (slow)]
)
Figure 20-20 Server Timings running the query for Cashback (slow) reported by brand.

There are a few differences in this query plan, but we focus on the materialization of 192,514 rows produced by the following xmSQL query at line 2 of Figure 20-20:

WITH
    $Expr0 := ( CAST ( PFCAST ( 'DaxBook Sales'[Quantity] AS  INT ) AS  REAL )
              * PFCAST ( 'DaxBook Sales'[Net Price] AS  REAL )  )
SELECT
    'DaxBook Customer'[CustomerKey],
    'DaxBook Product'[Brand],
    SUM ( @$Expr0 )
FROM 'DaxBook Sales'
    LEFT OUTER JOIN 'DaxBook Customer'
        ON 'DaxBook Sales'[CustomerKey]='DaxBook Customer'[CustomerKey]
    LEFT OUTER JOIN 'DaxBook Product'
        ON 'DaxBook Sales'[ProductKey]='DaxBook Product'[ProductKey];

The reason for the larger materialization is that now the inner calculation computes Sales Amount for each combination of CustomerKey and Brand. The estimated count of 192,514 rows is confirmed by the actual count visible in the query plan in Figure 20-21.

Figure 20-21 Query Plan pane running the query for the Cashback (slow) measure reported by brand.

When the test query uses the faster measure, the materialization is much smaller and the query response time is much shorter. The execution of the following DAX query produces the server timings results visible in Figure 20-22:

DEFINE
    MEASURE Sales[Cashback (fast)] =
        SUMX (
            VALUES ( Customer[Cashback %] ),
            [Sales Amount] * Customer[Cashback %]
        )
EVALUATE
SUMMARIZECOLUMNS (
    ROLLUPADDISSUBTOTAL ( Product[Brand], "IsGrandTotalRowTotal" ),
    "Cashback", 'Sales'[Cashback (fast)]
)
Figure 20-22 Server Timings running the query for Cashback (fast) reported by brand.

The materialization is three orders of magnitude smaller (126 rows instead of 192,000), and the total execution time is 9 times faster than the slow version (it was 415 milliseconds and it is 48 milliseconds with the fast version). Because these differences depend on the cardinality of the report, you should focus on the formula that minimizes the work in the formula engine by computing most of the aggregations in the storage engine. Reducing the number of context transitions is an important step to achieve this goal.

Note

Excessive materialization generated by unnecessary context transitions is the most common performance issue in DAX measures. Using table filters instead of column filters is the second most common performance issue. Therefore, making sure that your DAX measures do not have these two problems should be your priority in an optimization effort. By inspecting the server timings, you should be able to quickly see the symptoms by looking at the materialization size.

Optimizing IF conditions

An IF function is always executed by the formula engine. When there is an IF function within an iteration, there could be a CallbackDataID involved in the execution. Moreover, the engine might evaluate the arguments of IF regardless of the result of the condition in the first argument. Even though the result is correct, you might pay the full cost of processing all the possible branches. As usual, there could be different behaviors depending on the version of the DAX engine used.
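
For example, in a sketch like the following, where the 10% adjustment is arbitrary, the engine might build a query plan that evaluates both branches, even for the cells where the condition excludes one of them:

IF (
    SELECTEDVALUE ( 'Product'[Class] ) = "Regular",
    [Sales Amount],        // branch needed when the condition is true
    [Sales Amount] * 1.1   // branch that might be evaluated anyway
)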

Optimizing IF in measures

Conditional statements in a measure could trigger a dangerous side effect in the query plan, generating the calculation of every conditional branch regardless of whether it is needed or not. In general, it is a good idea to avoid or at least reduce the number of conditional statements in expressions evaluated for measures, applying filters through the filter context whenever possible.

For example, the report in Figure 20-23 displays a Fam. Sales measure that only considers customers with at least one child at home. Because the goal is to display the value for individual customers, the first implementation (slow) does not work for aggregations of two or more customers (Total row is blank), whereas the alternative, faster implementation also works at aggregated levels.

Figure 20-23 Fam. Sales reported by product brand.

The following query computes the Fam. Sales (slow) measure in a report similar to the one in Figure 20-1. For each customer, an IF statement checks the number of children at home to filter customers classified as a family. The execution of the following DAX query produces the server timings results visible in Figure 20-24:

DEFINE
    MEASURE Sales[Fam. Sales (slow)] =
        VAR ChildrenAtHome = SELECTEDVALUE ( Customer[Children At Home] )
        VAR Result =
            IF (
                ChildrenAtHome > 0,
                [Sales Amount]
            )
        RETURN Result
EVALUATE
CALCULATETABLE (
    SUMMARIZECOLUMNS (
        ROLLUPADDISSUBTOTAL (
            ROLLUPGROUP (
                'Customer'[CustomerKey],
                'Customer'[Name]
            ), "IsGrandTotalRowTotal"
        ),
        "Fam__Sales__slow_", 'Sales'[Fam. Sales (slow)]
    ),
    'Product Category'[Category] = "Home Appliances",
    'Product'[Manufacturer] = "Northwind Traders",
    'Product'[Class] = "Regular",
    DATESBETWEEN (
        'Date'[Date],
        DATE ( 2007, 5, 10 ),
        DATE ( 2007, 5, 10 )
    )
)
ORDER BY
    [IsGrandTotalRowTotal] DESC,
    'Customer'[CustomerKey],
    'Customer'[Name]
Figure 20-24 Server Timings running the query for Fam. Sales (slow) reported by customer.

The query is not that slow, but we wanted a query result with a small number of rows because the focus is mainly on the materialization required. We can avoid looking at the query plan, which is already 62 lines long, because the information provided in the Server Timings pane already highlights several facts:

  • Even though the DAX result only has 7 rows, three of the xmSQL queries materialize more than 18,000 rows, a number close to the number of customers.

  • The materialization produced by the storage engine query at line 4 in Figure 20-24 includes information about the number of children at home computed for each customer.

  • The materialization produced by the storage engine query at line 9 in Figure 20-24 includes the Sales Amount measure computed for each customer.

  • The grand total is not computed by any storage engine query, so it is the formula engine that aggregates the customers to obtain that number.

This is the storage engine query at line 4 in Figure 20-24. It provides the information required by the formula engine to filter customers based on the number of children at home:

SELECT
    'DaxBook Customer'[CustomerKey],
    SUM (  ( PFDATAID ( 'DaxBook Customer'[Children At Home] ) <> 2 )  ),
    MIN ( 'DaxBook Customer'[Children At Home] ),
    MAX ( 'DaxBook Customer'[Children At Home] ),
    COUNT (  )
FROM 'DaxBook Customer';

This result is used as an argument to the following storage engine query at line 9 in Figure 20-24 in order to filter an estimate of 7,368 customers that have at least one child at home:

WITH
    $Expr0 := ( CAST ( PFCAST ( 'DaxBook Sales'[Quantity] AS  INT ) AS  REAL )
                 * PFCAST ( 'DaxBook Sales'[Net Price] AS  REAL )  )
SELECT
    'DaxBook Customer'[CustomerKey],
    SUM ( @$Expr0 )
FROM 'DaxBook Sales'
    LEFT OUTER JOIN 'DaxBook Customer'
        ON 'DaxBook Sales'[CustomerKey]='DaxBook Customer'[CustomerKey]
    LEFT OUTER JOIN 'DaxBook Date'
        ON 'DaxBook Sales'[OrderDateKey]='DaxBook Date'[DateKey]
    LEFT OUTER JOIN 'DaxBook Product'
        ON 'DaxBook Sales'[ProductKey]='DaxBook Product'[ProductKey]
    LEFT OUTER JOIN 'DaxBook Product Subcategory'
        ON 'DaxBook Product'[ProductSubcategoryKey]
               ='DaxBook Product Subcategory'[ProductSubcategoryKey]
    LEFT OUTER JOIN 'DaxBook Product Category'
        ON 'DaxBook Product Subcategory'[ProductCategoryKey]
               ='DaxBook Product Category'[ProductCategoryKey]
WHERE
    'DaxBook Customer'[CustomerKey]
        IN ( 2241, 13407, 5544, 7787, 11090, 7368, 17055, 16636, 1329, 12914..
            [7368 total values, not all displayed] )
VAND 'DaxBook Date'[Date] = 39212.000000
VAND 'DaxBook Product'[Manufacturer] = 'Northwind Traders'
VAND 'DaxBook Product'[Class] = 'Regular'
VAND 'DaxBook Product Category'[Category] = 'Home Appliances';

The estimated number of rows in this result is wrong, because only 7 rows are actually returned by this storage engine query. This is visible in the query plan; however, it might not be trivial to find the corresponding xmSQL query for each Cache node in the query plan shown in Figure 20-25.

Figure 20-25 Query Plan pane running the query for the Fam. Sales (slow) measure reported by customer.

The previous storage engine query receives a filter over the CustomerKey column. The formula engine requires a materialization of such a list of values in CustomerKey in order to provide the corresponding filter in a storage engine query. However, the materialization of a large number of customers in the formula engine is likely to be the biggest cost for this query. The size of this materialization depends on the number of customers. Therefore, a model with hundreds of thousands or millions of customers would make the performance issue evident. In this case you should look at the size of the materialization rather than just the execution time, which is still relatively quick. Understanding whether the materialization is efficient is important to create a formula that scales up well with a growing number of rows in the model.

The IF statement in the measure can only be evaluated by the formula engine. This requires either materialization like in this example, or CallbackDataID calls, which we describe later. A better approach is to apply a filter to the filter context using CALCULATE. This removes the need to evaluate an IF condition for every cell of the query result.

When the test query is using the faster measure, the materialization is much smaller and the query response time is also much shorter. The execution of the following DAX query produces the server timings results visible in Figure 20-26:

DEFINE
    MEASURE Sales[Fam. Sales (fast)] =
        CALCULATE (
            [Sales Amount],
            KEEPFILTERS ( Customer[Children At Home] > 0 )
        )
EVALUATE
CALCULATETABLE (
    SUMMARIZECOLUMNS (
        ROLLUPADDISSUBTOTAL (
            ROLLUPGROUP (
                'Customer'[CustomerKey],
                'Customer'[Name]
            ), "IsGrandTotalRowTotal"
        ),
        "Fam__Sales__fast_", 'Sales'[Fam. Sales (fast)]
    ),
    'Product Category'[Category] = "Home Appliances",
    'Product'[Manufacturer] = "Northwind Traders",
    'Product'[Class] = "Regular",
    DATESBETWEEN (
        'Date'[Date],
        DATE ( 2007, 5, 10 ),
        DATE ( 2007, 5, 10 )
    )
)
ORDER BY
    [IsGrandTotalRowTotal] DESC,
    'Customer'[CustomerKey],
    'Customer'[Name]
Figure 20-26 Server Timings running the query for Fam. Sales (fast) reported by customer.

Even though there are still four storage engine queries, the query at line 4 in Figure 20-24 is no longer used. The query at line 4 in Figure 20-26 corresponds to the query at line 9 in Figure 20-24. It includes the filter over the number of children, highlighted in the last two lines of the following xmSQL query:

WITH
    $Expr0 := ( CAST ( PFCAST ( 'DaxBook Sales'[Quantity] AS  INT ) AS  REAL )
                 * PFCAST ( 'DaxBook Sales'[Net Price] AS  REAL )  )
SELECT
    'DaxBook Customer'[CustomerKey],
    SUM ( @$Expr0 )
FROM 'DaxBook Sales'
    LEFT OUTER JOIN 'DaxBook Customer'
        ON 'DaxBook Sales'[CustomerKey]='DaxBook Customer'[CustomerKey]
    LEFT OUTER JOIN 'DaxBook Date'
        ON 'DaxBook Sales'[OrderDateKey]='DaxBook Date'[DateKey]
    LEFT OUTER JOIN 'DaxBook Product'
        ON 'DaxBook Sales'[ProductKey]='DaxBook Product'[ProductKey]
    LEFT OUTER JOIN 'DaxBook Product Subcategory'
        ON 'DaxBook Product'[ProductSubcategoryKey]
               ='DaxBook Product Subcategory'[ProductSubcategoryKey]
    LEFT OUTER JOIN 'DaxBook Product Category'
        ON 'DaxBook Product Subcategory'[ProductCategoryKey]
               ='DaxBook Product Category'[ProductCategoryKey]
WHERE
     'DaxBook Date'[Date] = 39212.000000
VAND 'DaxBook Product'[Manufacturer] = 'Northwind Traders'
VAND 'DaxBook Product'[Class] = 'Regular'
VAND 'DaxBook Product Category'[Category] = 'Home Appliances'
VAND ( PFCASTCOALESCE ( 'DaxBook Customer'[Children At Home] AS  INT )
           > COALESCE ( 0 )  );

This different query plan has pros and cons. The advantage is that the formula engine bears a lower workload, not having to transfer the filter over customers back and forth between storage engine queries. The price to pay is that the filter is applied at the storage engine level, which increases the storage engine cost from the former 32 ms of SE CPU time to the current 94 ms.

Another side effect of the new query plan is the additional storage engine query at line 8 in Figure 20-26; this query computes the aggregation at the grand total without having to perform such aggregation in the formula engine, as was the case in the slower measure. The code is similar to the previous xmSQL query, without the aggregation by CustomerKey.

As a rule of thumb, replacing a conditional statement with a filter argument in CALCULATE is usually a good idea, prioritizing a smaller materialization rather than looking at the execution time for small queries. This way, the expression is usually more scalable with larger data models. However, you should always evaluate the performance in specific conditions, analyzing the metrics provided by DAX Studio using different implementations; you might otherwise choose an implementation that, in a particular scenario, turns out to be slower and not faster.

Choosing between IF and DIVIDE

A very common use of the IF statement is to make sure that an expression is only evaluated with valid arguments. For example, an IF function can validate the denominator of a division to avoid a division by zero. For this specific condition, the DIVIDE function provides a faster alternative. It is interesting to consider why the code is faster by analyzing the different executions with DAX Studio.

The report in Figure 20-27 displays an Average Price measure by customer and brand.

Figure 20-27 Average Price reported by product brand and customer.

The following query computes the Average Price (slow) measure in the report shown in Figure 20-27. For each combination of product brand and customer, it divides the sales amount by the sum of quantity—only if the latter is not equal to zero. The execution of this DAX query produces the server timings results visible in Figure 20-28:

DEFINE
    MEASURE Sales[Average Price (slow)] =
        VAR Quantity = SUM ( Sales[Quantity] )
        VAR SalesAmount = [Sales Amount]
        VAR Result =
            IF (
                Quantity <> 0,
                SalesAmount / Quantity
            )
        RETURN Result
EVALUATE
TOPN (
    502,
    SUMMARIZECOLUMNS (
        ROLLUPADDISSUBTOTAL (
            ROLLUPGROUP (
                'Customer'[CustomerKey],
                'Product'[Brand]
            ), "IsGrandTotalRowTotal"
        ),
        "Average_Price__slow_", 'Sales'[Average Price (slow)]
    ),
    [IsGrandTotalRowTotal], 0,
    'Customer'[CustomerKey], 1,
    'Product'[Brand], 1
)
ORDER BY
    [IsGrandTotalRowTotal] DESC,
    'Customer'[CustomerKey],
    'Product'[Brand]
Figure 20-28 Server Timings running the query for Average Price (slow) reported by product brand and customer.

Though the result of the query is limited to 500 rows, the materialization of the datacaches returned by the storage engine queries is much larger. The following xmSQL query is executed at line 2 in Figure 20-28, and returns one row for each combination of customer and brand:

WITH
    $Expr0 := ( CAST ( PFCAST ( 'DaxBook Sales'[Quantity] AS  INT ) AS  REAL )
                  * PFCAST ( 'DaxBook Sales'[Net Price] AS  REAL )  )
SELECT
    'DaxBook Customer'[CustomerKey],
    'DaxBook Product'[Brand],
    SUM ( @$Expr0 ),
    SUM ( 'DaxBook Sales'[Quantity] )
FROM 'DaxBook Sales'
    LEFT OUTER JOIN 'DaxBook Customer'
        ON 'DaxBook Sales'[CustomerKey]='DaxBook Customer'[CustomerKey]
    LEFT OUTER JOIN 'DaxBook Product'
        ON 'DaxBook Sales'[ProductKey]='DaxBook Product'[ProductKey];

The query does not have any filter; therefore, the formula engine evaluates every row returned by this datacache, sorting the result and choosing the first 500 rows to return. This is certainly the most expensive part of the storage engine execution, consuming 90% of the query duration. The other three storage engine queries return the list of product brands (line 4), the list of customers (line 6), and the values of sales amount and quantity at the grand total level (line 8). However, these queries are less important in the optimization process. What matters is the formula engine cost required to evaluate the IF condition on more than 190,000 rows. The query plan resulting from the slow version of the measure has more than 80 lines (not reported here), and it consumes every datacache multiple times. This is a side effect of the different execution branches of an IF statement.

The optimization of the Average Price measure is based on replacing the IF function with DIVIDE. The execution of the following DAX query produces the server timings results visible in Figure 20-29:

DEFINE
    MEASURE Sales[Average Price (fast)] =
        VAR Quantity = SUM ( Sales[Quantity] )
        VAR SalesAmount = [Sales Amount]
        VAR Result =
            DIVIDE (
                SalesAmount,
                Quantity
            )
        RETURN Result
EVALUATE
TOPN (
    502,
    SUMMARIZECOLUMNS (
        ROLLUPADDISSUBTOTAL (
            ROLLUPGROUP (
                'Customer'[CustomerKey],
                'Product'[Brand]
            ), "IsGrandTotalRowTotal"
        ),
        "Average_Price__fast_", 'Sales'[Average Price (fast)]
    ),
    [IsGrandTotalRowTotal], 0,
    'Customer'[CustomerKey], 1,
    'Product'[Brand], 1
)
ORDER BY
    [IsGrandTotalRowTotal] DESC,
    'Customer'[CustomerKey],
    'Product'[Brand]
The figure shows the Server Timings for Average Price (fast) per customer and brand.
Figure 20-29 Server Timings running the query for Average Price (fast) reported by product brand and customer.

The query now runs in 413 milliseconds, saving more than 80% of the execution time. At first sight, the reduction from four storage engine queries to two might seem like a good reason for the improved performance. However, this is not really the case: Overall, the SE CPU time did not change significantly, and the larger materialization is still there. The optimization comes from a shorter and more efficient query plan, which has only 36 lines instead of the more than 80 generated by the slower query. In other words, DIVIDE reduces the size and complexity of the query plan, cutting the formula engine execution time by almost one order of magnitude.
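
As a side note, DIVIDE returns BLANK when the denominator is zero or BLANK; when the original IF returned a different value in that case, DIVIDE accepts an optional third argument with the alternate result. The following sketch, with a measure name of our own, shows the pattern:

MEASURE Sales[Average Price (zero default)] =
    DIVIDE (
        [Sales Amount],
        SUM ( Sales[Quantity] ),
        0 -- alternate result returned when the denominator is zero or BLANK
    )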

Optimizing IF in iterators

Using the IF statement within a large iterator might create expensive callbacks to the formula engine. For example, consider a Discounted Sales measure that applies a 10% discount to every transaction that has a quantity greater than or equal to 3. The report in Figure 20-30 displays the Discounted Sales amount for each product brand.

The report shows Sales Amount, Discounted Sales (slow), and Discounted Sales (scalable) per brand.
Figure 20-30 Discounted Sales reported by product brand.

The following query computes the slower Discounted Sales measure in the previous report, generating the server timings results visible in Figure 20-31:

DEFINE
    MEASURE Sales[Discounted Sales (slow)] =
        SUMX (
            Sales,
            Sales[Quantity] * Sales[Net Price] * IF (
                    Sales[Quantity] >= 3,
                    .9,
                    1
                )
        )
EVALUATE
SUMMARIZECOLUMNS (
    ROLLUPADDISSUBTOTAL ( 'Product'[Brand], "IsGrandTotalRowTotal" ),
    "Sales_Amount", 'Sales'[Sales Amount],
    "Discounted_Sales__slow_", 'Sales'[Discounted Sales (slow)]
)
ORDER BY
    [IsGrandTotalRowTotal] DESC,
    'Product'[Brand]
The figure shows Server Timings for Discounted Sales (slow) per Product brand.
Figure 20-31 Server Timings running the query for Discounted Sales (slow) reported by product brand.

The IF statement executed in the SUMX iterator produces two storage engine queries with a CallbackDataID call. The following is the xmSQL query at line 2 of Figure 20-31:

WITH
    $Expr0 := (  ( CAST ( PFCAST ( 'DaxBook Sales'[Quantity] AS  INT ) AS  REAL )
                     * PFCAST ( 'DaxBook Sales'[Net Price] AS  REAL )  )
                     * [CallbackDataID ( IF ( Sales[Quantity]] >= 3, .9, 1 )  ) ]
                         ( PFDATAID ( 'DaxBook Sales'[Quantity] )  )  ) ,
    $Expr1 := ( CAST ( PFCAST ( 'DaxBook Sales'[Quantity] AS  INT ) AS  REAL )
                  * PFCAST ( 'DaxBook Sales'[Net Price] AS  REAL )  )
SELECT
    'DaxBook Product'[Brand],
    SUM ( @$Expr0 ),
    SUM ( @$Expr1 )
FROM 'DaxBook Sales'
    LEFT OUTER JOIN 'DaxBook Product'
        ON 'DaxBook Sales'[ProductKey]='DaxBook Product'[ProductKey];

The presence of a CallbackDataID has two consequences: The storage engine execution is slower than it would be for pure storage engine operators, and the resulting datacache is not stored in the storage engine cache. The datacache must therefore be computed at every request, because it cannot be retrieved from the cache. The second issue can be more important than the first one, as is the case in this example.

The CallbackDataID can be removed by rewriting the measure in a different way, summing the results of two CALCULATE statements with different filters. For example, the Discounted Sales measure can be rewritten using two CALCULATE functions, one for each percentage, each filtering the transactions that share the same multiplier. The following DAX query implements a version of Discounted Sales that does not rely on any CallbackDataID. The code is longer and requires KEEPFILTERS to preserve the same semantics as the original measure, producing the server timings results visible in Figure 20-32:

DEFINE
    MEASURE Sales[Discounted Sales (scalable)] =
        CALCULATE (
            SUMX (
                Sales,
                Sales[Quantity] * Sales[Net Price]
            ) * .9,
            KEEPFILTERS ( Sales[Quantity] >= 3 )
        ) + CALCULATE (
                SUMX (
                    Sales,
                    Sales[Quantity] * Sales[Net Price]
                ),
                KEEPFILTERS ( NOT ( Sales[Quantity] >= 3 ) )
            )
EVALUATE
SUMMARIZECOLUMNS (
    ROLLUPADDISSUBTOTAL ( 'Product'[Brand], "IsGrandTotalRowTotal" ),
    "Sales_Amount", 'Sales'[Sales Amount],
    "Discounted_Sales__slow_", 'Sales'[Discounted Sales (scalable)]
)
The figure shows Server Timings for Discounted Sales (scalable) per product brand.
Figure 20-32 Server Timings running the query for Discounted Sales (scalable) by product brand for the first time.

Actually, in this simple query the result is not faster at all. The query required 159 milliseconds instead of the 142 milliseconds of the “slow” version. However, we called this measure “scalable.” Indeed, the important advantage is that a second execution of the last query with a warm cache produces the results visible in Figure 20-33, whereas multiple executions of the query for the “slow” version always produce a result similar to the one shown in Figure 20-31.

The figure shows Server Timings for Discounted Sales (scalable) per product brand, run a second time.
Figure 20-33 Server Timings running the query for Discounted Sales (scalable) by product brand a second time.

The Server Timings in Figure 20-33 show that there is no SE CPU cost after the first execution of the query. This is important when a model is published on a server and many users open the same reports: Users experience a faster response time, and the memory and CPU workload on the server side is reduced. This optimization is particularly relevant in environments with a fixed reserved capacity, such as Power BI Premium and Power BI Report Server.

The rule of thumb is to carefully consider the IF function in the expression of an iterator with a large cardinality because of the possible presence of CallbackDataID in the storage engine queries. The next section includes a deeper discussion on the impact of CallbackDataID, which might be required by many other DAX functions used in iterators.

Note

The SWITCH function in DAX is similar to a series of nested IF functions and can be optimized in a similar way.
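
For example, a SWITCH that assigns discounts by quantity band can be rewritten as the sum of CALCULATE branches with complementary filters. The discount tiers in the following sketch are our own invention; as in the previous example, KEEPFILTERS preserves the original filter semantics:

MEASURE Sales[Tiered Sales (SWITCH)] =
    SUMX (
        Sales,
        Sales[Quantity] * Sales[Net Price]
            * SWITCH (
                TRUE (),
                Sales[Quantity] >= 5, .8,
                Sales[Quantity] >= 3, .9,
                1
            )
    )
MEASURE Sales[Tiered Sales (scalable)] =
    CALCULATE (
        SUMX ( Sales, Sales[Quantity] * Sales[Net Price] ) * .8,
        KEEPFILTERS ( Sales[Quantity] >= 5 )
    )
        + CALCULATE (
            SUMX ( Sales, Sales[Quantity] * Sales[Net Price] ) * .9,
            KEEPFILTERS ( Sales[Quantity] >= 3 && Sales[Quantity] < 5 )
        )
        + CALCULATE (
            SUMX ( Sales, Sales[Quantity] * Sales[Net Price] ),
            KEEPFILTERS ( NOT ( Sales[Quantity] >= 3 ) )
        )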

Reducing the impact of CallbackDataID

In Chapter 19, you saw that the CallbackDataID function in a storage engine query can have a huge performance impact. This is because it slows down the storage engine execution, and it disables the use of the storage engine cache for the datacache produced. Identifying the CallbackDataID is important because this is often the reason behind a bottleneck in the storage engine, especially for models that only have a few million rows in their largest table (scan time should typically be in the order of magnitude of 10–100 milliseconds).

For example, consider the following query, where the Rounded Sales measure computes its result by rounding Net Price to the nearest integer. The report in Figure 20-34 displays the Rounded Sales amount for each product brand.

The report shows Rounded Sales (slow) and Rounded Sales (fast) per product brand.
Figure 20-34 Rounded Sales reported by product brand.

The simplest implementation of Rounded Sales applies the ROUND function to every row of the Sales table. This results in one CallbackDataID call per row, which slows down the storage engine execution. The following query computes the slower Rounded Sales measure in the previous report, generating the server timings results visible in Figure 20-35:

DEFINE
    MEASURE Sales[Rounded Sales (slow)] =
        SUMX (
            Sales,
            Sales[Quantity] * ROUND ( Sales[Net Price], 0 )
        )
EVALUATE
TOPN (
    502,
    SUMMARIZECOLUMNS (
        ROLLUPADDISSUBTOTAL ( 'Product'[Brand], "IsGrandTotalRowTotal" ),
        "Rounded_Sales", 'Sales'[Rounded Sales (slow)]
    ),
    [IsGrandTotalRowTotal], 0,
    'Product'[Brand], 1
)
ORDER BY
    [IsGrandTotalRowTotal] DESC,
    'Product'[Brand]
The figure shows Server Timings for Rounded Sales (slow).
Figure 20-35 Server Timings running the query for Rounded Sales (slow).

The two storage engine queries at lines 2 and 4 compute the value for each brand and for the grand total, respectively. This is the xmSQL query at line 2 of Figure 20-35:

WITH
    $Expr0 := ( CAST ( PFCAST ( 'DaxBook Sales'[Quantity] AS  INT ) AS  REAL )
                * [CallbackDataID ( ROUND ( Sales[Net Price]], 0 )  ) ]
                        ( PFDATAID ( 'DaxBook Sales'[Net Price] )  )  )
SELECT
    'DaxBook Product'[Brand],
    SUM ( @$Expr0 )
FROM 'DaxBook Sales'
    LEFT OUTER JOIN 'DaxBook Product'
        ON 'DaxBook Sales'[ProductKey]='DaxBook Product'[ProductKey];

The Sales table contains more than 12 million rows, and each storage engine query performs an equivalent number of CallbackDataID calls to execute the ROUND function. Indeed, the formula engine executes the ROUND operation to remove the decimal part of the Net Price value. Based on the Server Timings report, we can estimate that the formula engine executes around 7,000 ROUND calls per millisecond; at that rate, 12 million calls account for well over one second of execution time. It is important to keep these numbers in mind, so that you can evaluate whether the cardinality of an iterator generating CallbackDataID calls would benefit from optimization. If the table contained 12,000 rows instead of 12 million, the priority would be to optimize something else. However, optimizing the measure in the current model requires reducing the number of CallbackDataID calls.

We aim to reduce the number of CallbackDataID calls by refactoring the measure. The information provided by VertiPaq Analyzer shows that the Sales table has more than 12 million rows, whereas the Net Price column in the Sales table has fewer than 2,500 unique values. Consequently, the formula can compute the same result by multiplying the rounded value of each unique Net Price by the sum of Quantity over all the Sales transactions sharing that Net Price.

Note

You should always use the statistics of your data model during DAX optimization. A quick way to obtain these numbers for a data model is by using VertiPaq Analyzer (http://www.sqlbi.com/tools/vertipaq-analyzer/).

The following optimized version of Rounded Sales materializes up to 2,500 rows, computing the sum of Quantity while iterating the unique values of Net Price:

DEFINE
    MEASURE Sales[Rounded Sales (fast)] =
        SUMX (
            VALUES ( Sales[Net Price] ),
            CALCULATE ( SUM ( Sales[Quantity] ) ) * ROUND ( Sales[Net Price], 0 )
        )
EVALUATE
TOPN (
    502,
    SUMMARIZECOLUMNS (
        ROLLUPADDISSUBTOTAL ( 'Product'[Brand], "IsGrandTotalRowTotal" ),
        "Rounded_Sales", 'Sales'[Rounded Sales (fast)]
    ),
    [IsGrandTotalRowTotal], 0,
    'Product'[Brand], 1
)
ORDER BY
    [IsGrandTotalRowTotal] DESC,
    'Product'[Brand]

This way, the formula engine executes the ROUND function on the datacache returning the sum of Quantity for each Net Price. Despite a larger materialization compared to the slow version, the time required to obtain the result is reduced by almost one order of magnitude. Moreover, the results provided by the storage engine queries can be reused in subsequent executions, because the storage engine cache stores the results of xmSQL queries that do not include any CallbackDataID calls.

The following is the xmSQL query at line 2 of Figure 20-36. This query returns the Net Price and the sum of the Quantity for each brand and does not have any CallbackDataID calls:

SELECT
    'DaxBook Product'[Brand],
    'DaxBook Sales'[Net Price],
    SUM ( 'DaxBook Sales'[Quantity] )
FROM 'DaxBook Sales'
    LEFT OUTER JOIN 'DaxBook Product'
        ON 'DaxBook Sales'[ProductKey]='DaxBook Product'[ProductKey];
The figure shows Server Timings for Rounded Sales (fast).
Figure 20-36 Server Timings running the query for Rounded Sales (fast).

In this latter version, the rounding is executed by the formula engine and not by the storage engine through the CallbackDataID. Be mindful that a very large number of unique values in Net Price would require a bigger materialization, up to the point where the previous version could be faster with a different data distribution. If Net Price had millions of unique values, a benchmark comparison between the two solutions would be required in order to determine the optimal solution. Moreover, the result could be different depending on the hardware. Rather than assuming that one technique is better than another, you should always evaluate the performance using a real database and not just a sample before making a decision.

Finally, remember that most of the scalar DAX functions that do not aggregate data require a CallbackDataID if executed in an iterator. For example, DATE, VALUE, most of the type conversions, IFERROR, DIVIDE, and all the rounding, mathematical, and date/time functions are only implemented in the formula engine. Most of the time, their presence in an iterator generates a CallbackDataID call. However, you always have to check the xmSQL query to verify whether a CallbackDataID is present or not.
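
A quick way to verify is to run a minimal query in DAX Studio with Server Timings enabled and search for CallbackDataID in the xmSQL. The following test query is our own sketch, not part of any report in this chapter; with DIVIDE executed row by row over Sales, a CallbackDataID is expected in the resulting xmSQL:

EVALUATE
ROW (
    "Test",
    SUMX (
        Sales,
        DIVIDE ( Sales[Net Price], Sales[Quantity] )
    )
)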

Optimizing nested iterators

Nested iterators in DAX cannot be merged into a single storage engine query. Only the innermost iterator can be executed using a storage engine query, whereas the outer iterators typically require either a larger materialization or additional storage engine queries.

For example, consider another Cashback measure named “Cashback Sim.” that simulates a cashback for each customer using the current price of each product multiplied by the historical quantity and the cashback percentage of each customer. The report in Figure 20-37 displays the Cashback Sim. amount for each country.

The report shows Cashback Sim. (slow), (medium), and (fast) per country.
Figure 20-37 Cashback Sim. reported by customer country.

The first and slowest implementation iterates the Customer and Product tables in order to retrieve the cashback percentage of each customer and the current price of each product, respectively. The innermost iterator retrieves the quantity sold for each combination of customer and product, multiplying it by Unit Price and Cashback %. The following query computes the slowest Cashback Sim. measure in the previous report, generating the server timings results visible in Figure 20-38:

DEFINE
    MEASURE Sales[Cashback Sim. (slow)] =
        SUMX (
            Customer,
            SUMX (
                'Product',
                SUMX (
                    RELATEDTABLE ( Sales ),
                    Sales[Quantity] * 'Product'[Unit Price] * Customer[Cashback %]
                )
            )
        )
EVALUATE
TOPN (
    502,
    SUMMARIZECOLUMNS (
        ROLLUPADDISSUBTOTAL ( 'Customer'[Country], "IsGrandTotalRowTotal" ),
        "Cashback Sim. (slow)", 'Sales'[Cashback Sim. (slow)]
    ),
    [IsGrandTotalRowTotal], 0,
    'Customer'[Country], 1
)
ORDER BY
    [IsGrandTotalRowTotal] DESC,
    'Customer'[Country]
The figure shows Server Timings for Cashback Sim. (slow) per country.
Figure 20-38 Server Timings running the query for the Cashback Sim. (slow) measure reported by country.

The execution cost is split between the storage engine and the formula engine. The former pays a big price to produce a large materialization, whereas the latter spends time consuming that large set of materialized data. The storage engine queries at lines 2 and 10 of Figure 20-38 are identical and materialize the following columns for the entire Sales table: CustomerKey, ProductKey, Quantity, and RowNumber:

SELECT
    'DaxBook Customer'[CustomerKey],
    'DaxBook Product'[ProductKey],
    'DaxBook Sales'[RowNumber],
    'DaxBook Sales'[Quantity]
FROM 'DaxBook Sales'
    LEFT OUTER JOIN 'DaxBook Customer'
        ON 'DaxBook Sales'[CustomerKey]='DaxBook Customer'[CustomerKey]
    LEFT OUTER JOIN 'DaxBook Product'
        ON 'DaxBook Sales'[ProductKey]='DaxBook Product'[ProductKey];

The RowNumber is a special column inaccessible to DAX that is used to uniquely identify a row in a table. These four columns are used in the formula engine to compute the formula in the innermost iterator, which considers the sales for each combination of Customer and Product. The query at line 2 creates the datacache that is also returned at line 10, hitting the cache. The presence of this second storage engine query is caused by the need to compute the grand total in SUMMARIZECOLUMNS. Without the two levels of granularity in the result, half the query plan and half the storage engine queries would not be necessary.

The DAX measure iterates two tables (Customer and Product), producing all the possible combinations. For each combination of customer and product, the innermost SUMX function iterates only the corresponding rows in Sales. The formula also considers the combinations of Customer and Product that do not have any rows in the Sales table, potentially wasting precious CPU time. The query plan shows that there are 2,517 products and 18,869 customers; these are the same numbers estimated for the storage engine queries at lines 4 and 6 in Figure 20-38, respectively. According to the query plan, the formula engine performs 1,326,280 aggregations of the rows materialized from the Sales table, as shown in the excerpt of the query plan in Figure 20-39. The Records column shows the number of rows iterated, whether from datacaches returned by storage engine queries (see the Cache nodes at lines 28, 33, and 36) or from results computed by other formula engine operations (see the CrossApply node at line 23).

The figure shows the query plan for Cashback Sim. (slow) per country.
Figure 20-39 Query Plan pane running the query for the Cashback Sim. (slow) measure reported by country.

Although the DAX code iterates entire tables, the xmSQL code only retrieves the columns that uniquely identify one row of each table. This reduces the number of columns materialized, even though the cardinality of the iterated tables is larger than necessary. At this point, there are two important considerations:

  • The cardinality of the iterators is larger than required. Thanks to the context transition, it is possible to reduce the cardinality of the outer iterators; that way, the filter context considers all the rows in Sales for a given combination of Unit Price and Cashback %, instead of each combination of product and customer.

  • Removing nested iterators would produce a better query plan, also removing expensive materialization.

The first consideration suggests applying the technique previously described to optimize context transitions. Indeed, the RELATEDTABLE function is like a CALCULATETABLE without filter arguments, which only performs a context transition. The first variation of the DAX measure is a “medium” version that iterates the Cashback % and Unit Price columns instead of the Customer and Product tables. The semantics of the query are unchanged, because the innermost expression only depends on these two columns:

DEFINE
    MEASURE Sales[Cashback Sim. (medium)] =
        SUMX (
            VALUES ( Customer[Cashback %] ),
            SUMX (
                VALUES ( 'Product'[Unit Price] ),
                SUMX (
                    RELATEDTABLE ( Sales ),
                    Sales[Quantity] * 'Product'[Unit Price] * Customer[Cashback %]
                )
            )
        )
EVALUATE
TOPN (
    502,
    SUMMARIZECOLUMNS (
        ROLLUPADDISSUBTOTAL ( 'Customer'[Country], "IsGrandTotalRowTotal" ),
        "Cashback Sim. (medium)", 'Sales'[Cashback Sim. (medium)]
    ),
    [IsGrandTotalRowTotal], 0,
    'Customer'[Country], 1
)
ORDER BY
    [IsGrandTotalRowTotal] DESC,
    'Customer'[Country]

Figure 20-40 shows that the execution of the “medium” version is orders of magnitude faster than the “slow” version, thanks to a smaller granularity and a simpler dependency between tables iterated and columns referenced.

The figure shows Server Timings for Cashback Sim. (medium) per country.
Figure 20-40 Server Timings running the query for the Cashback Sim. (medium) measure reported by country.

The two storage engine queries provide a result for each granularity of the result. The following is the storage engine query at line 2; the similar query at line 4 does not include the Country column and is used for the grand total:

WITH
    $Expr0 := (  ( CAST ( PFCAST ( 'DaxBook Sales'[Quantity] AS  INT ) AS  REAL )
                      * PFCAST ( 'DaxBook Product'[Unit Price] AS  REAL )  )
                      * PFCAST ( 'DaxBook Customer'[Cashback] AS  REAL )  )
SELECT
    'DaxBook Customer'[Country],
    'DaxBook Customer'[Cashback],
    'DaxBook Product'[Unit Price],
    SUM ( @$Expr0 )
FROM 'DaxBook Sales'
    LEFT OUTER JOIN 'DaxBook Customer'
        ON 'DaxBook Sales'[CustomerKey]='DaxBook Customer'[CustomerKey]
    LEFT OUTER JOIN 'DaxBook Product'
        ON 'DaxBook Sales'[ProductKey]='DaxBook Product'[ProductKey];

The “medium” version of the Cashback Sim. measure still contains the same number of nested iterators, potentially considering all the possible combinations between the values of the Unit Price and Cashback % columns. In this simple measure, the query plan is able to establish the dependencies on the Sales table, reducing the calculation to the existing combinations. However, there is an alternative DAX syntax to explicitly instruct the engine to only consider the existing combinations. Instead of using nested iterators, a single iterator over the result of a SUMMARIZE enforces a query plan that does not compute calculations over non-existing combinations. The following version named “improved” could produce a more efficient query plan in complex scenarios, even though in this example it generates the same result and query plan:

MEASURE Sales[Cashback Sim. (improved)] =
    SUMX (
        SUMMARIZE (
            Sales,
            'Product'[Unit Price],
            Customer[Cashback %]
        ),
        CALCULATE ( SUM ( Sales[Quantity] ) )
            * 'Product'[Unit Price] * Customer[Cashback %]
    )

The “medium” and “improved” versions of the Cashback Sim. measure can easily be adapted to use existing measures in the innermost calculations. Indeed, the “improved” version uses a CALCULATE function to compute the sum of Sales[Quantity] for a given combination of Unit Price and Cashback %, just like a measure reference would. You should consider this approach to write efficient code that is easier to maintain, as shown in the sketch below.
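
For example, assuming the model defined a hypothetical Total Quantity measure, the “improved” pattern could reference it directly; the measure reference performs the same context transition as the explicit CALCULATE. This sketch is ours and is not one of the measures compared in this section:

MEASURE Sales[Total Quantity] =
    SUM ( Sales[Quantity] )
MEASURE Sales[Cashback Sim. (measure reuse)] =
    SUMX (
        SUMMARIZE (
            Sales,
            'Product'[Unit Price],
            Customer[Cashback %]
        ),
        [Total Quantity]
            * 'Product'[Unit Price] * Customer[Cashback %]
    )

However, a more efficient version is possible by removing any nested iterators.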

Note

A measure definition often includes aggregation functions such as SUM. With the exception of DISTINCTCOUNT, simple aggregation functions are just a shorter syntax for an iterator. For example, SUM internally invokes SUMX. Hence, a measure reference in an iterator often implies the execution of another nested iterator, with a context transition in the middle. When this is required by the nature of the calculation, it is a necessary computational cost. When the nested iterators are additive, like the two nested SUMX/SUM in the Cashback Sim. (improved) measure, consolidating the calculation may be considered to optimize performance; however, this could affect the readability and reusability of the measure.

The following “fast” version of the Cashback Sim. measure optimizes the performance, at the cost of reducing the ability to reuse the business logic of existing measures:

DEFINE
    MEASURE Sales[Cashback Sim. (fast)] =
        SUMX (
            Sales,
            Sales[Quantity]
                * RELATED ( 'Product'[Unit Price] )
                * RELATED ( Customer[Cashback %] )
        )
EVALUATE
TOPN (
    502,
    SUMMARIZECOLUMNS (
        ROLLUPADDISSUBTOTAL ( 'Customer'[Country], "IsGrandTotalRowTotal" ),
        "Cashback Sim. (fast)", 'Sales'[Cashback Sim. (fast)]
    ),
    [IsGrandTotalRowTotal], 0,
    'Customer'[Country], 1
)
ORDER BY
    [IsGrandTotalRowTotal] DESC,
    'Customer'[Country]

Figure 20-41 shows the server timings information of the “fast” version, which saves more than 50% of the execution time compared to the “medium” and “improved” versions.

The figure shows Server Timings for Cashback Sim. (fast) per country.
Figure 20-41 Server Timings running the query for the Cashback Sim. (fast) measure reported by country.

The measure with a single iterator without context transitions generates the following simple storage engine query, reported at line 2 of Figure 20-41:

WITH
    $Expr0 := (  ( CAST ( PFCAST ( 'DaxBook Sales'[Quantity] AS  INT ) AS  REAL )
                            * PFCAST ( 'DaxBook Product'[Unit Price] AS  REAL )  )
                            * PFCAST ( 'DaxBook Customer'[Cashback] AS  REAL )  )
SELECT
    'DaxBook Customer'[Country],
    SUM ( @$Expr0 )
FROM 'DaxBook Sales'
    LEFT OUTER JOIN 'DaxBook Customer'
        ON 'DaxBook Sales'[CustomerKey]='DaxBook Customer'[CustomerKey]
    LEFT OUTER JOIN 'DaxBook Product'
        ON 'DaxBook Sales'[ProductKey]='DaxBook Product'[ProductKey];

Using the RELATED function does not require any CallbackDataID. The only consequence of RELATED is that it enforces a join in the storage engine to enable access to the related column, which typically has a smaller performance impact than a CallbackDataID. However, the “fast” version of the measure is not recommended unless it is critical to obtain that last additional performance improvement and to keep the materialization at a minimal level.

Avoiding table filters for DISTINCTCOUNT

We already mentioned that filter arguments in CALCULATE/CALCULATETABLE functions should be applied to columns instead of tables. The goal of this example on the same topic is to show you an additional query plan pattern that you might find in server timings. A side effect of a table filter is that it requires a large materialization from the storage engine so that the formula engine can compute the result. Moreover, for non-additive expressions, the query plan might generate one storage engine query for each element included in the granularity of the result. The DISTINCTCOUNT aggregation is a simple and common example of a non-additive expression: If 100 distinct customers bought brand A and 80 distinct customers bought brand B, the distinct customers at the total level can be fewer than 180, because the same customer might have bought both brands.

For example, consider the report in Figure 20-42 that shows the number of customers that made purchases over $1,000 (Customers 1k) for each product name.

The report shows Customers 1k (slow) and Customers 1k (fast) per Product Name.
Figure 20-42 Customers with purchase amounts over $1,000 for each product.

The filter condition in the Customers 1k measure requires two columns. The less efficient way to implement such a condition is by using a filter over the Sales table. The following query computes the Customers 1k measure in the previous report, generating the server timings results visible in Figure 20-43:

DEFINE
    MEASURE Sales[Customers 1k (slow)] =
        CALCULATE (
            DISTINCTCOUNT ( Sales[CustomerKey] ),
            FILTER (
                Sales,
                Sales[Quantity] * Sales[Net Price] > 1000
            )
        )
EVALUATE
TOPN (
    502,
    SUMMARIZECOLUMNS (
        ROLLUPADDISSUBTOTAL ( 'Product'[Product Name], "IsGrandTotalRowTotal" ),
        "Customers_1k__slow_", 'Sales'[Customers 1k (slow)]
    ),
    [IsGrandTotalRowTotal], 0,
    'Product'[Product Name], 1
)
ORDER BY
    [IsGrandTotalRowTotal] DESC,
    'Product'[Product Name]
The figure shows Server Timings for Customers 1k (slow).
Figure 20-43 Server Timings running the query for the Customers 1k (slow) measure.

This query generates a large number of storage engine queries, one for each product included in the result. Because each storage engine query requires 100 to 200 milliseconds, the total CPU cost adds up to several minutes; the overall latency stays below one minute only because of the parallelism of the storage engine.

The first xmSQL query, at line 2 of Figure 20-43, returns the list of product names along with the Quantity and Net Price of the corresponding sales transactions. Indeed, even though only 1,091 products appear in at least one Sales transaction with an amount greater than $1,000, the granularity of the datacache is larger: It includes additional details other than the product name, returning multiple rows for the same product:

SELECT
    'DaxBook Product'[Product Name],
    'DaxBook Sales'[Quantity],
    'DaxBook Sales'[Net Price]
FROM 'DaxBook Sales'
    LEFT OUTER JOIN 'DaxBook Product'
        ON 'DaxBook Sales'[ProductKey]='DaxBook Product'[ProductKey]
WHERE
    ( COALESCE (  ( CAST ( PFCAST ( 'DaxBook Sales'[Quantity] AS  INT ) AS  REAL )
                             * PFCAST ( 'DaxBook Sales'[Net Price] AS  REAL )  )  )
      > COALESCE ( 1000.000000 )
    );

There are 1,091 xmSQL queries very similar to the one at line 6 of Figure 20-43, each returning a single value obtained with a distinct count aggregation. In this case, the filter condition contains all the combinations of Quantity and Net Price that produce an amount greater than 1,000 for the Adventure Works 52″ LCD HDTV X790W Silver product:

SELECT
    DCOUNT ( 'DaxBook Sales'[CustomerKey] )
FROM 'DaxBook Sales'
    LEFT OUTER JOIN 'DaxBook Product'
        ON 'DaxBook Sales'[ProductKey]='DaxBook Product'[ProductKey]
WHERE
    ( COALESCE (  ( CAST ( PFCAST ( 'DaxBook Sales'[Quantity] AS  INT ) AS  REAL )
                             * PFCAST ( 'DaxBook Sales'[Net Price] AS  REAL )  )  )
       > COALESCE ( 1000.000000 )
    )
VAND (
        'DaxBook Product'[Product Name],
        'DaxBook Sales'[Quantity],
        'DaxBook Sales'[Net Price] )
    IN {
        ( 'Adventure Works 52" LCD HDTV X790W Silver', 2, 1592.200000 ) ,
        ( 'Adventure Works 52" LCD HDTV X790W Silver', 4, 1432.980000 ) ,
        ( 'Adventure Works 52" LCD HDTV X790W Silver', 1, 1273.760000 ) ,
        ( 'Adventure Works 52" LCD HDTV X790W Silver', 3, 1480.746000 ) ,
        ( 'Adventure Works 52" LCD HDTV X790W Silver', 4, 1512.590000 ) ,
        ( 'Adventure Works 52" LCD HDTV X790W Silver', 3, 1592.200000 ) ,
        ( 'Adventure Works 52" LCD HDTV X790W Silver', 3, 1353.370000 ) ,
        ( 'Adventure Works 52" LCD HDTV X790W Silver', 4, 1273.760000 ) ,
        ( 'Adventure Works 52" LCD HDTV X790W Silver', 1, 1480.746000 ) ,
        ( 'Adventure Works 52" LCD HDTV X790W Silver', 1, 1592.200000 )
    ..[24 total tuples, not all displayed]};

Indeed, the following xmSQL query at line 10 of Figure 20-43 only differs from the latter in the final filter condition, which includes valid combinations of Quantity and Net Price for the Contoso Washer & Dryer 21in E210 Blue product:

SELECT
    DCOUNT ( 'DaxBook Sales'[CustomerKey] )
FROM 'DaxBook Sales'
    LEFT OUTER JOIN 'DaxBook Product'
        ON 'DaxBook Sales'[ProductKey]='DaxBook Product'[ProductKey]
WHERE
    ( COALESCE (  ( CAST ( PFCAST ( 'DaxBook Sales'[Quantity] AS  INT ) AS  REAL )
                             * PFCAST ( 'DaxBook Sales'[Net Price] AS  REAL )  )  )
       > COALESCE ( 1000.000000 )
    )
VAND (
        'DaxBook Product'[Product Name],
        'DaxBook Sales'[Quantity],
        'DaxBook Sales'[Net Price] )
    IN {
         ( 'Contoso Washer & Dryer 21in E210 Blue', 2, 1519.050000 ) ,
         ( 'Contoso Washer & Dryer 21in E210 Blue', 2, 1279.200000 ) ,
         ( 'Contoso Washer & Dryer 21in E210 Blue', 2, 1359.150000 ) ,
         ( 'Contoso Washer & Dryer 21in E210 Blue', 4, 1487.070000 ) ,
         ( 'Contoso Washer & Dryer 21in E210 Blue', 3, 1439.100000 ) ,
         ( 'Contoso Washer & Dryer 21in E210 Blue', 3, 1519.050000 ) ,
         ( 'Contoso Washer & Dryer 21in E210 Blue', 3, 1359.150000 ) ,
         ( 'Contoso Washer & Dryer 21in E210 Blue', 2, 1599.000000 ) ,
         ( 'Contoso Washer & Dryer 21in E210 Blue', 1, 1439.100000 ) ,
         ( 'Contoso Washer & Dryer 21in E210 Blue', 3, 1279.200000 )
    ..[24 total tuples, not all displayed]};

The presence of multiple similar storage engine queries is also visible in the Query Plan pane shown in Figure 20-44. Each row starting at line 15 corresponds to a single datacache with just one column produced by one of the storage engine queries described before.

The figure shows the query plan for Customers 1k (slow).
Figure 20-44 Query Plan pane running the query for Customers 1k (slow).

The table filter applied to the filter context forces an inefficient query plan. In this case, the table filter produces multiple storage engine queries instead of a single large materialization. However, the required optimization is always the same: Column filters are better than table filters in CALCULATE and CALCULATETABLE. The optimized version of the Customers 1k measure applies a filter over the two columns Quantity and Net Price, using KEEPFILTERS to preserve the filter semantics of the original measure. The following query produces the Server Timings results visible in Figure 20-45:

DEFINE
    MEASURE Sales[Customers 1k (fast)] =
        CALCULATE (
            DISTINCTCOUNT ( Sales[CustomerKey] ),
            KEEPFILTERS (
                FILTER (
                    ALL (
                        Sales[Quantity],
                        Sales[Net Price]
                    ),
                    Sales[Quantity] * Sales[Net Price] > 1000
                )
            )
        )
EVALUATE
TOPN (
    502,
    SUMMARIZECOLUMNS (
        ROLLUPADDISSUBTOTAL ( 'Product'[Product Name], "IsGrandTotalRowTotal" ),
        "Customers_1k__fast_", 'Sales'[Customers 1k (fast)]
    ),
    [IsGrandTotalRowTotal], 0,
    'Product'[Product Name], 1
)
ORDER BY
    [IsGrandTotalRowTotal] DESC,
    'Product'[Product Name]
The figure shows the Server Timings for Customers 1k (fast).
Figure 20-45 Server Timings running the query for Customers 1k (fast).

The column filter in CALCULATE simplifies the query plan, which now only requires two storage engine queries—one for each granularity level of the result (one product versus total of all products). The following is the xmSQL query at line 4 in Figure 20-45:

SELECT
    'DaxBook Product'[Product Name],
    DCOUNT ( 'DaxBook Sales'[CustomerKey] )
FROM 'DaxBook Sales'
    LEFT OUTER JOIN 'DaxBook Product'
        ON 'DaxBook Sales'[ProductKey]='DaxBook Product'[ProductKey]
WHERE
    ( COALESCE (  ( CAST ( PFCAST ( 'DaxBook Sales'[Quantity] AS  INT ) AS  REAL )
                             * PFCAST ( 'DaxBook Sales'[Net Price] AS  REAL )  )  )
      > COALESCE ( 1000.000000 )
    );

The datacache obtained corresponds to the result of the DAX query, so the formula engine does not have to do any further processing. This is an optimal condition for the performance of this query. The lesson here is that the number of storage engine queries also matters: A large number of storage engine queries might be the result of a bad query plan. Non-additive measures combined with table filters or bidirectional filters can be one of the reasons for this behavior, impacting performance in a negative way.

Avoiding multiple evaluations by using variables

When a DAX expression evaluates the same subexpression multiple times, it is usually a good idea to store the result of the subexpression in a variable and to reference the variable in the remaining parts of the expression. The use of variables is a best practice that improves code readability and can produce a shorter, more efficient query plan, with just a few exceptions described later in this section.

For example, the report in Figure 20-46 shows a Sales YOY % measure computing the percentage difference between the value of Sales Amount displayed in the row of the report and the corresponding value in the previous year.

The report displays Sales Amount, Sales YOY % (slow), and Sales YOY % (fast) per year.
Figure 20-46 Difference in sales year over year reported by year and month.

The Sales YOY % measure uses other measures internally. In order to be able to modify each part of the calculation, it is useful to include all the underlying measures by using the Define Dependent Measures feature in DAX Studio. The following query computes the original Sales YOY % (slow) measure in the previous report, generating the server timings results visible in Figure 20-47:

DEFINE
    MEASURE Sales[Sales PY] =
        CALCULATE (
            [Sales Amount],
            SAMEPERIODLASTYEAR ( 'Date'[Date] )
        )
    MEASURE Sales[Sales YOY (slow)] =
        IF (
            NOT ISBLANK ( [Sales Amount] ) && NOT ISBLANK ( [Sales PY] ),
            [Sales Amount] - [Sales PY]
        )
    MEASURE Sales[Sales Amount] =
        SUMX (
            Sales,
            Sales[Quantity] * Sales[Net Price]
        )
    MEASURE Sales[Sales YOY % (slow)] =
        DIVIDE (
            [Sales YOY (slow)],
            [Sales PY]
        )
EVALUATE
TOPN (
    502,
    SUMMARIZECOLUMNS (
        ROLLUPADDISSUBTOTAL (
            ROLLUPGROUP (
                'Date'[Calendar Year Month],
                'Date'[Calendar Year Month Number]
            ), "IsGrandTotalRowTotal"
        ),
        "Sales_YOY____slow_", 'Sales'[Sales YOY % (slow)]
    ),
    [IsGrandTotalRowTotal], 0,
    'Date'[Calendar Year Month Number], 1,
    'Date'[Calendar Year Month], 1
)
ORDER BY
    [IsGrandTotalRowTotal] DESC,
    'Date'[Calendar Year Month Number],
    'Date'[Calendar Year Month]
The figure shows the Server Timings for Sales YOY % (slow).
Figure 20-47 Server Timings running the query for the Sales YOY % (slow) measure.

The description of the query plan includes 1,819 rows, not reported here. Moreover, four storage engine queries are resolved by the storage engine cache (SE Cache), even though we executed a clear cache command before running the query. This indicates that different parts of the query plan issue multiple requests for the same storage engine query. Although the cache improves the performance of these redundant requests, their presence in the query plan is an indicator that there is room for further improvement.

When a query plan is so complex and there are many storage engine queries, it is a good idea to review the DAX code and reduce redundant evaluations by using variables. Indeed, redundant evaluations could be responsible for these duplicated requests. In general, the DAX engine should be able to locate similar subexpressions executed within the same filter context, and reuse their results without multiple evaluations. However, the presence of logical conditions such as IF and SWITCH creating different branches of execution can easily stop this internal optimization.

For example, consider the Sales YOY (slow) measure implementation: The Sales Amount and Sales PY measures are executed in different branches of the evaluation. The first argument of the IF function must always be evaluated, whereas the second argument is only evaluated when the first argument evaluates to TRUE. A DAX expression that appears in both the first and the second argument might be evaluated twice, because the query plan might not reuse the result obtained for the first argument when evaluating the second one. The technical reasons why this happens, and when it turns out to be preferable, are outside the scope of this book.

The following excerpt of the previous query highlights the measure references that might be evaluated twice because they are in both the first and the second argument:

MEASURE Sales[Sales YOY (slow)] =
    IF (
        NOT ISBLANK ( [Sales Amount] ) && NOT ISBLANK ( [Sales PY] ),
        [Sales Amount] - [Sales PY]
    )

By storing the values returned by the two measures Sales Amount and Sales PY in two variables, it is possible to instruct the DAX engine to enforce a single evaluation of the two measures before the IF condition, reusing the result in both the first and the second argument. The following excerpt of the Sales YOY (fast) measure shows how to implement this technique in the DAX code:

MEASURE Sales[Sales YOY (fast)] =
    VAR SalesPY = [Sales PY]
    VAR SalesAmount = [Sales Amount]
    RETURN
        IF (
            NOT ISBLANK ( SalesAmount ) && NOT ISBLANK ( SalesPY ),
            SalesAmount - SalesPY
        )

The following query includes a full implementation of the Sales YOY % (fast) measure, which internally relies on Sales YOY (fast) instead of Sales YOY (slow). The execution of the query produces the server timings results visible in Figure 20-48:

DEFINE
    MEASURE Sales[Sales PY] =
        CALCULATE (
            [Sales Amount],
            SAMEPERIODLASTYEAR ( 'Date'[Date] )
        )
    MEASURE Sales[Sales YOY (fast)] =
        VAR SalesPY = [Sales PY]
        VAR SalesAmount = [Sales Amount]
        RETURN
            IF (
                NOT ISBLANK ( SalesAmount ) && NOT ISBLANK ( SalesPY ),
                SalesAmount - SalesPY
            )
    MEASURE Sales[Sales Amount] =
        SUMX (
            Sales,
            Sales[Quantity] * Sales[Net Price]
        )
    MEASURE Sales[Sales YOY % (fast)] =
        DIVIDE (
            [Sales YOY (fast)],
            [Sales PY]
        )
EVALUATE
TOPN (
    502,
    SUMMARIZECOLUMNS (
        ROLLUPADDISSUBTOTAL (
            ROLLUPGROUP (
                'Date'[Calendar Year Month],
                'Date'[Calendar Year Month Number]
            ), "IsGrandTotalRowTotal"
        ),
        "Sales_YOY____fast_", 'Sales'[Sales YOY % (fast)]
    ),
    [IsGrandTotalRowTotal], 0,
    'Date'[Calendar Year Month Number], 1,
    'Date'[Calendar Year Month], 1
)
ORDER BY
    [IsGrandTotalRowTotal] DESC,
    'Date'[Calendar Year Month Number],
    'Date'[Calendar Year Month]
The figure shows the Server Timings for Sales YOY % (fast).
Figure 20-48 Server Timings running the query for Sales YOY % (fast).

The description of the query plan includes 488 rows (not reported here), reducing the complexity of the query plan by 73% compared to the previous plan of 1,819 rows. The new query plan reduces the cost for the storage engine in terms of both execution time and number of queries, and it also reduces the execution time in the formula engine. Overall, the optimized measure cuts the execution time by about 50%, and the improvement can be even larger in more complex models and expressions. If the same optimization were applied to nested measures, the savings would compound at every nesting level.

However, pay attention to possible side effects of assigning variables before conditional statements. Only the subexpressions used in the first argument should be assigned to variables defined before an IF or SWITCH statement; otherwise, the effect could be the opposite, forcing the evaluation of expressions that would otherwise be skipped. You should follow these guidelines:

  • When the same DAX expression is evaluated multiple times within the same filter context, assign it to a variable and reference the variable instead of the DAX expression.

  • When a DAX expression is only evaluated within a branch of an IF or SWITCH, assign it to a variable defined within that conditional branch, whenever a variable is necessary.

  • Do not assign a variable outside an IF or SWITCH statement if the variable is only used within a conditional branch (see the sketch after this list).

  • The first argument of IF and SWITCH can use variables defined before the IF and SWITCH statements without affecting performance.
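
The following sketch summarizes these guidelines; the Margin measure is hypothetical and is used only to show where each variable belongs:

MEASURE Sales[Margin % (sketch)] =
    VAR SalesAmount = [Sales Amount] -- used in the IF condition: define it before IF
    RETURN
        IF (
            SalesAmount > 0,
            VAR Margin = [Margin] -- hypothetical measure, used only in this branch: define it inside
            RETURN
                DIVIDE ( Margin, SalesAmount )
        )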

More examples about these guidelines are included in this article: https://www.sqlbi.com/articles/optimizing-if-and-switch-expressions-using-variables/

Conclusions

The lesson in this last chapter (to be honest, in the entire book) is that you must consider all the factors affecting a query plan in order to find the real bottleneck. Looking at the percentages of FE and SE shown in server timings is a good starting point, but you should always investigate the reason behind the numbers. Tools like DAX Studio and VertiPaq Analyzer let you measure the effects of a bad query plan, but the measurements are only clues and pieces of evidence pointing to the reasons for a slow query.

Welcome to the DAX world!
