Data model optimizations

For many data models, particularly those that were developed as part of pilot projects or by business users, a number of modifications can be implemented to reduce resource requirements or improve query performance. Therefore, prior to concluding that a certain amount of Premium capacity (or Analysis Services resources) is required, data models can be evaluated against a number of standard design practices and optimization techniques such as the following:

Avoid duplicate or near-duplicate data models:
- Design and maintain a consolidated, standardized data model of fact and dimension tables.
Remove tables and columns that aren't needed by the model:
- For import mode models, columns with the unique values (cardinality) will be the most expensive to store and scan at query time.
- The Fact table columns section of Chapter 3, Designing Import and DirectQuery Data Models provides examples of avoiding derived columns that, for import mode models, can be efficiently implemented via DAX measures.
Reduce the precision and cardinality of columns when possible:
- If four digits to the right of the decimal place are sufficient precision, revise a column's data type from a Decimal number to a Fixed decimal number (19, 4):
  - Apply rounding if even less precision is required.
- Split columns containing multiple values such as a datetime column into separate columns (date and time).
Limit or avoid high cardinality relationships, such as dimension tables with over 1.5 million rows:
- Consider splitting very large dimension tables into two tables and defining relationships between these tables and the fact table. The less granular table (such as Product Subcategory grain) could support most reports while the more granular table (such as Product) could be used only when this granularity is required.
Only use iterating DAX functions such as SUMX(), RANKX(), and FILTER() when either the table iterated over is small or when the row expression for these functions can be executed by the storage engine:
- Simple expressions such as the multiplication of two columns from the table being iterated over can be executed by the storage engine.
Use whole number (integer) data types instead of text data types whenever possible.
If the data model uses a DirectQuery data source, optimize this source such as with indexes or columnar technologies available such as the Clustered Columnstore Index for SQL Server:
- Additionally, ensure that the source database supports referential integrity and that the DirectQuery model assumes referential integrity in its defined relationships. This will result in inner join queries to the source.
  - The Fact-to-dimension relationships section of Chapter 3, Designing Import and DirectQuery Data Models, contains additional details.
Avoid or limit DISTINCTCOUNT() measures against high cardinality columns:
- For example, create the DISTINCTCOUNT() measure expression against the natural key or business key column identifying the dimension member (such as Customer ABC), rather than the columns used in the fact-to-dimension relationship. With slowly changing dimension processes, the relationship columns could store many more unique values per dimension member and thus reduce performance.
Avoid the use of calculated DAX columns on fact tables:
- Create these columns in the source system or in the queries used to load the model to allow for better data compression.
- For DirectQuery models, avoid the use of DAX calculated columns for all tables.

Table of Contents for Data model optimizations

Create new playlist

Sign In

Sign Up

Table of Contents for
Data model optimizations