Columnar compression

It's important to understand the columnar layout and internal storage of the import mode datasets. Power BI creates individual segments of approximately one million rows and stores separate memory structures for column data, the dictionary of unique values for columns, relationships, and hierarchies.

In the following diagram, three segments are used to store a fact table of 2.8 million rows:

Columnar storage of import mode datasets

Since only the columns required for a query are scanned during query execution, a relatively expensive column in terms of memory consumption (due to many unique values), such as Order #, can be stored in the dataset without negatively impacting queries that only access other columns. Removing fact table columns or reducing the cardinality of fact table columns that are not used in queries or relationships will nonetheless benefit the storage size and resources required to refresh the dataset. Fewer fact table columns may also enable Power BI to find a more optimal sort order for compression and thus benefit the query performance.

Eliminate any DAX-calculated column on fact tables as these columns are not compressed as efficiently as imported columns. If necessary, replace DAX-calculated columns with the equivalent expression in the source M Query or SQL View. Additionally, per the Fact table columns section earlier in this chapter, remove columns that can be computed within DAX Measures via simple expressions (+,-,/,*). For example, the Sales column from the Columnar Storage example image can be excluded from the Import dataset given the Price and Qty columns.

During query execution over tables with more than one segment, one CPU thread is associated per segment. This parallelization is limited by the number of CPU threads available to the dataset (for example, Power BI Premium P1 with four backend v-cores), and the number of segments required to resolve the query. Therefore, ideally, the rows of fact tables can be ordered such that only a portion of the segments are required to resolve queries. Using the example of the 2.8M-row fact table, a query that's filtered on the year 2017 would only require one CPU thread and would only scan the required column segments within Segment 3.

The internal order of fact table rows cannot be dictated by the dataset designer as Power BI determines the optimal order that will lead to the highest compression during dataset refreshes. However, dataset designers can add a sorting transformation to the M query of a fact table (Table.Sort()) such that Power BI will, at a minimum, consider this particular order during its processing. Whether Power BI used the particular sort order can be determined by analyzing the memory footprint of the sorted column before and after the data is loaded. If the size of the sort by column is significantly reduced following the refresh operation, Power BI took advantage of the order by.

Table of Contents for Columnar compression

Create new playlist

Sign In

Sign Up

Table of Contents for
Columnar compression