Compression and query performance

Without any further hesitation, let's start with the code. The first thing is to drop the filtered (empty) NCCI created in the previous section:

DROP INDEX NCCI_FactTest 
ON dbo.FactTest;  
GO

The test fact table is organized as a B-tree with no additional nonclustered index, neither a rowstore nor columnstore one. The clustering key is the date. In order to make a comparison to NCCIs, let's set a baseline. First, recheck the space used by the test fact table:

EXEC sys.sp_spaceused N'dbo.FactTest', @updateusage = N'TRUE'; 
GO

The result is as follows:

Name          rows     reserved   data       index_size  unused 
------------  -------  ---------  ---------  ----------  -------
dbo.FactTest  2279810  502152 KB  498528 KB  2072 KB     1552 KB

You can measure IO with the SET STATISTICS IO ON command. In addition, you can turn on the actual execution plan. Here is the first sample query; let's call it the simple query:

SET STATISTICS IO ON; 
SELECT f.StockItemKey, 
 SUM(f.TotalAmount) AS Sales 
FROM dbo.FactTest AS f 
WHERE f.StockItemKey < 30 
GROUP BY f.StockItemKey 
ORDER BY f.StockItemKey;

The query did a full clustered index scan, and there were 63,601 logical reads. You may also notice in the execution plan that only row mode operators were used.

You can get slightly different results for the IO. The exact number of pages used by the table might vary slightly based on your dataset file organization, parallel operations when you load the data or change the table from a heap to a B-tree, and other possibilities. Nevertheless, your numbers should be very similar, and the point is to show how much less space will be used by columnar storage because of the compression.

The next query involves multiple joins; let's call it the complex query:

SELECT f.SaleKey, 
  f.CustomerKey, f.Customer, cu.[Buying Group], 
  f.CityKey, f.City, ci.Country, 
  f.DateKey, d.[Calendar Year], 
  f.StockItemKey, f.Product,  
  f.Quantity, f.TotalAmount, f.Profit 
FROM dbo.FactTest AS f 
  INNER JOIN Dimension.Customer AS cu 
    ON f.CustomerKey = cu.[Customer Key] 
  INNER JOIN Dimension.City AS ci 
    ON f.CityKey = ci.[City Key] 
  INNER JOIN Dimension.[Stock Item] AS s 
    ON f.StockItemKey = s.[Stock Item Key] 
  INNER JOIN Dimension.Date AS d 
    ON f.DateKey = d.Date;

This time, SQL Server created a much more complex execution plan, yet SQL Server used a full clustered index scan to read the data from the test fact table. SQL Server used 62,575 logical reads in this table.

The third test query is very selective—it selects only the rows for customer 378. If you remember, this customer has only 242 rows in the fact table. Let's call the third query the point query:

SELECT CustomerKey, Profit 
FROM dbo.FactTest 
WHERE CustomerKey = 378; 
SET STATISTICS IO OFF;

The query again did a full clustered index scan, and there were 63,601 logical reads.

Table of Contents for Compression and query performance

Create new playlist

Sign In

Sign Up

Table of Contents for
Compression and query performance