Nonclustered indexes in analytical scenarios

As mentioned, DW queries typically involve large scans of data and aggregation. Very selective seeks are not common for reports from a DW. Therefore, nonclustered indexes generally don't help DW queries much. However, this does not mean that you shouldn't create any nonclustered indexes in your DW.

An attribute of a dimension is not a good candidate for a nonclustered index key. Attributes are used for pivoting and typically contain only a few distinct values. Therefore, queries that filter based on attribute values are usually not very selective. Nonclustered indexes on dimension attributes are not good practice.

DW reports can be parameterized. For example, a DW report could show sales for all customers or for only a single customer, based perhaps on parameter selection by an end user. For a single-customer report, the user would choose the customer by selecting that customer's name. Customer names are selective, meaning that you retrieve only a small number of rows when you filter by customer name. Company names, for example, are typically unique, so by filtering by a company name, you typically retrieve a single row. For reports like these, having a nonclustered index on a name column or columns could lead to better performance.

You can create a filtered nonclustered index. A filtered index spans a subset of column values only and thus applies to a subset of table rows. Filtered nonclustered indexes are useful when some values in a column occur rarely, whereas other values occur frequently. In such cases, you would create a filtered index over the rare values only. SQL Server uses this index for seeks of rare values but performs scans for frequent values. Filtered nonclustered indexes can be useful not only for name columns and member properties but also for attributes of a dimension, and even foreign keys of a fact table. For example, in our demo fact table, the customer with an ID equal to 378 has only 242 rows. You can execute the following code to show that even if you select data for this customer only, SQL Server performs a full scan:

SET STATISTICS IO ON; 
-- All rows 
SELECT * 
FROM dbo.FactTest; 
-- Customer 378 only 
SELECT * 
FROM dbo.FactTest 
WHERE CustomerKey = 378; 
SET STATISTICS IO OFF; 

Both queries needed 6,091 logical reads. Now you can add a filtered nonclustered index to the table:

CREATE INDEX NCLF_FactTest_C378 
 ON dbo.FactTest(CustomerKey) 
 WHERE CustomerKey = 378; 
GO 

If you execute the same two queries again, you get much less IO for the second query. It needed 752 logical reads in my case and uses the Index Seek and Key Lookup operators.

You can drop the filtered index when you don't need it anymore with the following code:

DROP INDEX NCLF_FactTest_C378 
 ON dbo.FactTest; 
GO 
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.118.2.225