SQL Server 2016 and 2017 temporal tables and data warehouses

For analytical purposes, data warehouses have evolved. Data warehouses support historical data. You should not confuse a data warehouse's historical data with temporal data, the subject of this chapter. In a data warehouse, historical data means just archived non-temporal data from the past; a typical data warehouse holds from 5 to 10 years of data for analytical purposes. Data warehouses are not suitable for business applications because a data warehouse typically has no constraints.

Data warehouses have a simplified data model that consists of multiple star schemas. You can see a typical star schema in the following figure:

Star schema

One schema covers one business area. It consists of a single central table, called the fact table, and multiple surrounding tables, called dimensions. The fact table is always on the many side every single relationship, with every single dimension table. Star schema is deliberately denormalized. The fact table includes measures, while dimensions give context to those measures. Shared dimensions connect star schemas in a data warehouse.

Dimensions can change over time. The pace of the changes is usually slow compared to the pace of the changes in fact tables. Transactional systems often show the current state only, and don't preserve the history. In a data warehouse, you might need to preserve the history. This is known as the slowly changing dimensions (SCD) problem, which has two common solutions:

  • A type 1 solution means not preserving the history in the data warehouse by simply overwriting the values when the values in the sources change
  • A type 2 solution means adding a new row for a change, and marking which row is the current one

You can also mix both types; for some attributes, like the city in the example, you use type 2, while for some, like occupation, you use type 1. Note the additional problem: when you update the OCCUPATION attribute, you need to decide whether to update the current row only or also the historical rows that come from the type 2 changes. You can see these possibilities in the following figure:

Slowly changing dimensions

On first glance, SQL Server system-versioned tables might be the solution to the SCD problem when using the type 2 implementation. However, this is typically not true. In a data warehouse, most of the time you need to implement a mixed solution: type 1 for some attributes and type 2 for others. The granularity of the time points in system-versioned tables is 100 nanoseconds; in a data warehouse, typical granularity is 1 day. In the source system, the same entity, like customer, can be updated multiple times per day. You can have multiple historical rows for the same entity in a single day. Therefore, when transferring the data from a transactional database to a data warehouse, you need to take care to transfer the last state for each day. The following query illustrates the problem:

USE WideWorldImporters; 
SELECT PersonID, FullName, 
 ValidFrom, ValidTo 
FROM Application.People 
 FOR SYSTEM_TIME ALL 
WHERE IsEmployee = 1 
  AND PersonID = 14; 

In the WideWorldImporters database, Application.People is a system-versioned table. The previous query returns all rows for an employee called Lily Code. Here is the abbreviated result:

You can see that this person has multiple rows for a single date. For example, there are two rows for Lily where the ValidTo date (just the date part) equals 2013-01-01. You need to select only the last row per employee per day. This is done with the following query. You can run it and check the results:

WITH PersonCTE AS 
( 
SELECT PersonID, FullName, 
 CAST(ValidFrom AS DATE) AS ValidFrom, 
 CAST(ValidTo AS DATE) AS ValidTo, 
 ROW_NUMBER() OVER(PARTITION BY PersonID, CAST(ValidFrom AS Date) 
                 ORDER BY ValidFrom DESC) AS rn 
FROM Application.People 
 FOR SYSTEM_TIME ALL 
WHERE IsEmployee = 1 
) 
SELECT PersonID, FullName, 
 ValidFrom, ValidTo 
FROM PersonCTE 
WHERE rn = 1; 
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.144.106.150