Good structure – star schemas (Data Mart/Data Warehouse)

Assuming they are well-designed, star schema data models work very well with Tableau because they have well-defined granularity, measures, and dimensions. Additionally, if they are implemented well, they can be extremely efficient to query. This allows for a good experience when using live connections in Tableau.

Star schemas are so named because they consist of a single fact table surrounded by related dimension tables, thus forming a star pattern. Fact tables contain measures at a meaningful granularity, while dimension tables contain attributes for various related entities. The following diagram illustrates a simple star schema with a single fact table (Hospital Visit) and three dimension tables (Patient, Primary Physician, and Discharge Details):

Fact tables are joined to the related dimension using what is often called a surrogate key or foreign key that references a single dimension record. The fact table defines the level of granularity and contains measures. In this case, Hospital Visit has a granularity of one record for each visit. Each visit, in this simple example, is for one patient who saw one primary physician and was discharged. The Hospital Visit table explicitly stores a measure of Visit Duration and implicitly defines another measure of Number of Visits (in this case, Number of Records).

Data modeling purists would point out that date values have been stored in the fact table (and even some of the dimensions) and would instead recommend having a date dimension table with extensive attributes for each date, and only a surrogate (foreign) key stored in the fact table.

A date dimension can be very beneficial. However, Tableau's built-in date hierarchy and extensive date options make storing a date in the fact table a viable option. Consider using a date dimension if you need specific attributes of dates that are not available in Tableau (for example, which days are corporate holidays), have complex fiscal years, or if you need to support legacy BI reporting tools.

A well-designed star schema allows for the use of inner joins since every surrogate key should reference a single dimension record. In cases where dimension values are not known or not applicable, special dimension records are used. For example, a hospital visit that is not yet complete (the patient is still in the hospital) may reference a special record in the Discharge Details table marked as Not yet discharged. When connecting to a star schema in Tableau, start with the fact table and then add the dimension tables, as shown here:

The resulting data connection (shown as an example, but not included in the Chapter 09 workbook) allows you to see the dimensional attributes by table. The measures come from the single fact table:

Well-implemented star schemas are particularly attractive for use in live connections because Tableau can gain performance by implementing join culling.

Join culling is Tableau's elimination of unnecessary joins in queries, since it sends them to the data source engine. For example, if you were to place the Physician Name on Rows and the average of Visit Duration on Columns to get a bar chart of average visit duration per physician, then joins to the Treatment and Patient tables may not be needed. Tableau will eliminate unnecessary joins as long as you are using a simple star schema with joins that are only from the central fact table and have referential integrity enabled in the source or allow Tableau to assume referential integrity (select the data source connection from the Data menu or use the context menu from the data source connection and choose Assume Referential Integrity).

Having considered some examples of good structure, let's turn our attention to handling poorly structured data.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.135.247.181