Source Data

I explain in this section what key aspects of source data you need to identify before you can begin writing the initial population script.

The first step is to identify what source data is needed and available for every fact and every dimension of the data warehouse. You also need to know the characteristics of the data source, such as its file type, record structure, and accessibility.

A sample document that summarizes the key information of the source data for the sales order data warehouse, including the file types, formats, and data warehouse target tables, is shown in Table 7.1. This type of table is often called the data source map because it maps every data from the source to the target. The activity to produce the table is called data source mapping.

Table 7.1. Your data warehouse’s data sources
Source DataData TypeFile/Table NameTarget Table in Data Warehouse
CustomerCSV text filecustomer.csvcustomer_dim
ProductFixed-width text fileproduct.txtproduct_dim
Sales order transactionMySQLsales_order in source databaseorder_dim
   sales_order_fact
   date_dim, if you use “Loading the Date from Source” option, but you’ll use pre-population here

Note

The source data for customer and product dimensions map directly to their target data warehouse tables, the customer_dim and product_dim tables. On the other hand, the sales order transaction is the source for more than one data warehouse table. You learn a more complex source data in Chapter 20, “Non-Straight Sources.”


..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.143.4.181