Initial Population Script

Now that you have identified your data sources, it’s time to write a script that can be used for initial population. Assuming your data warehouse will start operation on March 1, 2007 and your user wants to load two years of historical data, you need to load the source data dated from March 1, 2005 to February 28, 2007.

The script in Listing 7.1 serves as an example. Note that:

  • The customer dimension’s and the product dimension’s effective date is March 1, 2007. No sales order loaded has an earlier date, meaning no earlier customer and product dimensions are required.

  • The effective date of the order dimension is, of course, the order date.

  • The surrogate key columns of the sales order fact table are populated from the dimension surrogate keys.

  • You need to pre-populate the date_dim table separately using pre-population with dates from March 1, 2005 to, say, December 31, 2010.

Listing 7.1. DW initial population
/******************************************************************/
/*                                                                */
/* dw_initial.sql                                                 */
/*                                                                */
/******************************************************************/

USE dw;

LOAD DATA INFILE 'customer.csv'
INTO TABLE customer_stg
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '
'
IGNORE 1 LINES
( customer_number
, customer_name
, customer_street_address
, customer_zip_code
, customer_city
, customer_state )
;

INSERT INTO customer_dim
SELECT
  NULL
, customer_number
, customer_name
, customer_street_address
, customer_zip_code
, customer_city
, customer_state
, '2005-03-01'
, '9999-12-31'
FROM
customer_stg
;

LOAD DATA INFILE 'product.txt'
INTO TABLE product_stg
FIELDS TERMINATED BY ''
OPTIONALLY ENCLOSED BY ''
LINES TERMINATED BY '
'
IGNORE 1 LINES
( product_code
, product_name
, product_category )
;

INSERT INTO product_dim
SELECT
  NULL
, product_code
, product_name
, product_category
, '2005-03-01'
, '9999-12-31'
FROM
product_stg
;

INSERT INTO order_dim
SELECT
  NULL
, order_number
, order_date
, '9999-12-31'
FROM
source.sales_order
WHERE order_date >= '2005-03-01'
AND order_date < '2007-02-28'
;

INSERT INTO sales_order_fact
SELECT
  order_sk
, customer_sk
, product_sk
, date_sk
, order_amount
FROM
  source.sales_order a
, order_dim b
, customer_dim c
, product_dim d
, date_dim e
WHERE
    a.order_number = b.order_number
AND a.customer_number = c.customer_number
AND a.product_code = d.product_code
AND a.order_date = e.date
AND order_date >= '2005-03-01'
AND order_date < '2007-02-28'
;

/* end of script                                                  */

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.143.4.181