In this section I explain how to handle late arrival sales orders when populating the month_end_sales_order_fact table.
First of all, in order for you to know if a sales order is late, you need to load the entry date from the sales order source to the sales_order_fact table. Since you don’t have a column for the entry date, you need to add a new one to the fact table. Similar to adding the request delivery date in Chapter 13, “Dimension Role Playing,” you apply the dimension role playing technique to add the entry date. Therefore, add a date surrogate key column in the sales order fact table named entry_date_sk and create a database view from the date dimension table called entry_date_dim. The script in Listing 22.1 creates the entry_date_dim view and add the entry_date_sk surrogate key column in the sales_order_fact table.
Note
You might want to review the dimension role-playing technique in Chapter 13 to understand why you need to add the surrogate key column and create a database view.
/******************************************************************/ /* */ /* entry_date.sql */ /* */ /******************************************************************/ USE dw; CREATE VIEW entry_date_dim ( entry_date_sk , entry_date , month_name , month , quarter , year , promo_ind , effective_date , expiry_date ) AS SELECT date_sk , date , month_name , month , quarter , year , promo_ind , effective_date , expiry_date FROM date_dim ; ALTER TABLE sales_order_fact ADD entry_date_sk INT AFTER receive_date_sk ; /* end of script */ |
To run the script in Listing 22.1, type in the following command into the MySQL console.
mysql> . c:mysqlscriptsentry_date.sql
You will see the following as the response.
Database changed Query OK, 0 rows affected (0.02 sec) Query OK, 59 rows affected (0.49 sec) Records: 59 Duplicates: 0 Warnings: 0
After creating the entry_date_dim view and adding the entry_date_sk column to the sales_order_fact table, you now need to revise the data warehouse regular population script to include the entry date. Listing 22.2 shows the revised regular population script. Note that the sales_order source already contains entry dates, but we did not previously load it into the data warehouse.
/******************************************************************/ /* */ /* dw_regular_22.sql */ /* */ /******************************************************************/ USE dw; /* CUSTOMER_DIM POPULATION */ TRUNCATE customer_stg; LOAD DATA INFILE 'customer.csv' INTO TABLE customer_stg FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY ' ' IGNORE 1 LINES ( customer_number , customer_name , customer_street_address , customer_zip_code , customer_city , customer_state , shipping_address , shipping_zip_code , shipping_city , shipping_state ) ; /* SCD 2 ON ADDRESSES */ UPDATE customer_dim a , customer_stg b SET a.expiry_date = SUBDATE(CURRENT_DATE, 1) WHERE a.customer_number = b.customer_number AND ( a.customer_street_address <> b.customer_street_address OR a.customer_city <> b.customer_city OR a.customer_zip_code <> b.customer_zip_code OR a.customer_state <> b.customer_state OR a.shipping_address <> b.shipping_address OR a.shipping_city <> b.shipping_city OR a.shipping_zip_code <> b.shipping_zip_code OR a.shipping_state <> b.shipping_state OR a.shipping_address IS NULL OR a.shipping_city IS NULL OR a.shipping_zip_code IS NULL OR a.shipping_state IS NULL) AND expiry_date = '9999-12-31' ; INSERT INTO customer_dim SELECT NULL , b.customer_number , b.customer_name , b.customer_street_address , b.customer_zip_code , b.customer_city , b.customer_state , b.shipping_address , b.shipping_zip_code , b.shipping_city , b.shipping_state , CURRENT_DATE , '9999-12-31' FROM customer_dim a , customer_stg b WHERE a.customer_number = b.customer_number AND ( a.customer_street_address <> b.customer_street_address OR a.customer_city <> b.customer_city OR a.customer_zip_code <> b.customer_zip_code OR a.customer_state <> b.customer_state OR a.shipping_address <> b.shipping_address OR a.shipping_city <> b.shipping_city OR a.shipping_zip_code <> b.shipping_zip_code OR a.shipping_state <> b.shipping_state OR a.shipping_address IS NULL OR a.shipping_city IS NULL OR a.shipping_zip_code IS NULL OR a.shipping_state IS NULL) AND EXISTS ( SELECT * FROM customer_dim x WHERE b.customer_number = x.customer_number AND a.expiry_date = SUBDATE(CURRENT_DATE, 1)) AND NOT EXISTS ( SELECT * FROM customer_dim y WHERE b.customer_number = y.customer_number AND y.expiry_date = '9999-12-31') ; /* END OF SCD 2 */ /* SCD 1 ON NAME */ UPDATE customer_dim a, customer_stg b SET a.customer_name = b.customer_name WHERE a.customer_number = b.customer_number AND a.expiry_date = '9999-12-31' AND a.customer_name <> b.customer_name ; /* ADD NEW CUSTOMER */ INSERT INTO customer_dim SELECT NULL , customer_number , customer_name , customer_street_address , customer_zip_code , customer_city , customer_state , shipping_address , shipping_zip_code , shipping_city , shipping_state , CURRENT_DATE , '9999-12-31' FROM customer_stg WHERE customer_number NOT IN( SELECT a.customer_number FROM customer_dim a , customer_stg b WHERE b.customer_number = a.customer_number ) ; /* RE-BUILD PA CUSTOMER DIMENSION */ TRUNCATE pa_customer_dim; INSERT INTO pa_customer_dim SELECT customer_sk , customer_number , customer_name , customer_street_address , customer_zip_code , customer_city , customer_state , shipping_address , shipping_zip_code , shipping_city , shipping_state , effective_date , expiry_date FROM customer_dim WHERE customer_state = 'PA' ; /* END OF CUSTOMER_DIM POPULATION */ /* PRODUCT_DIM POPULATION */ TRUNCATE product_stg ; LOAD DATA INFILE 'product.txt' INTO TABLE product_stg FIELDS TERMINATED BY '' OPTIONALLY ENCLOSED BY '' LINES TERMINATED BY ' ' IGNORE 1 LINES ( product_code , product_name , product_category ) ; /* SCD2 ON PRODUCT NAME AND GROUP */ UPDATE product_dim a , product_stg b SET expiry_date = SUBDATE(CURRENT_DATE, 1) WHERE a.product_code = b.product_code AND ( a.product_name <> b.product_name OR a.product_category <> b.product_category ) AND expiry_date = '9999-12-31' ; INSERT INTO product_dim SELECT NULL , b.product_code , b.product_name , b.product_category , CURRENT_DATE , '9999-12-31' FROM product_dim a , product_stg b WHERE a.product_code = b.product_code AND ( a.product_name <> b.product_name OR a.product_category <> b.product_category ) AND EXISTS ( SELECT * FROM product_dim x WHERE b.product_code = x.product_code AND a.expiry_date = SUBDATE(CURRENT_DATE, 1)) AND NOT EXISTS ( SELECT * FROM product_dim y WHERE b.product_code = y.product_code AND y.expiry_date = '9999-12-31') ; /* END OF SCD 2 */ /* ADD NEW PRODUCT */ INSERT INTO product_dim SELECT NULL , product_code , product_name , product_category , CURRENT_DATE , '9999-12-31' FROM product_stg WHERE product_code NOT IN( SELECT y.product_code FROM product_dim x, product_stg y WHERE x.product_code = y.product_code ) ; /* END OF PRODUCT_DIM POPULATION */ /* PRODUCT_COUNT_FACT POPULATION */ TRUNCATE product_count_fact ; INSERT INTO product_count_fact (product_sk, product_launch_date_sk) SELECT a.product_sk , b.date_sk FROM product_dim a , date_dim b WHERE a.effective_date = b.date GROUP BY product_code HAVING COUNT(product_code) = 1 ; /* for products that have been updated by SCD2 */ INSERT INTO product_count_fact (product_sk, product_launch_date_sk) SELECT a.product_sk , b.date_sk FROM product_dim a , date_dim b WHERE a.effective_date = b.date GROUP BY product_code HAVING COUNT(product_code) > 1 ; /* END OF PRODUCT_COUNT_FACT POPULATION */ /* INSERT NEW ORDERS */ INSERT INTO sales_order_fact SELECT b.customer_sk , c.product_sk , f.sales_order_attribute_sk , d.order_date_sk , NULL , NULL , NULL , NULL , g.entry_date_sk , a.order_number , e.request_delivery_date_sk , order_amount , quantity , NULL , NULL , NULL , NULL FROM source.sales_order a , customer_dim b , product_dim c , order_date_dim d , request_delivery_date_dim e , sales_order_attribute_dim f , entry_date_dim g WHERE order_status = 'N' AND a.entry_date = CURRENT_DATE AND a.customer_number = b.customer_number AND a.status_date >= b.effective_date AND a.status_date <= b.expiry_date AND a.product_code = c.product_code AND a.status_date >= c.effective_date AND a.status_date <= c.expiry_date AND a.status_date = d.order_date AND a.entry_date = g.entry_date AND a.request_delivery_date = e.request_delivery_date AND a.verification_ind = f.verification_ind AND a.credit_check_flag = f.credit_check_flag AND a.new_customer_ind = f.new_customer_ind AND a.web_order_flag = f.web_order_flag AND a.status_date >= f.effective_date AND a.status_date <= f.expiry_date ; UPDATE sales_order_fact a , source.sales_order b , allocate_date_dim c SET a.allocate_date_sk = c.allocate_date_sk , a.allocate_quantity = b.quantity WHERE order_status = 'A' AND b.entry_date = CURRENT_DATE AND b.order_number = a.order_number AND c.allocate_date = b.status_date ; UPDATE sales_order_fact a , source.sales_order b , packing_date_dim d SET a.packing_date_sk = d.packing_date_sk , a.packing_quantity = b.quantity WHERE order_status = 'P' AND b.entry_date = CURRENT_DATE AND b.order_number = a.order_number AND d.packing_date = b.status_date ; UPDATE sales_order_fact a , source.sales_order b , ship_date_dim e SET a.ship_date_sk = e.ship_date_sk , a.ship_quantity = b.quantity WHERE order_status = 'S' AND b.entry_date = CURRENT_DATE AND b.order_number = a.order_number AND e.ship_date = b.status_date ; UPDATE sales_order_fact a , source.sales_order b , receive_date_dim f SET a.receive_date_sk = f.receive_date_sk , a.receive_quantity = b.quantity WHERE order_status = 'R' AND b.entry_date = CURRENT_DATE AND b.order_number = a.order_number AND f.receive_date = b.status_date ; /* end of script */ |
You must also revise the month end population script. The revised script, presented in listing 22.3, has three parts. The first part handles sales orders that are not late. The second part adds incoming sales amounts to the existing rows that have the same product and month as the incoming sales orders. The third part adds new rows for the incoming sales orders.
/******************************************************************/ /* */ /* month_end_sales_order_22.sql */ /* */ /******************************************************************/ USE dw; /* normal (order_date = entry_date) */ INSERT INTO month_end_sales_order_fact SELECT d.month_sk , a.product_sk , SUM(order_amount) , SUM(order_quantity) FROM sales_order_fact a , order_date_dim b , entry_date_dim c , month_dim d WHERE a.order_date_sk = b.order_date_sk AND a.entry_date_sk = c.entry_date_sk AND b.order_date = c.entry_date AND c.month = MONTH(CURRENT_DATE) AND c.year = YEAR(CURRENT_DATE) AND b.month = d.month AND b.year = d.year GROUP BY d.month, d.year, product_sk ; /* late arrival, amount & quantity already exist for the past months */ UPDATE month_end_sales_order_fact a , (SELECT y.month , y.year , w.product_sk , SUM(order_amount) order_amount , SUM(order_quantity) order_quantity FROM sales_order_fact x , order_date_dim y , entry_date_dim z , product_dim w WHERE x.order_date_sk = y.order_date_sk AND x.entry_date_sk = z.entry_date_sk AND order_date <> entry_date AND MONTH(entry_date) = MONTH(CURRENT_DATE) AND YEAR(entry_date) = YEAR(CURRENT_DATE) AND x.product_sk = w.product_sk GROUP BY y.month, y.year, product_sk) b , month_dim c SET month_order_amount = month_order_amount + b.order_amount , month_order_quantity = month_order_quantity + b.order_quantity WHERE a.month_order_sk = c.month_sk AND b.month = c.month AND b.year = c.year AND a.product_sk = b.product_sk ; /* late arrival but amount & quantity not exist for the past months */ INSERT INTO month_end_sales_order_fact SELECT d.month_sk , a.product_sk , SUM(order_amount) , SUM(order_quantity) FROM sales_order_fact a , order_date_dim b , entry_date_dim c , month_dim d WHERE a.order_date_sk = b.order_date_sk AND a.entry_date_sk = c.entry_date_sk AND b.order_date <> c.entry_date AND c.month = MONTH(CURRENT_DATE) AND c.year = YEAR(CURRENT_DATE) AND b.month = d.month AND b.year = d.year AND NOT EXISTS (SELECT * FROM month_end_sales_order_fact p , sales_order_fact q , month_dim s WHERE p.month_order_sk = s.month_sk AND s.month = d.month AND s.year = d.year AND p.product_sk = a.product_sk ) GROUP BY d.month , d.year , a.product_sk ; /* end of script */ |
18.222.167.178