Handling Late Arrival Facts

In this section I explain how to handle late arrival sales orders when populating the month_end_sales_order_fact table.

First of all, in order for you to know if a sales order is late, you need to load the entry date from the sales order source to the sales_order_fact table. Since you don’t have a column for the entry date, you need to add a new one to the fact table. Similar to adding the request delivery date in Chapter 13, “Dimension Role Playing,” you apply the dimension role playing technique to add the entry date. Therefore, add a date surrogate key column in the sales order fact table named entry_date_sk and create a database view from the date dimension table called entry_date_dim. The script in Listing 22.1 creates the entry_date_dim view and add the entry_date_sk surrogate key column in the sales_order_fact table.

Note

You might want to review the dimension role-playing technique in Chapter 13 to understand why you need to add the surrogate key column and create a database view.


Listing 22.1. Adding the entry_date column
/******************************************************************/
/*                                                                */
/* entry_date.sql                                                 */
/*                                                                */
/******************************************************************/

USE dw;

CREATE VIEW entry_date_dim (
  entry_date_sk
, entry_date
, month_name
, month
, quarter
, year
, promo_ind
, effective_date
, expiry_date
)
AS SELECT
  date_sk
, date
, month_name
, month
, quarter
, year
, promo_ind
, effective_date
, expiry_date
FROM date_dim
;

ALTER TABLE sales_order_fact
ADD entry_date_sk INT AFTER receive_date_sk
;

/* end of script                                                  */

To run the script in Listing 22.1, type in the following command into the MySQL console.

mysql> . c:mysqlscriptsentry_date.sql

You will see the following as the response.

Database changed
Query OK, 0 rows affected (0.02 sec)

Query OK, 59 rows affected (0.49 sec)
Records: 59  Duplicates: 0  Warnings: 0

After creating the entry_date_dim view and adding the entry_date_sk column to the sales_order_fact table, you now need to revise the data warehouse regular population script to include the entry date. Listing 22.2 shows the revised regular population script. Note that the sales_order source already contains entry dates, but we did not previously load it into the data warehouse.

Listing 22.2. The revised daily DW regular population
/******************************************************************/
/*                                                                */
/* dw_regular_22.sql                                              */
/*                                                                */
/******************************************************************/

USE dw;

/* CUSTOMER_DIM POPULATION                                        */

TRUNCATE customer_stg;

LOAD DATA INFILE 'customer.csv'
INTO TABLE customer_stg
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '
'
IGNORE 1 LINES
( customer_number
, customer_name
, customer_street_address
, customer_zip_code
, customer_city
, customer_state
, shipping_address
, shipping_zip_code
, shipping_city
, shipping_state )
;

/* SCD 2 ON ADDRESSES                                             */

UPDATE
  customer_dim a
, customer_stg b
SET
  a.expiry_date = SUBDATE(CURRENT_DATE, 1)
WHERE
  a.customer_number = b.customer_number
AND (   a.customer_street_address <> b.customer_street_address
     OR a.customer_city <> b.customer_city
     OR a.customer_zip_code <> b.customer_zip_code
     OR a.customer_state <> b.customer_state
     OR a.shipping_address <> b.shipping_address
     OR a.shipping_city <> b.shipping_city
     OR a.shipping_zip_code <> b.shipping_zip_code
     OR a.shipping_state <> b.shipping_state
     OR a.shipping_address IS NULL
     OR a.shipping_city IS NULL
     OR a.shipping_zip_code IS NULL
     OR a.shipping_state IS NULL)
AND expiry_date = '9999-12-31'
;

INSERT INTO customer_dim
SELECT
  NULL
, b.customer_number
, b.customer_name
, b.customer_street_address
, b.customer_zip_code
, b.customer_city
, b.customer_state
, b.shipping_address
, b.shipping_zip_code
, b.shipping_city
, b.shipping_state
, CURRENT_DATE
, '9999-12-31'
FROM
  customer_dim a
, customer_stg b
WHERE
    a.customer_number = b.customer_number
AND (   a.customer_street_address <> b.customer_street_address
     OR a.customer_city <> b.customer_city
     OR a.customer_zip_code <> b.customer_zip_code
     OR a.customer_state <> b.customer_state
     OR a.shipping_address <> b.shipping_address
     OR a.shipping_city <> b.shipping_city
     OR a.shipping_zip_code <> b.shipping_zip_code
     OR a.shipping_state <> b.shipping_state
     OR a.shipping_address IS NULL
     OR a.shipping_city IS NULL
     OR a.shipping_zip_code IS NULL
     OR a.shipping_state IS NULL)
AND EXISTS (
SELECT *
FROM customer_dim x
WHERE b.customer_number = x.customer_number
AND a.expiry_date = SUBDATE(CURRENT_DATE, 1))
AND NOT EXISTS (
SELECT *
FROM customer_dim y
WHERE     b.customer_number = y.customer_number
      AND y.expiry_date = '9999-12-31')
;

/* END OF SCD 2                                                   */

/* SCD 1 ON NAME                                                  */

UPDATE customer_dim a, customer_stg b
SET a.customer_name = b.customer_name
WHERE     a.customer_number = b.customer_number
      AND a.expiry_date = '9999-12-31'
      AND a.customer_name <> b.customer_name
;

/* ADD NEW CUSTOMER                                               */

INSERT INTO customer_dim
SELECT
  NULL
, customer_number
, customer_name
, customer_street_address
, customer_zip_code
, customer_city
, customer_state
, shipping_address
, shipping_zip_code
, shipping_city
, shipping_state
, CURRENT_DATE
, '9999-12-31'
FROM customer_stg
WHERE customer_number NOT IN(
SELECT a.customer_number
FROM
  customer_dim a
, customer_stg b
WHERE b.customer_number = a.customer_number )
;

/* RE-BUILD PA CUSTOMER DIMENSION                                 */

TRUNCATE pa_customer_dim;

INSERT INTO pa_customer_dim
SELECT
  customer_sk
, customer_number
, customer_name
, customer_street_address
, customer_zip_code
, customer_city
, customer_state
, shipping_address
, shipping_zip_code
, shipping_city
, shipping_state
, effective_date
, expiry_date
FROM customer_dim
WHERE customer_state = 'PA'
;

/* END OF CUSTOMER_DIM POPULATION                                 */
/* PRODUCT_DIM POPULATION                                         */

TRUNCATE product_stg
;

LOAD DATA INFILE 'product.txt'
INTO TABLE product_stg
FIELDS TERMINATED BY ''
OPTIONALLY ENCLOSED BY ''
LINES TERMINATED BY '
'
IGNORE 1 LINES
( product_code
, product_name
, product_category )
;

/* SCD2 ON PRODUCT NAME AND GROUP                                 */

UPDATE
  product_dim a
, product_stg b
SET
  expiry_date = SUBDATE(CURRENT_DATE, 1)
WHERE
    a.product_code = b.product_code
AND (   a.product_name <> b.product_name
     OR a.product_category <> b.product_category )
     AND expiry_date = '9999-12-31'
;

INSERT INTO product_dim
SELECT
  NULL
, b.product_code
, b.product_name
, b.product_category
, CURRENT_DATE
, '9999-12-31'
FROM
  product_dim a
, product_stg b
WHERE
    a.product_code = b.product_code
AND (   a.product_name <> b.product_name
     OR a.product_category <> b.product_category )
AND EXISTS (
SELECT *
FROM product_dim x
WHERE     b.product_code = x.product_code
      AND a.expiry_date = SUBDATE(CURRENT_DATE, 1))
AND NOT EXISTS (
SELECT *
FROM product_dim y
WHERE    b.product_code = y.product_code
     AND y.expiry_date = '9999-12-31')
;

/* END OF SCD 2                                                   */

/* ADD NEW PRODUCT                                                */

INSERT INTO product_dim
SELECT
  NULL
, product_code
, product_name
, product_category
, CURRENT_DATE
, '9999-12-31'
FROM product_stg
WHERE product_code NOT IN(
SELECT y.product_code
FROM product_dim x, product_stg y
WHERE x.product_code = y.product_code )
;

/* END OF PRODUCT_DIM POPULATION                                  */

/* PRODUCT_COUNT_FACT POPULATION                                  */

TRUNCATE product_count_fact
;

INSERT INTO product_count_fact
(product_sk, product_launch_date_sk)
SELECT
  a.product_sk
, b.date_sk

FROM
  product_dim a
, date_dim b

WHERE
a.effective_date = b.date
GROUP BY product_code
HAVING COUNT(product_code) = 1
;

/* for products that have been updated by SCD2                    */

INSERT INTO product_count_fact (product_sk, product_launch_date_sk)
SELECT
  a.product_sk
, b.date_sk

FROM
  product_dim a
, date_dim b

WHERE
a.effective_date = b.date
GROUP BY product_code
HAVING COUNT(product_code) > 1
;

/* END OF PRODUCT_COUNT_FACT POPULATION                           */

/* INSERT NEW ORDERS                                              */

INSERT INTO sales_order_fact
SELECT
  b.customer_sk
, c.product_sk
, f.sales_order_attribute_sk
, d.order_date_sk
, NULL
, NULL
, NULL
, NULL
, g.entry_date_sk
, a.order_number
, e.request_delivery_date_sk
, order_amount
, quantity
, NULL
, NULL
, NULL
, NULL
FROM
  source.sales_order a
, customer_dim b
, product_dim c
, order_date_dim d
, request_delivery_date_dim e
, sales_order_attribute_dim f
, entry_date_dim g
WHERE
      order_status = 'N'
AND a.entry_date = CURRENT_DATE
AND a.customer_number = b.customer_number
AND a.status_date >= b.effective_date
AND a.status_date <= b.expiry_date
AND a.product_code = c.product_code
AND a.status_date >= c.effective_date
AND a.status_date <= c.expiry_date
AND a.status_date = d.order_date
AND a.entry_date = g.entry_date
AND a.request_delivery_date = e.request_delivery_date
AND a.verification_ind = f.verification_ind
AND a.credit_check_flag = f.credit_check_flag
AND a.new_customer_ind = f.new_customer_ind
AND a.web_order_flag = f.web_order_flag
AND a.status_date >= f.effective_date
AND a.status_date <= f.expiry_date
;

UPDATE
  sales_order_fact a
, source.sales_order b
, allocate_date_dim c
SET
  a.allocate_date_sk = c.allocate_date_sk
, a.allocate_quantity = b.quantity
WHERE
    order_status = 'A'
AND b.entry_date = CURRENT_DATE
AND b.order_number = a.order_number
AND c.allocate_date = b.status_date
;

UPDATE
  sales_order_fact a
, source.sales_order b
, packing_date_dim d
SET
  a.packing_date_sk = d.packing_date_sk
, a.packing_quantity = b.quantity
WHERE
    order_status = 'P'
AND b.entry_date = CURRENT_DATE
AND b.order_number = a.order_number
AND d.packing_date = b.status_date
;

UPDATE
  sales_order_fact a
, source.sales_order b
, ship_date_dim e
SET
  a.ship_date_sk = e.ship_date_sk
, a.ship_quantity = b.quantity
WHERE
    order_status = 'S'
AND b.entry_date = CURRENT_DATE
AND b.order_number = a.order_number
AND e.ship_date = b.status_date
;

UPDATE
  sales_order_fact a
, source.sales_order b
, receive_date_dim f
SET
  a.receive_date_sk = f.receive_date_sk
, a.receive_quantity = b.quantity
WHERE
    order_status = 'R'
AND b.entry_date = CURRENT_DATE
AND b.order_number = a.order_number
AND f.receive_date = b.status_date
;

/* end of script                                                  */

You must also revise the month end population script. The revised script, presented in listing 22.3, has three parts. The first part handles sales orders that are not late. The second part adds incoming sales amounts to the existing rows that have the same product and month as the incoming sales orders. The third part adds new rows for the incoming sales orders.

Listing 22.3. Revised Month End Sales Order population
/******************************************************************/
/*                                                                */
/* month_end_sales_order_22.sql                                   */
/*                                                                */
/******************************************************************/

USE dw;

/* normal (order_date = entry_date)                               */

INSERT INTO month_end_sales_order_fact
SELECT
  d.month_sk
, a.product_sk
, SUM(order_amount)
, SUM(order_quantity)
FROM
  sales_order_fact a
, order_date_dim b
, entry_date_dim c
, month_dim d
WHERE
    a.order_date_sk = b.order_date_sk
AND a.entry_date_sk = c.entry_date_sk
AND b.order_date = c.entry_date
AND c.month = MONTH(CURRENT_DATE)
AND c.year = YEAR(CURRENT_DATE)
AND b.month = d.month
AND b.year = d.year
GROUP BY d.month, d.year, product_sk
;

/* late arrival, amount & quantity already exist for the past months
*/

UPDATE
   month_end_sales_order_fact a

, (SELECT
  y.month
, y.year
, w.product_sk
, SUM(order_amount) order_amount
, SUM(order_quantity) order_quantity
FROM
  sales_order_fact x
, order_date_dim y
, entry_date_dim z
, product_dim w
WHERE
    x.order_date_sk = y.order_date_sk
AND x.entry_date_sk = z.entry_date_sk
AND order_date <> entry_date
AND MONTH(entry_date) = MONTH(CURRENT_DATE)
AND YEAR(entry_date) = YEAR(CURRENT_DATE)
AND x.product_sk = w.product_sk
GROUP BY y.month, y.year, product_sk) b

, month_dim c

SET month_order_amount = month_order_amount + b.order_amount
, month_order_quantity = month_order_quantity + b.order_quantity
WHERE
    a.month_order_sk = c.month_sk
AND b.month = c.month
AND b.year = c.year
AND a.product_sk = b.product_sk
;

/* late arrival but amount & quantity not exist for the past months
*/

INSERT INTO month_end_sales_order_fact
SELECT
  d.month_sk
, a.product_sk
, SUM(order_amount)
, SUM(order_quantity)
FROM
  sales_order_fact a
, order_date_dim b
, entry_date_dim c
, month_dim d
WHERE
    a.order_date_sk = b.order_date_sk
AND a.entry_date_sk = c.entry_date_sk
AND b.order_date <> c.entry_date
AND c.month = MONTH(CURRENT_DATE)
AND c.year = YEAR(CURRENT_DATE)
AND b.month = d.month
AND b.year = d.year
AND NOT EXISTS
(SELECT * FROM
  month_end_sales_order_fact p
, sales_order_fact q
, month_dim s
WHERE
    p.month_order_sk = s.month_sk
AND s.month = d.month
AND s.year = d.year
AND p.product_sk = a.product_sk )
GROUP BY
  d.month
, d.year
, a.product_sk
;

/* end of script                                                  */

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.223.158.21