Revising the Data Warehouse Schema

To consolidate dimensions, you need to change the data warehouse schema. The revised schema is shown in Figure 23.1. A new zip_code_dim table has been added, and the structures of the sales_order_fact and production_fact tables have been changed. Note that only the tables related to the zip code are shown.

Figure 23.1. The schema after zip_code_dim is added


The zip_code_dim table is related to the two fact tables. The relationships replace those from the customer and factory dimensions. You need two relationships to the sales_order_fact table, one for the customer address and one for the shipping address. There is only one relationship to the production_fact table, therefore only the factory zip code surrogate key was added in this fact table.

The script in Listing 23.1 can be used to create the zip_code_dim table.

Listing 23.1. Creating the zip_code_dim table
/******************************************************************/
/*                                                                */
/* zip_code_dim.sql                                               */
/*                                                                */
/******************************************************************/

/* default to dw                                                  */

USE dw;

CREATE TABLE zip_code_dim (
  zip_code_sk INT NOT NULL AUTO_INCREMENT PRIMARY KEY
, zip_code INT(5)
, city CHAR(30)
, state CHAR(2)
, effective_date DATE
, expiry_date DATE
)
;

/* end of script                                                  */

Run the script in Listing 23.1 to create the zip_code_dim table:

mysql> . c:mysqlscriptszip_code_dim.sql

Then, you need to populate the zip_code_dim table. I have provided the necessary zip code information in the zip_code.csv file below.

ZIP CODE,CITY,STATE
17050,PITTSBURGH,PA
17051,MC VEYTOWN,PA
17052,MAPLETON DEPOT,PA
17053,MARYSVILLE,PA
17054,MATTAWANA,PA
17055,MECHANICSBURG,PA
44102,CLEVELAND,OH

As usual, you load data from a CSV file to a staging table. The script in Listing 23.2 can be used to create a staging table called zip_code_stg.

Listing 23.2. Creating the zip_code_stg table
/******************************************************************/
/*                                                                */
/* zip_code_stg.sql                                               */
/*                                                                */
/******************************************************************/
/* default to dw                                                  */

USE dw;

CREATE TABLE zip_code_stg (
  zip_code INT(5)
, city CHAR(30)
, state CHAR(2)
)
;

/* end of script                                                  */

Run the script in Listing 23.2 to create the staging table.

mysql> . c:mysqlscriptszip_code_stg.sql

Assuming the zip_code.csv file contains the complete postal codes and these codes never change, you need to load the zip code file to the data warehouse once only. This is a type of pre-population called one-time population.

The script in Listing 23.3 contains the script for loading the zip code data into the zip_code_stg staging table.

Listing 23.3. Zip_code_dim population
/******************************************************************/
/*                                                                */
/* zip_code_population.sql                                        */
/*                                                                */
/******************************************************************/

/* default to dw                                                  */

USE dw;

TRUNCATE zip_code_stg;

LOAD DATA INFILE 'zip_code.csv'
INTO TABLE zip_code_stg
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY ''
LINES TERMINATED BY '
'
IGNORE 1 LINES
( zip_code
, city
, state      )
;

INSERT INTO zip_code_dim
SELECT
  NULL
, zip_code
, city
, state
, '0000-00-00'
, '9999-12-31'
FROM zip_code_stg
;

/* end of script                                                  */

Run the script in Listing 23.3 to populate the zip_code_dim table.

mysql> . c:mysqlscriptszip_code_population.sql

Here is the message that you will see as the response to the query.

Database changed
Query OK, 1 row affected (0.09 sec)

Query OK, 7 rows affected (0.08 sec)
Records: 7  Deleted: 0  Skipped: 0  Warnings: 0

Query OK, 7 rows affected (0.07 sec)
Records: 7  Duplicates: 0  Warnings: 0

You can query the zip_code_dim table to confirm correct population using this statement.

mysql> select zip_code_sk sk, zip_code zip, city, state from
       zip_code_dim;

The query result is as follows.

+----+-------+----------------+-------+
| sk | zip   | city           | state |
+----+-------+----------------+-------+
|  1 | 17050 | PITTSBURGH     | PA    |
|  2 | 17051 | MC VEYTOWN     | PA    |
|  3 | 17052 | MAPLETON DEPOT | PA    |
|  4 | 17053 | MARYSVILLE     | PA    |
|  5 | 17054 | MATTAWANA      | PA    |
|  6 | 17055 | MECHANICSBURG  | PA    |
|  7 | 44102 | CLEVELAND      | OH    |
+----+-------+----------------+-------+
7 rows in set (0.41 sec)

Now you need to revise the sales_order_fact table and the other tables. Here are the five steps you need to do to update the database structure.

1.
Create a customer_zip_code_dim view and a shipping_zip_code_dim based on the zip_code_dim table.

2.
Add the customer_zip_code_sk and shipping_zip_code_sk columns to the sales_order_fact table.

3.
Initially load the two zip code surrogate key columns based on the existing customer and shipping zip codes.

4.
Remove the customer and shipping codes as well as their cities and states from the customer_dim table

5.
Remove the customer city, state, and zip code from the pa_customer_dim table

The script in Listing 23.4 implements the above five revisions.

Listing 23.4. The revised sales_order_fact script
/******************************************************************/
/*                                                                */
/* sales_order_fact_23.sql                                        */
/*                                                                */
/******************************************************************/

USE dw;

CREATE VIEW customer_zip_code_dim
( customer_zip_code_sk
, customer_zip_code
, customer_city
, customer_state
, effective_date
, expiry_date )
AS SELECT
  zip_code_sk
, zip_code
, city
, state
, effective_date
, expiry_date
FROM zip_code_dim
;

CREATE VIEW shipping_zip_code_dim
( shipping_zip_code_sk
, shipping_zip_code
, shipping_city
, shipping_state
, effective_date
, expiry_date )
AS SELECT
  zip_code_sk
, zip_code
, city
, state
, effective_date
, expiry_date
FROM zip_code_dim
;

ALTER TABLE sales_order_fact
  ADD customer_zip_code_sk INT AFTER customer_sk
, ADD shipping_zip_code_sk INT AFTER customer_zip_code_sk
;

UPDATE sales_order_fact a, customer_dim b, customer_zip_code_dim c
SET
  a.customer_zip_code_sk = c.customer_zip_code_sk
WHERE
    a.customer_sk = b.customer_sk
AND b.customer_zip_code = c.customer_zip_code
;

UPDATE
  sales_order_fact a
, customer_dim b
, shipping_zip_code_dim c
SET
  a.shipping_zip_code_sk = c.shipping_zip_code_sk
WHERE
    a.customer_sk = b.customer_sk
AND b.shipping_zip_code = c.shipping_zip_code
;

ALTER TABLE customer_dim
  DROP customer_zip_code
, DROP customer_city
, DROP customer_state
, DROP shipping_zip_code
, DROP shipping_city
, DROP shipping_state
;

ALTER TABLE pa_customer_dim
  DROP customer_zip_code
, DROP customer_city
, DROP customer_state
, DROP shipping_zip_code
, DROP shipping_city
, DROP shipping_state
;

/* end of script                                                  */

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.16.66.156