To consolidate dimensions, you need to change the data warehouse schema. The revised schema is shown in Figure 23.1. A new zip_code_dim table has been added, and the structures of the sales_order_fact and production_fact tables have been changed. Note that only the tables related to the zip code are shown.
The zip_code_dim table is related to the two fact tables. The relationships replace those from the customer and factory dimensions. You need two relationships to the sales_order_fact table, one for the customer address and one for the shipping address. There is only one relationship to the production_fact table, therefore only the factory zip code surrogate key was added in this fact table.
The script in Listing 23.1 can be used to create the zip_code_dim table.
/******************************************************************/ /* */ /* zip_code_dim.sql */ /* */ /******************************************************************/ /* default to dw */ USE dw; CREATE TABLE zip_code_dim ( zip_code_sk INT NOT NULL AUTO_INCREMENT PRIMARY KEY , zip_code INT(5) , city CHAR(30) , state CHAR(2) , effective_date DATE , expiry_date DATE ) ; /* end of script */ |
Run the script in Listing 23.1 to create the zip_code_dim table:
mysql> . c:mysqlscriptszip_code_dim.sql
Then, you need to populate the zip_code_dim table. I have provided the necessary zip code information in the zip_code.csv file below.
ZIP CODE,CITY,STATE 17050,PITTSBURGH,PA 17051,MC VEYTOWN,PA 17052,MAPLETON DEPOT,PA 17053,MARYSVILLE,PA 17054,MATTAWANA,PA 17055,MECHANICSBURG,PA 44102,CLEVELAND,OH
As usual, you load data from a CSV file to a staging table. The script in Listing 23.2 can be used to create a staging table called zip_code_stg.
/******************************************************************/ /* */ /* zip_code_stg.sql */ /* */ /******************************************************************/ /* default to dw */ USE dw; CREATE TABLE zip_code_stg ( zip_code INT(5) , city CHAR(30) , state CHAR(2) ) ; /* end of script */ |
Run the script in Listing 23.2 to create the staging table.
mysql> . c:mysqlscriptszip_code_stg.sql
Assuming the zip_code.csv file contains the complete postal codes and these codes never change, you need to load the zip code file to the data warehouse once only. This is a type of pre-population called one-time population.
The script in Listing 23.3 contains the script for loading the zip code data into the zip_code_stg staging table.
/******************************************************************/ /* */ /* zip_code_population.sql */ /* */ /******************************************************************/ /* default to dw */ USE dw; TRUNCATE zip_code_stg; LOAD DATA INFILE 'zip_code.csv' INTO TABLE zip_code_stg FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '' LINES TERMINATED BY ' ' IGNORE 1 LINES ( zip_code , city , state ) ; INSERT INTO zip_code_dim SELECT NULL , zip_code , city , state , '0000-00-00' , '9999-12-31' FROM zip_code_stg ; /* end of script */ |
Run the script in Listing 23.3 to populate the zip_code_dim table.
mysql> . c:mysqlscriptszip_code_population.sql
Here is the message that you will see as the response to the query.
Database changed Query OK, 1 row affected (0.09 sec) Query OK, 7 rows affected (0.08 sec) Records: 7 Deleted: 0 Skipped: 0 Warnings: 0 Query OK, 7 rows affected (0.07 sec) Records: 7 Duplicates: 0 Warnings: 0
You can query the zip_code_dim table to confirm correct population using this statement.
mysql> select zip_code_sk sk, zip_code zip, city, state from zip_code_dim;
The query result is as follows.
+----+-------+----------------+-------+ | sk | zip | city | state | +----+-------+----------------+-------+ | 1 | 17050 | PITTSBURGH | PA | | 2 | 17051 | MC VEYTOWN | PA | | 3 | 17052 | MAPLETON DEPOT | PA | | 4 | 17053 | MARYSVILLE | PA | | 5 | 17054 | MATTAWANA | PA | | 6 | 17055 | MECHANICSBURG | PA | | 7 | 44102 | CLEVELAND | OH | +----+-------+----------------+-------+ 7 rows in set (0.41 sec)
Now you need to revise the sales_order_fact table and the other tables. Here are the five steps you need to do to update the database structure.
1. | |
2. | Add the customer_zip_code_sk and shipping_zip_code_sk columns to the sales_order_fact table. |
3. | Initially load the two zip code surrogate key columns based on the existing customer and shipping zip codes. |
4. | Remove the customer and shipping codes as well as their cities and states from the customer_dim table |
5. | Remove the customer city, state, and zip code from the pa_customer_dim table |
The script in Listing 23.4 implements the above five revisions.
/******************************************************************/ /* */ /* sales_order_fact_23.sql */ /* */ /******************************************************************/ USE dw; CREATE VIEW customer_zip_code_dim ( customer_zip_code_sk , customer_zip_code , customer_city , customer_state , effective_date , expiry_date ) AS SELECT zip_code_sk , zip_code , city , state , effective_date , expiry_date FROM zip_code_dim ; CREATE VIEW shipping_zip_code_dim ( shipping_zip_code_sk , shipping_zip_code , shipping_city , shipping_state , effective_date , expiry_date ) AS SELECT zip_code_sk , zip_code , city , state , effective_date , expiry_date FROM zip_code_dim ; ALTER TABLE sales_order_fact ADD customer_zip_code_sk INT AFTER customer_sk , ADD shipping_zip_code_sk INT AFTER customer_zip_code_sk ; UPDATE sales_order_fact a, customer_dim b, customer_zip_code_dim c SET a.customer_zip_code_sk = c.customer_zip_code_sk WHERE a.customer_sk = b.customer_sk AND b.customer_zip_code = c.customer_zip_code ; UPDATE sales_order_fact a , customer_dim b , shipping_zip_code_dim c SET a.shipping_zip_code_sk = c.shipping_zip_code_sk WHERE a.customer_sk = b.customer_sk AND b.shipping_zip_code = c.shipping_zip_code ; ALTER TABLE customer_dim DROP customer_zip_code , DROP customer_city , DROP customer_state , DROP shipping_zip_code , DROP shipping_city , DROP shipping_state ; ALTER TABLE pa_customer_dim DROP customer_zip_code , DROP customer_city , DROP customer_state , DROP shipping_zip_code , DROP shipping_city , DROP shipping_state ; /* end of script */ |
3.16.66.156