To load data massively into your Jigsaw database, you must have the Jigsaw database created and the MySQL server running. You already know how to do this. If not, please refer to Chapter 1 for the installation of MySQL and Chapter 8 for the creation of the Jigsaw database.
This tutorial will overwrite all your data in the js
database. If you don't want to overwrite the data in your js
database, you could simply create a new database with a different name and run the js.sql
script to create the tables in your new database.
After checking that everything is in order, follow these instructions:
js_data.sql
script file.USE js;
If you created a new database, replace the name js
by the name of your new database.
SELECT
statements such as:SELECT * FROM cities
Having populated the database, let's prepare the Spoon environment:
kettle.properties
file located in the PDI home directory. Add the following variables: DB_HOST, DB_NAME, DB_USER, DB_PASS
, and DB_PORT
. As values put the setting for your connection to the Jigsaw database. Use the following lines as a guide:DB_HOST=localhost DB_NAME=js DB_USER=root DB_PASS=1234 DB_PORT=3306
DW_HOST, DW_NAME, DW_USER, DW_PASS
, and DW_PORT
. As values, put the setting for your connection to the js_dw
database—the database you created in Chapter 8 to load the time dimension. Here are some sample lines for you to use:DW_HOST=localhost DW_NAME=js_dw DW_USER=root DW_PASS=1234 DW_PORT=3306
Save the file.
shared.xml
. Copy it to your PDI home directory (the same directory where the kettle.properties
file is) overwriting the existing file. kettle.properties
file. js
database connection and select Edit. In the Settings frame, instead of fixed values, you will see variables: ${DS_HOST} for Host Name, ${DS_NAME} for Database Name, and so on. js_dw
shared connection: Right-click the database connection and select Edit. In the Settings frame, you will see the variables you defined in the kettle.properties
file—${DW_HOST}
, ${DW_NAME}
, and so on. dw_js
connection.In this tutorial you prepared the environment for working in the rest of the chapter.
You did two different things:
js
database tables and loaded data into them. js
and js_dw
.Note that the names for the connection don't have to match the names of the databases. This can benefit you in the following way: If you created a database with a different name for the Jigsaw database puzzle, your connection may still be named js
, and all code you download from the Packt website should work without touching anything but the kettle.properties
file.
You edited the kettle.properties
file by adding variables with the database connection values such as host name, database name, and so on. Then you edited the database connections. There you saw that the database settings didn't have values but variable names—the variables you had defined in the kettle.properties
file. For shared connections, PDI takes the database definition from the shared.xml
file.
The information in this section allows you to understand the organization of the data in the Jigsaw database. In the first place, you have a DER. A DER or entity relationship diagram is a graphical representation that allows you to see how the tables in a database are related to each other. The following is the DER for the js
database:
The following table contains a brief explanation of what each table is for:
Table name |
Content |
---|---|
|
Information about manufacturers of the products. |
|
It is about the products you sell such as puzzles and accessories. The table has descriptive information and data about prices and stock. The |
|
It contains information about the list of methods for buying—for example, in store, by telephone, and so on. |
|
Information about list of methods of payment such as cash, check, credit card, and so on. |
|
The list of countries. |
|
The list of cities. |
|
A list of customers. A customer has a number, a name, and an address. |
|
The header of invoices including date, customer number, and total amount. The invoices dates range from 2004 to 2010. |
3.149.249.211