Time for action—populating the Jigsaw database

To load data massively into your Jigsaw database, you must have the Jigsaw database created and the MySQL server running. You already know how to do this. If not, please refer to Chapter 1 for the installation of MySQL and Chapter 8 for the creation of the Jigsaw database.

Note

This tutorial will overwrite all your data in the js database. If you don't want to overwrite the data in your js database, you could simply create a new database with a different name and run the js.sql script to create the tables in your new database.

After checking that everything is in order, follow these instructions:

  1. From Packt's website download the js_data.sql script file.
  2. Launch the MySQL query browser.
  3. From the File menu select Open Script....
  4. Locate the downloaded file and open it.
  5. At the beginning of the script file you will see this line:
    USE js;
    

    If you created a new database, replace the name js by the name of your new database.

  6. Click on the Execute button.
  7. At the bottom of the screen, you'll see a progress message.
  8. When the script execution ends, verify that the database has been populated. Execute some SELECT statements such as:
    SELECT * FROM cities
    

Having populated the database, let's prepare the Spoon environment:

  1. Edit the kettle.properties file located in the PDI home directory. Add the following variables: DB_HOST, DB_NAME, DB_USER, DB_PASS, and DB_PORT. As values put the setting for your connection to the Jigsaw database. Use the following lines as a guide:
    DB_HOST=localhost
    DB_NAME=js
    DB_USER=root
    DB_PASS=1234
    DB_PORT=3306
    
  2. Add the following variables: DW_HOST, DW_NAME, DW_USER, DW_PASS, and DW_PORT. As values, put the setting for your connection to the js_dw database—the database you created in Chapter 8 to load the time dimension. Here are some sample lines for you to use:
    DW_HOST=localhost
    DW_NAME=js_dw
    DW_USER=root
    DW_PASS=1234
    DW_PORT=3306
    

    Save the file.

  3. Included in the downloaded material is a file named shared.xml. Copy it to your PDI home directory (the same directory where the kettle.properties file is) overwriting the existing file.

    Note

    Before overwriting the file, please take a backup, as this will delete any share connections you might have created.

  4. Launch Spoon. If it was running, restart it so that it recognizes the changes in the kettle.properties file.
  5. Create a new transformation.

    Tip

    If you don't see the shared database connections js and dw, please verify that you copied the shared.xml file to the right folder.

  6. Right-click the js database connection and select Edit. In the Settings frame, instead of fixed values, you will see variables: ${DS_HOST} for Host Name, ${DS_NAME} for Database Name, and so on.
  7. Test the connection.
  8. Repeat the steps for the js_dw shared connection: Right-click the database connection and select Edit. In the Settings frame, you will see the variables you defined in the kettle.properties file—${DW_HOST}, ${DW_NAME}, and so on.
  9. Test the dw_js connection.

    Note

    If any of the database tests fail, please check that the connection variables you put in the kettle.properties file are correct. Also check that MySQL is running database.

What just happened?

In this tutorial you prepared the environment for working in the rest of the chapter.

You did two different things:

  • First, you ran a script that emptied all the js database tables and loaded data into them.
  • Then, you redefined the database connections to the databases js and js_dw.

Note

Note that the names for the connection don't have to match the names of the databases. This can benefit you in the following way: If you created a database with a different name for the Jigsaw database puzzle, your connection may still be named js, and all code you download from the Packt website should work without touching anything but the kettle.properties file.

You edited the kettle.properties file by adding variables with the database connection values such as host name, database name, and so on. Then you edited the database connections. There you saw that the database settings didn't have values but variable names—the variables you had defined in the kettle.properties file. For shared connections, PDI takes the database definition from the shared.xml file.

Note

Note that you didn't save the transformation you created. That was intentional. The only purpose for creating it was to be able to see the shared connections.

Exploring the Jigsaw database model

The information in this section allows you to understand the organization of the data in the Jigsaw database. In the first place, you have a DER. A DER or entity relationship diagram is a graphical representation that allows you to see how the tables in a database are related to each other. The following is the DER for the js database:

Exploring the Jigsaw database model

The following table contains a brief explanation of what each table is for:

Table name

Content

manufacturers

Information about manufacturers of the products.

products

It is about the products you sell such as puzzles and accessories. The table has descriptive information and data about prices and stock. The pro_type column has the type of product—puzzle, glue, and so on. Several of the columns apply only to puzzles, such as shape or pieces.

buy_methods

It contains information about the list of methods for buying—for example, in store, by telephone, and so on.

payment_methods

Information about list of methods of payment such as cash, check, credit card, and so on.

countries

The list of countries.

cities

The list of cities.

customers

A list of customers. A customer has a number, a name, and an address.

invoices

The header of invoices including date, customer number, and total amount. The invoices dates range from 2004 to 2010.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.149.249.211