Time for action using a Database join step to create a list of suggested products to buy

If your customers ordered a product that is out of stock and you don't want to let them down, you will suggest them some alternative puzzles to buy.

  1. Open the transformation of the previous tutorial and save it under a new name.
  2. Delete the Text file output step.
  3. Double-click the Group by step and add an aggregated field named customers with the list of customers separated by (,). Under Subject, select idcus and as Type, select Concatenate strings separated by ,.
  4. Double-click the Database lookup step. In the Values to return from the lookup table grid, add pro_theme as value in the String field.
  5. Add a Select values step. Use it to select the fields customers, quantity, pro_theme, and pro_name. Also rename quantity as quantity_param and pro_theme as theme_param. From the Lookup category, drag a Database join step to the canvas. Create a hop from the Select values step to this step.
  6. Double-click the Database join step.
  7. Select js as Connection.
  8. In the SQL frame type the following statement:
    SELECT man_code
    , pro_code
    , pro_name
    FROM products
    WHERE pro_theme like ?
    AND pro_stock>=?
    
  9. In the Number of rows to return textbox, type 4.
  10. Fill the grid as shown:
    Time for action using a Database join step to create a list of suggested products to buycomplex lookups, dataperforming
  11. Click on OK. The transformation looks like this:
    Time for action using a Database join step to create a list of suggested products to buycomplex lookups, dataperforming
  12. With the last step selected, do a Preview.
  13. You should see this:
    Time for action using a Database join step to create a list of suggested products to buycomplex lookups, dataperforming
  14. In the Step Metrics you should see this:
    Time for action using a Database join step to create a list of suggested products to buycomplex lookups, dataperforming

What just happened?

You took the list of orders and filtered those for which you ran out of products. For the customers that ordered those products you built a list of four alternative puzzles to buy.

The selection of the puzzles was based on the theme. To filter the suggested puzzles, you used the theme of the ordered product.

The second parameter in the Database join step, the ordered quantity, was used to offer only alternatives for products for which there is a sufficient stock.

Joining data from the database to the stream data by using a Database join step

With the Database join step, you can combine your incoming stream with data from your database, based on given conditions. The conditions are put as parameters in the query you write in the Database join step.

Tip

Note that this is not really a database join as the name suggests; it is a join of data from the database to the stream data.

In the tutorial you used two parameters—the theme and the quantity ordered. With those parameters, you queried the list of products with the same theme:

where pro_theme like ?

and for which you have stock:

and pro_stock>=?

You set the parameters as question marks. This works like the question marks in a Table input step you learned in the last chapter—the parameters are replaced positionally. The difference is that here you define the list and the order of the parameters. You do it in the small grid at the bottom of the settings window. This means you aren't forced to use all the incoming fields as parameters, and that you also may change the order.

Just as you do in a Table input step, instead of using positional parameters, you can use Kettle variables by using the ${} notation and checking the Replace variables checkbox.

Note

You don't need to add the Select values step to discard fields and rename the parameters. You did it just to have fewer fields in the final screenshot so that it was easier to understand the output of the Database join step.

The step will give you back the manufacturer code, the product code, and the product name for the matching records.

Tip

As you cannot do a preview here, you can write and try your query inside a Table input step or in MySQL Query Browser. When you are done, just copy and paste the query here.

So far, you did the same you could have done with Database lookup step—looking for a record with a given condition, and adding new fields to the stream. However, there is a big difference here—you put 4 as the Number of rows to return. This means for each incoming row, the step will give you back up to four results. The following shows you this:

Joining data from the database to the stream data by using a Database join step

Note

Note that if you had left the Number of rows to return empty, the step would have returned all found rows.

You may need to use a Database join step in several situations:

  • When, as the result of the lookup, there is more than one row for each incoming row. This was the case in the tutorial.
  • When you have to look in a combination of tables. Look at the following SQL statement:
    SELECT co.country_name
    FROM customers cu
    , cities ci
    , countries co
    WHERE cu.city_id = ci.city_id
    AND ci.cou_id = co.cou_id
    AND cu.cus_id = 1000
    

    This statement returns the name of the country where the customer with id 1000 lives. If you want to look up the countries where a list of customers live, you can do it with a sentence like this by using a Database join step.

  • When you want to look for an aggregate result. Look at this sample query:
    SELECT pro_theme
    , count(*) quant
    FROM products
    GROUP BY pro_theme
    ORDER BY pro_theme
    

    This statement returns the number of puzzles by theme. If you have a list of themes and you want to find out how many puzzles you have for each theme, you can use a query like this also by using a Database join step.

The last option in the list can also be developed without using the Database join step. You could execute the SELECT statement with a Table Input step, and then look for the calculated quantity by using a Stream lookup step.

Note

As you can see, this is another situation where PDI offers more that one way to do the same thing. Sometimes it is a matter of taste. In general, you should test each option and choose the method which gives you the best performance.

Have a go hero—rebuilding the list of customers

Redo the Hero exercise preparing the delivery of the products, this time using a Database join step. Try to discover which one is preferable from the point of view of performance. If you don't see any difference, try with a bigger number of records in the main stream. You will have to create your own dataset for this test.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.138.35.255