Looking for values in a database (with complex conditions or multiple tables involved)

In the previous recipe, you saw how to search for columns in a database table based on simple conditions. With Kettle, you can also search by providing complex conditions or involving more than one table. In this recipe, you will learn how to perform that kind of search by using the Database join step.

In order to let you compare the different options for searching data in a database with ease, we will work with the same example that you saw in the preceding recipe: the Steel Wheels sample data. You want to look for products that match a given search term and whose prices are below a given value.

Getting ready

In order to follow this recipe, you need the Steel Wheels database.

How to do it...

Carry out the following steps:

  1. Create a new transformation.
  2. Create a stream that generates a dataset like the one shown in the following screenshot:
    How to do it...

    Tip

    You can type the data into a file and read the file, or use a Data Grid.

  3. Add a Database join step. You will find it in the Lookup category of steps.
  4. Double-click on the step. As Connection, select (or create if it doesn't exist) the connection to the sampledata database.
  5. In the SQL frame, type the following statement:
    SELECT PRODUCTNAME
    , PRODUCTSCALE
    , BUYPRICE
    FROM PRODUCTS
    WHERE PRODUCTNAME LIKE concat(‹%›,?,›%›)
    AND BUYPRICE < ?
    
  6. Check the Outer join? option.
  7. Click on Get Fields to fill the grid with two parameters: prod and max_price.
  8. Close the Database join configuration window and do a preview on this step. You will see the following screen:
How to do it...

How it works...

The Database join step is a powerful step for looking for data in a database based on given conditions. The conditions usually involve comparisons between columns in tables and fields in your stream; therefore it's called a join.

Note

Note that this is not really a database join. Instead of joining tables in a database, you are joining the result of a database query with a Kettle dataset.

The question marks you type in the SQL statement (those in the highlighted lines in the recipe) represent parameters. The purpose of these parameters is to be replaced with the fields you provide in the lower grid. For each row in your stream the Database join step replaces the parameters in the same order as they are in the grid, and executes the SQL statement.

If we take as an example the first row in the dataset, the SQL statement after the replacement of the parameters would look like this:

SELECT PRODUCTNAME
, PRODUCTSCALE
, BUYPRICE
FROM PRODUCTS
WHERE PRODUCTNAME LIKE concat('%','Aston Martin','%')
AND BUYPRICE < 90

And that's exactly the search you wanted to do for that row: look for the records where the column BUYPRICE was less than 90 and the column PRODUCTNAME contained Aston Martin.

As the result of the database join you can retrieve any number of columns. Each database column that you type in the SELECT clause will become a new field in your dataset. In the recipe, those fields were PRODUCTNAME, PRODUCTSCALE, and BUYPRICE. In particular, if you had typed SELECT *, you would have retrieved all columns in the tables involved in the statement.

In the recipe, you checked the Outer join? option. The effect of this is as follows: For the rows where the lookup fails, the new fields are retrieved with null values. That was the case for the second row. There were no products with Ford Falcon as part of its description and price lower than 70. Therefore, that row shows empty values for the PRODUCTNAME, PRODUCTSCALE, and BUYPRICE fields.

Note that in the recipe, you filled the grid with two fields. That is exactly the same number of question marks in the statement.

Note

The number of fields in the grid must be exactly the same as the number of question marks in the query.

Also, note that in the grid the prod field was in the first place and the max_price in the second place. If you look at the highlighted lines in the recipe, you will see that the statement expected the parameters in exactly that order.

Note

The replacement of the markers respects the order of the fields in the grid.

So far, the results are quite similar to those you got with a database lookup. There is a significant difference however. For the third row, the Corvette product, you can see two results. This means that the Database join found two matching rows in the database, and retrieved them both. This is not possible with a Database lookup step.

There's more...

The Database join step can be a little complicated to use or to understand compared to the Database lookup step. While the Database lookup step has a UI that makes the configuration of the step easy, in the Database join step, you have to write an SQL statement. That implies that you need a good knowledge of SQL. However, the Database join step has several advantages over the Database lookup one:

  • It allows you to look up from a combination of tables.
  • It allows you to retrieve fields from more than one table at a time.
  • It allows you to retrieve aggregate results, fragments of a field (for example, a substring of a field), or a combination of fields (for example, two strings concatenated).
  • It allows you to retrieve more than one row from the database for each incoming row in the Kettle dataset. This is by far the most important advantage! By default, all matching rows are retrieved. If you want to limit the number of rows to retrieve for each row in your stream, just change the Number of rows to return value.

See also

  • The recipe named Looking for values in a database table in this chapter. This recipe shows the simplest way of looking for values in a database.
  • The recipe named Looking for values in a database with extreme flexibility in this chapter. This recipe shows the most flexible way of searching for data in a database.
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.219.4.174