Executing a PDI transformation as part of a Pentaho process

Everything in the Pentaho platform is made of action sequences. An action sequence is, as its name suggests, a sequence of atomic actions that together accomplish small processes. Those atomic actions cover a broad spectrum of tasks, for example, getting data from a table in a database, running a piece of JavaScript code, launching a report, sending e-mails, or running a Kettle transformation.

For this recipe, suppose that you want to run the sample transformation that gets the current weather conditions for some cities. Instead of running it from the command line, you want to interact with this service from the Pentaho User Console (PUC). You will do it with an action sequence.

Getting ready

To follow this recipe, you will need a basic understanding of action sequences and at least some experience with the Pentaho BI Server and Pentaho Design Studio, the action sequence editor.

Before proceeding, make sure that you have a Pentaho BI Server running. You will also need Pentaho Design Studio. You can download the latest version from the following URL:

http://sourceforge.net/projects/pentaho/files/Design%20Studio/

Finally, you will need the sample transformation weather.ktr.

How to do it...

This recipe is split into two parts: first, you will create the action sequence, and then you will test it from the PUC. To create the action sequence, carry out the following steps:

  1. Launch Design Studio. If this is the first time you use it, create the solution project where you will save your work.
  2. Copy the sample transformation to the solution folder.
  3. Create a new action sequence and save it in your solution project with the name weather.xaction.
  4. Define two inputs that will be used as the parameters for your transformation: city_name and temperature_scale.
  5. Add two Prompt/Secure Filter actions and configure them to prompt for the name of the city and the temperature scale.
  6. Add a new process action by selecting Get Data From | Pentaho Data Integration.
  7. Now, you will fill in the Input Section of the process action configuration. Give the process action a name.
  8. For Transformation File, type solution:weather.ktr. For Transformation Step, type current_conditions_normalized and for Kettle Logging Level, type or select basic.
  9. In the Transformation Inputs, add the inputs city_name and temperature_scale.
  10. Select the XML source tab.
  11. Search for the <action-definition> tag that contains the following line:
    <component-name>KettleComponent</component-name>
    
  12. You will find something like this:
    <action-definition>
      <component-name>KettleComponent</component-name>
      <action-type>looking for the current weather</action-type>
      <action-inputs>
        <city_name type="string"/>
        <temperature_scale type="string"/>
      </action-inputs>
      <action-resources>
        <transformation-file type="resource"/>
      </action-resources>
      <action-outputs/>
      <component-definition>
        <monitor-step><![CDATA[current_conditions_normalized]]></monitor-step>
        <kettle-logging-level><![CDATA[basic]]></kettle-logging-level>
      </component-definition>
    </action-definition>
    
  13. Below <component-definition>, type the following:
    <set-parameter>
      <name>TEMP</name>
      <mapping>temperature_scale</mapping>
    </set-parameter>
    <set-argument>
      <name>1</name>
      <mapping>city_name</mapping>
    </set-argument>
    

    Note

    In fact, you can type this anywhere between <component-definition> and </component-definition>. The order of the internal tags is not important.

  14. Go back to the tab named 2. Define Process.
  15. Now, fill in the Output Section of the Pentaho Data Integration process action. For Output Rows Name, type weather_result and for Output Rows Count Name, type number_of_rows.
  16. Below the Pentaho Data Integration process action, add an If Statement. As the condition, type number_of_rows==0.
  17. Within the If Statement, add a Message Template process action.
  18. In the Text frame, type No results for the city {city_name}. For Output Name, type weather_result.
  19. Finally, in the Process Outputs section of the action sequence, add weather_result as the only output.
  20. Your final action sequence should look like the one shown in the following screenshot:
    [Screenshot: the final weather.xaction action sequence in Design Studio]
  21. Save the file.
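For reference, the inputs from step 4 and the output from step 19 are written to the top-level <inputs> and <outputs> sections of weather.xaction. The fragment below is a sketch of what Design Studio typically generates, not a copy of the recipe's file; the <sources>/<request> layout (which lets the prompts fill the inputs from the request) and the result-set type of weather_result are assumptions to verify in the XML source tab:

```xml
<!-- Top-level sections of weather.xaction (sketch; layout assumed) -->
<inputs>
  <!-- Filled from the request parameters sent by the prompts -->
  <city_name type="string">
    <sources>
      <request>city_name</request>
    </sources>
  </city_name>
  <temperature_scale type="string">
    <sources>
      <request>temperature_scale</request>
    </sources>
  </temperature_scale>
</inputs>
<outputs>
  <!-- The only process output; the type attribute is an assumption -->
  <weather_result type="result-set"/>
</outputs>
```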

Now, it is time to test the action sequence that you just created.

  1. Log in to the PUC and refresh the repository so that the weather.xaction file you just created shows up.
  2. Browse the solution folders, look for the xaction, and double-click on it.
  3. Provide a name of a city and change the temperature scale, if you wish.
  4. Click on Run; you will see something similar to the following:
    [Screenshot: the current weather conditions returned by the action sequence]
  5. You can take a look at the Pentaho console to see the log of the transformation running behind the scenes.
  6. Run the action sequence again. This time, type the name of a fictional city, for example, my_invented_city. You will see the following message:
    Action Successful
    weather_result=No results for the city my_invented_city
    

How it works...

You can run Kettle transformations as part of an action sequence by using the Pentaho Data Integration process action located within the Get Data From category of process actions.

The main task of a PDI process action is to run a Kettle transformation. To do that, it has a set of checkboxes and textboxes where you specify everything you need to run the transformation and everything you want to receive back after running it.

The most important setting in the PDI process action is the name and location of the transformation to be executed. In this example, you had a .ktr file in the same location as the action sequence, so you simply typed solution: followed by the name of the file.
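Behind the solution: prefix, the transformation is declared as a resource of the action sequence and referenced from the KettleComponent through <transformation-file type="resource"/>. The fragment below is a sketch of that top-level <resources> section, assuming the layout Design Studio usually writes; the mime-type value in particular is an assumption, so compare it with your own XML source tab:

```xml
<!-- Top-level resources section of weather.xaction (sketch) -->
<resources>
  <!-- Resource name matched by <transformation-file type="resource"/> -->
  <transformation-file>
    <solution-file>
      <!-- Relative location: weather.ktr sits next to weather.xaction -->
      <location>weather.ktr</location>
      <!-- Assumed mime type -->
      <mime-type>text/xml</mime-type>
    </solution-file>
  </transformation-file>
</resources>
```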

Then, in the Transformation Step textbox, you specified the name of the step in the transformation that gives you the results you need. The PDI process action, like any other process action, can receive input from the action sequence and return data to be used later in the sequence. That is why the drop-down list of the Transformation Step textbox shows the available action sequence inputs. In this case, you simply typed the name of the step.

Note

If you are not familiar with action sequences, note that the drop-down list in the Transformation Step textbox is not the list of available steps. It is the list of available action sequence inputs.

You also have the option of specifying the Kettle log level. In this case, you selected Basic, which is the level of detail of the log that Kettle writes to the Pentaho console. Here too, you can select an action sequence input instead of one of the log levels in the list.
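If you pick an action sequence input instead of a fixed log level, the setting is no longer a constant in <component-definition>; it becomes a mapped action input. The sketch below assumes a hypothetical sequence input named log_level (which would also have to be declared in the top-level <inputs> section), and the mapping syntax is assumed from how Design Studio usually maps inputs, so check it in the XML source tab:

```xml
<action-definition>
  <component-name>KettleComponent</component-name>
  <action-type>looking for the current weather</action-type>
  <action-inputs>
    <city_name type="string"/>
    <temperature_scale type="string"/>
    <!-- Hypothetical input log_level supplies the Kettle Logging Level -->
    <kettle-logging-level type="string" mapping="log_level"/>
  </action-inputs>
  <action-resources>
    <transformation-file type="resource"/>
  </action-resources>
  <action-outputs/>
  <component-definition>
    <monitor-step><![CDATA[current_conditions_normalized]]></monitor-step>
    <!-- No constant <kettle-logging-level> here: it comes from log_level -->
  </component-definition>
</action-definition>
```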

As mentioned earlier, the process action can use any input from the action sequence. In this case, you used two inputs, city_name and temperature_scale, and passed them to the transformation in the XML code:

  • By putting city_name between <set-argument></set-argument>, you passed the city_name input to the transformation as the first command-line argument.
  • By putting temperature_scale between <set-parameter></set-parameter>, you passed temperature_scale to the transformation as the value of the named parameter TEMP.

As mentioned, the process can return data to be used later in the sequence. The textboxes in the Output Section are meant to do that. Each textbox you fill in will be a new data field to be sent to the next process action. In this case, you defined two outputs: weather_result and number_of_rows. The first contains the dataset that comes out of the step you defined in Transformation Step; in this case, current_conditions_normalized. The second has the number of rows in that dataset.
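Once the Output Section is filled in, the empty <action-outputs/> element of the KettleComponent definition shown in step 12 is replaced with output mappings. The element names below are an assumption about what Design Studio generates for Output Rows Name and Output Rows Count Name; treat the fragment as a sketch and check your own XML source tab before relying on it:

```xml
<action-outputs>
  <!-- Rows coming out of the step named in Transformation Step -->
  <transformation-output-rows type="result-set" mapping="weather_result"/>
  <!-- Number of rows in that dataset -->
  <transformation-output-rows-count type="integer" mapping="number_of_rows"/>
</action-outputs>
```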

You used those outputs in the If Statement that follows: if number_of_rows was equal to zero, the Message Template overwrote the weather_result data with a message to be displayed to the user.

Finally, you added the weather_result as the output of the action sequence, so that the user either sees the current conditions for the required city, or the custom message indicating that the city was not found.
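In the XML source, the If Statement is typically written as a nested <actions> element carrying a <condition>, and the Message Template as a TemplateComponent. The component and element names below are assumptions based on typical Design Studio output rather than details given in this recipe, so read the fragment as a sketch:

```xml
<!-- Runs only when the transformation returned no rows (sketch) -->
<actions>
  <condition><![CDATA[number_of_rows==0]]></condition>
  <action-definition>
    <component-name>TemplateComponent</component-name>
    <action-type>Message Template</action-type>
    <action-inputs>
      <!-- Needed so that {city_name} can be resolved in the template -->
      <city_name type="string"/>
    </action-inputs>
    <action-outputs>
      <!-- Overwrites weather_result with the message text -->
      <output-message type="string" mapping="weather_result"/>
    </action-outputs>
    <component-definition>
      <template><![CDATA[No results for the city {city_name}]]></template>
    </component-definition>
  </action-definition>
</actions>
```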

There's more...

The following are some variants in the use of the Pentaho Data Integration process action:

Specifying the location of the transformation

When your transformation is in a file, you specify its location by typing or browsing for the name of the file, relative to the solution folder. In the recipe, the transformation was in the same folder as the action sequence, so you simply typed solution: followed by the name of the transformation, including the .ktr extension.

If the transformation is stored in a Kettle repository instead of a file, check the Use Kettle Repository option. The Transformation File textbox is replaced with two textboxes named Directory and Transformation File. In these textboxes, type the name of the folder and of the transformation exactly as they appear in the repository. Alternatively, you can select the names from the available drop-down lists.

Note

In these drop-down lists, you will not see the available directories and transformations in the repository. The lists are populated with the available action sequence inputs. This also applies to specifying the location of a job in an action sequence.
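With Use Kettle Repository checked, the transformation is identified by directory and name rather than by a resource file. The following sketch assumes that Design Studio stores the two textboxes as <directory> and <transformation> elements inside <component-definition>; those element names, the /weather folder, and the transformation name weather are all assumptions to verify in your own XML source tab. The BI Server must also be able to connect to the repository, as covered by the configuration recipe listed in See also:

```xml
<action-definition>
  <component-name>KettleComponent</component-name>
  <action-type>looking for the current weather</action-type>
  <action-inputs>
    <city_name type="string"/>
    <temperature_scale type="string"/>
  </action-inputs>
  <!-- No <action-resources>: the transformation is read from the repository -->
  <action-outputs/>
  <component-definition>
    <!-- Hypothetical folder and name, typed exactly as in the repository -->
    <directory><![CDATA[/weather]]></directory>
    <transformation><![CDATA[weather]]></transformation>
    <monitor-step><![CDATA[current_conditions_normalized]]></monitor-step>
  </component-definition>
</action-definition>
```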

Supplying values for named parameters, variables and arguments

If your transformation defines or needs named parameters, Kettle variables or command-line arguments, you can pass them from the action sequence by mapping KettleComponent inputs.

First of all, you need to include them in the Transformation Inputs section. This is equivalent to listing them inside the <action-inputs> element of the KettleComponent action-definition.

Then, depending on the kind of data to pass, you have to define a different element:

Element in the transformation    Element in the action sequence
Command-line argument            <set-argument></set-argument>
Variable                         <set-variable></set-variable>
Named parameter                  <set-parameter></set-parameter>

In the recipe, you mapped one command-line argument and one named parameter.

With the following lines, you mapped the input named temperature_scale with the named parameter TEMP:

<set-parameter>
  <name>TEMP</name>
  <mapping>temperature_scale</mapping>
</set-parameter>

In the case of a variable, the syntax is exactly the same, with <set-variable> in place of <set-parameter>.

In the case of command-line arguments, instead of a name, you provide the position of the argument: 1, 2, and so on.
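As an illustration, the following <component-definition> combines all three kinds of mapping, using only the element syntax described above. TEMP, city_name, and temperature_scale come from the recipe; the second argument, the variable LOG_DIR, and the inputs country_code and log_dir are hypothetical, and every mapped input would also have to be listed in <action-inputs>:

```xml
<component-definition>
  <!-- Command-line arguments are identified by position -->
  <set-argument>
    <name>1</name>
    <mapping>city_name</mapping>
  </set-argument>
  <!-- Hypothetical second argument -->
  <set-argument>
    <name>2</name>
    <mapping>country_code</mapping>
  </set-argument>
  <!-- Variables are identified by name, like named parameters (hypothetical) -->
  <set-variable>
    <name>LOG_DIR</name>
    <mapping>log_dir</mapping>
  </set-variable>
  <!-- Named parameter used in the recipe -->
  <set-parameter>
    <name>TEMP</name>
    <mapping>temperature_scale</mapping>
  </set-parameter>
</component-definition>
```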

Note

Design Studio does not let you map inputs to variables or named parameters, so you have to type those mappings in the XML code. If you only want to pass command-line arguments, you can skip this task because, by default, the inputs you enter are assumed to be command-line arguments.

This way of providing values for named parameters, variables, and command-line arguments also applies to jobs executed from an action sequence.
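For a job, the mapping elements stay the same; what changes is the resource that identifies the file. A minimal sketch, assuming the job resource is named job-file (check this in the XML that Design Studio generates) and a hypothetical job weather_job.kjb declared in the top-level <resources> section. The sketch omits <monitor-step>, which refers to a transformation step:

```xml
<action-definition>
  <component-name>KettleComponent</component-name>
  <action-type>Run the weather job</action-type>
  <action-inputs>
    <temperature_scale type="string"/>
  </action-inputs>
  <action-resources>
    <!-- Assumed resource name for jobs; points to the hypothetical weather_job.kjb -->
    <job-file type="resource"/>
  </action-resources>
  <component-definition>
    <!-- Same mapping syntax as for a transformation -->
    <set-parameter>
      <name>TEMP</name>
      <mapping>temperature_scale</mapping>
    </set-parameter>
  </component-definition>
</action-definition>
```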

Keeping things simple when it's time to deliver a plain file

Reporting is a classic way of delivering data. In the PUC, you can publish not only Pentaho reports but also third-party ones, for example, Jasper reports. However, what if the final user simply wants a plain file with some numbers in it? You can avoid the effort of creating it with a reporting tool: just create a Kettle transformation that generates the file and call it from an action sequence, as you did in the recipe. This practical example is clearly explained by Nicholas Goodman in his blog post Self Service Data Export using Pentaho. The following is the link to that post, which also includes sample code for download:

http://www.nicholasgoodman.com/bt/blog/2009/02/09/self-service-data-export-using-pentaho/

See also

  • The recipe named Configuring the Pentaho BI Server for running PDI jobs and transformations in this chapter. Read that recipe before trying to run a transformation from the PUC.
  • The recipe named Executing a PDI job from the Pentaho User Console in this chapter. See this recipe if you want to run a job instead of a transformation.