Suppose that you have a list of tasks that have to be repeated while or until a condition is true (or false). If you know about programming languages, think of this as an analogy of a while
or repeat until
loop. Kettle allows you to implement these kinds of iterations and this recipe explains how to do it.
For the recipe, you will use one of the transformations described in the introduction of this chapter: the transformation that generates random numbers and writes them to a file. You will execute the transformation repeatedly and keep track of the number of lines written to those files. You will continue executing the transformation as long as the total number of written lines is less than 25.
You will need the transformation that generates random numbers described in the introduction. If instead of downloading the transformation you created it yourself, then you will have to do a quick fix in order to make Kettle save the number of written lines to the log (this has already been done in the transformation available on the book's site):
Carry out the following steps:
total_lines
, for Value type 0
, and for Variable scope type select Valid in the current job.var total_lines = parseInt(parent_job.getVariable("total_lines")) var new_total_lines = total_lines + previous_result.getNrLinesOutput(); parent_job.setVariable("total_lines", new_total_lines); new_total_lines < 25;
lines written=${total_lines}
. For Log message type Ready to run again
. ${total_lines} lines have been written
. For Log message, type The generation of random numbers has succeeded
.2011/01/11 22:43:50 - Spoon - Starting job... 2011/01/11 22:43:50 - main - Start of job execution 2011/01/11 22:43:50 - lines written=10 - Ready to run again ... 2011/01/11 22:43:50 - lines written=20 - Ready to run again ... 2011/01/11 22:43:51 - 30 lines have been written. - The generation of random numbers has been successful. 2011/01/11 22:43:51 - main - Job execution finished 2011/01/11 22:43:51 - Spoon - Job has ended.
In order to run the transformation that generates a file with random numbers until the number of written lines is greater than 25, you implemented a loop. The following flow chart shows you the logic of this process:
To control the execution of the transformation, you created a variable named total_lines
and initialized this variable with the value 0
. After executing the transformation, you incremented the value of the variable using JavaScript code. If the value was less than 25
, you wrote a message to the log and re-ran the transformation. If not, you wrote a final message to the log.
The JavaScript code deserves a separate explanation:
previous_result.getNrLinesOutput()
is the function that returns the number of lines that were written by the previous job entry. That is the value that you have to add to the total_lines
variable, in order to keep the variable updated.
The couple of functions parent_job.getVariable()
and parent_job.setVariable()
are meant to get and set the value of the Kettle variable named total_lines
. By default, the type of the Kettle variables is String. Therefore, in order to do the math, you had to use the parseInt()
JavaScript function.
The second line in the JavaScript code evaluates to True
or False
. If it evaluates to True
, the JavaScript entry follows the green arrow. If it evaluates to False
, the JavaScript entry follows the red arrow.
In this recipe, you built a loop and controlled the execution with the help of a JavaScript entry. The following subsections give you more information about these topics.
Suppose that you need to build a job in which one or more entries have to be executed repeatedly until a condition is met. People refer to these repetitive tasks as a loop. The first thing you have to have in mind is a clear understanding of the logic of this loop; that is, the condition that will cause the exit from the loop: You may want to exit the loop when:
Once you understand this, you have to implement the logic. In the recipe, you implemented the logic with a JavaScript job entry. There are other entries that you can use for deciding whether to exit a loop or not. You will find useful entries for this purpose in the Conditions category: Simple evaluation, Evaluate file metrics, Check webservice availability, and so on. You can even implement the logic with an extra transformation that will succeed or fail according to your rules.
Make sure that the number of iterations is small. If you build an endless loop or a loop with many iterations, then you risk running out of heap space.
Here are a few tips to avoid errors:
If your loop is causing you troubles—for instance, you run out of memory—try to rethink the solution. The following are some alternatives you can think of:
N
. N
.The JavaScript entry is a useful step for controlling whether a job entry or a group of job entries should be executed or not. In particular, you used it in the recipe for deciding if the loop should end or not.
This entry works as follows: In its setting window, you should type JavaScript code that ends with an expression that evaluates to a Boolean. As with any job entry, the Javascript entry either succeeds or fails. Success or failure is decided by the result of evaluating that expression. Then, based on that value, Kettle knows which entry to execute next.
Within the code, you are free to use the previous_result
element. The previous_result
element is the representation of the Result object, an object that contains the result of the execution of the previous job entry. In the recipe, you used the previous_result
element to ask for the number of written lines, but you can ask for the number of read lines, the number of errors, the number of executed job entries, and so on. You can find a complete description of the available previous_result
options at the following URL: http://wiki.pentaho.com/display/EAI/Evaluating+conditions+in+The+JavaScript+job+entry
3.138.120.136