Suppose that you have a part of a transformation that you would like to use in another transformation. A quick way to do that would be to copy the set of steps and paste them into the other transformation, and then perform some modifications, for example, changing the names of the fields accordingly.
Now you realize that you need it in a third place. You do that again: copy, paste, and modify.
What if you notice that there was a bug in that part of the transformation? Or maybe you'd like to optimize something there? You would need to do that in three different places! This inconvenience is one of the reasons why you might like to move those steps to a common place - a subtransformation.
In this recipe, you will develop a subtransformation that receives two dates: a date of birth and a reference date.
The subtransformation will calculate how old a person was (or will be) at the reference date if the date of birth provided was theirs.
For example, if the date of birth is December 30, 1979 and the reference date is December 19, 2010, the age would be calculated as 30 years.
Then, you will call that subtransformation from a main transformation.
You will need a file containing a list of names and dates of birth, for example:
name,birthdate
Paul,31/12/1969
Santiago,15/02/2004
Lourdes,05/08/1994
Anna,08/10/1978
This recipe is split into two parts.
First, you will create the subtransformation by carrying out the following steps:
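Name the Mapping output specification step output.
In one Mapping input specification step, add a field named birth_field. For Type, select Date. Name the step birthdates.
In the other Mapping input specification step, add a field named reference_field. For Type, select Date. Name the step reference date.
The following two steps perform the main task: the calculation of the age.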
Note that these steps are a slightly modified version of the steps you used for calculating the age in the previous recipe.
Add a field named calculated_age. As Value type, select Integer. For Java expression, type:
((b_month > t_month) ||
(b_month - t_month == 0 && b_day > t_day)) ?
(t_year - b_year - 1) : (t_year - b_year)
The expression is written over three lines for clarity. You should type the whole expression on a single line.
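To check the logic of the expression outside Kettle, you can reproduce it in plain Java. The following is a minimal sketch, not part of the recipe: it assumes that b_year, b_month, and b_day are the year, month, and day of the date of birth, and that t_year, t_month, and t_day are those of the reference date. The class name AgeSketch and the method name age are hypothetical.

```java
import java.time.LocalDate;

// Hypothetical helper class; the name AgeSketch is not part of the recipe.
class AgeSketch {

    // Same logic as the Java expression in the recipe. Assumption: the b_* values
    // come from the date of birth and the t_* values from the reference date.
    static int age(LocalDate birth, LocalDate reference) {
        int bYear = birth.getYear();
        int bMonth = birth.getMonthValue();
        int bDay = birth.getDayOfMonth();
        int tYear = reference.getYear();
        int tMonth = reference.getMonthValue();
        int tDay = reference.getDayOfMonth();

        // True when the birthday has not yet been reached in the reference year
        boolean birthdayPending =
                (bMonth > tMonth) || (bMonth - tMonth == 0 && bDay > tDay);
        return birthdayPending ? (tYear - bYear - 1) : (tYear - bYear);
    }

    public static void main(String[] args) {
        // Example from the recipe: born December 30, 1979; reference date December 19, 2010
        System.out.println(age(LocalDate.of(1979, 12, 30), LocalDate.of(2010, 12, 19))); // prints 30
    }
}
```

The subtraction of one year applies only when the birthday falls later in the year than the reference date, which is why the example yields 30 rather than 31.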
Now you will create the main transformation. It will read the sample file and calculate the age of the people in the file as at the present day.
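Name the step that reads the sample file people.
In a Get System Info step, add a field named today. For Type, select Today 00:00:00. Name the step today.
In the settings of the Mapping (sub-transformation) step, add an input tab. Under this tab, you will define the correspondence between the incoming step people and the subtransformation step birthdates.
As the source step, type people, the name of the step that reads the file. As the target step, type birthdates, the name of the subtransformation step that expects the dates of birth.
Check the Ask these values to be renamed back on output? option.
In the grid, as the field from the source step, type birthdate, the name of the field coming out of the people step containing the date of birth. Under Fieldname to mapping input step, type birth_field, the name of the field in the subtransformation step birthdates that will contain the date of birth needed for calculating the age.
Add another input tab. Under this tab, you will define the correspondence between the incoming step today and the subtransformation step reference date. Fill in the tab in the same way, mapping the field today to reference_field, but leave the rename-back option unchecked.
Add an Output tab and click on Is this the main data path? In the grid, as the field from the mapping step, type calculated_age. Under Fieldname to target step, type age.
The subtransformation (the first transformation you created) has the purpose of calculating the age of a person at a given reference date. In order to do that, it defines two entry points through the use of the Mapping input specification steps. These steps are meant to specify the fields needed by the subtransformation. In this case, you defined the date of birth in one entry point and the reference date in the other. Then it calculates the age in the same way you would do it with any regular transformation. Finally, it defines an output point through the Mapping output specification step.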
Note that you developed the subtransformation blindly, without testing or previewing. This was because you cannot preview a subtransformation. The Mapping input specification steps are just a definition of the data that will be provided; they have no data to preview.
While you are designing a subtransformation, you can provisionally substitute each Mapping input specification step with a step that provides some fictional data, for example, a Text file input, a Generate rows, a Get System Info, or a Data Grid step.
This fictional data for each of these steps has to have the same metadata as the corresponding Mapping input specification step. This will allow you to preview and test your subtransformation before calling it from another transformation.
Now, let's explain the main transformation, the one that calls the subtransformation. You added as many input tabs as entry points to the subtransformation. The input tabs are meant to map the steps and fields in your transformation to the corresponding steps and fields in the subtransformation. For example, the field that you called today in your main transformation became reference_field in the subtransformation.
On the other side, in the subtransformation, you defined just one output point. Therefore, under the Output tab, you clicked on Is this the main data path? Checking it means that you don't need to specify the correspondence between steps. What you did under this tab was fill in the grid to ask that the field calculated_age be renamed to age.
In the final preview, you can see all the fields you had before the subtransformation, plus the fields added by it. Among these fields, there is the age field, which was the main field you expected to be added.
As you can see in the final dataset, the field birthdate kept its name, while the field today was renamed to reference_field. The field birthdate kept its name because you checked the Ask these values to be renamed back on output? option under the people input tab. On the other hand, the field today was renamed because you didn't check that option under the today input tab.
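The renaming described above can be pictured with a small Java sketch. It is only an illustration under simplifying assumptions, not how Kettle implements mappings: a single row modeled as a map stands in for the joined streams, the age value is precomputed by hand for the sample row, and the class and method names (MappingSketch, rename, callSubtransformation) are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; this is not how Kettle implements mappings.
class MappingSketch {

    // Returns a copy of the row with one field renamed, keeping the field order.
    static Map<String, Object> rename(Map<String, Object> row, String from, String to) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : row.entrySet()) {
            out.put(e.getKey().equals(from) ? to : e.getKey(), e.getValue());
        }
        return out;
    }

    static Map<String, Object> callSubtransformation(Map<String, Object> row) {
        // Input tabs: map the caller's fields to the names the subtransformation expects
        row = rename(row, "birthdate", "birth_field");
        row = rename(row, "today", "reference_field");

        // Inside the subtransformation: a value precomputed by hand for the
        // sample row stands in for the real calculation
        row.put("calculated_age", 40);

        // Output tab: calculated_age is renamed to age
        row = rename(row, "calculated_age", "age");

        // The rename-back option was checked only under the people input tab,
        // so only birth_field gets its original name back
        row = rename(row, "birth_field", "birthdate");
        return row;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("name", "Paul");
        row.put("birthdate", "31/12/1969");
        row.put("today", "19/12/2010");
        System.out.println(callSubtransformation(row).keySet());
        // prints [name, birthdate, reference_field, age]
    }
}
```

Because no rename-back is applied to reference_field, that name is what the calling transformation sees after the mapping step.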
Kettle subtransformations are a practical way to centralize some functionality so that it may be used in more than one place. Another use of subtransformations is to isolate a part of a transformation that meets some specific purpose as a whole, in order to keep the main transformation simple, no matter whether you will reuse that part or not.
Let's look at some examples of what you might like to implement via a subtransformation:
If you then wish to implement any of the following enhancements, you will need to do it in one place:
From the development point of view, a subtransformation is just a regular transformation with some input and output steps connecting it to the transformations that use it.
Back in Chapter 6, Understanding Data Flows, it was explained that when a transformation is launched, each step starts a new thread; that is, all steps work simultaneously. The fact that we are using a subtransformation does not change that. When you run a transformation that calls a subtransformation, both the steps in the transformation and those in the subtransformation start at the same time, and run in parallel. The subtransformation is not an isolated process; the data in the main transformation just flows through the subtransformation. Imagine this flow as if the steps in the subtransformation were part of the main transformation. In this sense, it is worth noting that a common cause of error in the development of subtransformations is the wrong use of the Select values step.
Selecting some values with a Select values step by using the Select & Alter tab in a subtransformation will implicitly remove not only the rest of the fields in the subtransformation, but also all of the fields in the transformation that calls it.
If you need to rename or reorder some fields in a subtransformation, then make sure you check the Include unspecified fields, ordered by name option in order to keep not only the rest of the fields in the subtransformation but also the fields coming from the calling transformation.
If what you need is to remove some fields, do not use the Select & Alter tab; use the Remove tab instead. If needed, use another Select values step to reorder or rename the fields afterward.
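The difference between selecting and removing fields can be sketched in plain Java. This is a hypothetical illustration of the two behaviors, not Kettle code: a row is modeled as a map that already carries the name field from the calling transformation, and the names SelectValuesSketch, selectOnly, and remove are made up for the example.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the two behaviors; not Kettle code.
class SelectValuesSketch {

    // Like selecting fields without including the unspecified ones:
    // only the listed fields survive.
    static Map<String, Object> selectOnly(Map<String, Object> row, List<String> fields) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (String f : fields) {
            if (row.containsKey(f)) {
                out.put(f, row.get(f));
            }
        }
        return out;
    }

    // Like the Remove tab: only the listed fields are dropped.
    static Map<String, Object> remove(Map<String, Object> row, List<String> fields) {
        Map<String, Object> out = new LinkedHashMap<>(row);
        out.keySet().removeAll(fields);
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("name", "Paul");                  // comes from the calling transformation
        row.put("birth_field", "31/12/1969");
        row.put("reference_field", "19/12/2010");
        row.put("calculated_age", 40);

        // Selecting one field silently drops name, which belongs to the caller
        System.out.println(selectOnly(row, Arrays.asList("calculated_age")).keySet());
        // prints [calculated_age]

        // Removing one field keeps everything else, including name
        System.out.println(remove(row, Arrays.asList("reference_field")).keySet());
        // prints [name, birth_field, calculated_age]
    }
}
```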