This appendix summarizes the purpose of the steps and job entries used in the tutorials throughout the book. For each of them, you can see the name of the Time for action section where it was introduced and also a reference to the chapters where you can find more examples that use it.
How to use this reference
Suppose you are inside Spoon, editing a Transformation. If the transformation uses a step that you don't know and you want to understand what it does or how to use it, double-click the step and take note of the title of the settings window; that title is the name of the step. Then search for that name in the transformation steps reference table. The steps are listed in alphabetical order so that you can find them quickly. The last column will take you to the place in the book where the step is explained.
The same applies to jobs. If you see in a job an unknown entry, double-click the entry and take note of the title of the settings window; that title is the name of the entry. Then search for that name in the job entries reference table. The job entries are also listed in alphabetical order.
The following table includes all the transformation steps used in the book. For a full list of steps and their descriptions, select Help | Show step plug-in information in Spoon's main menu.
You can also visit http://wiki.pentaho.com/display/EAI/Pentaho+Data+Integration+v3.2.+Steps for a full step reference along with some examples.
Icon |
Name |
Purpose |
Time for action |
---|---|---|---|
|
Abort |
Aborts a transformation |
Aborting when there are too many errors (Chapter 7); also in Chapters 11 and 12 |
|
Add constants |
Adds one or more constant fields to the stream |
Gathering progress and merging all together (Chapter 4); also in Chapters 7, 8, and 9 |
|
Add sequence |
Gets the next value from a sequence |
Assigning tasks by Distributing (Chapter 4); also in Chapters 6 and 11 |
|
Append streams |
Appends two streams in an ordered way |
Giving priority to Bouchard by using Append Stream (Chapter 4) |
|
Calculator |
Creates new fields by performing simple calculations |
Reviewing examination by using the Calculator step (Chapter 3); also in Chapters 6 and 8 |
|
Combination lookup/update |
Updates a junk dimension. Alternatively, it can be used to update Type I SCD. |
Loading a region dimension with a Combination lookup/update step (Chapter 9); also in Chapter 12 |
|
Copy rows to result |
Write rows to the executing job. The information will then be passed to the next entry in the job. |
Splitting the generation of top scores by copying and getting rows (Chapter 11) |
|
Data Validator |
Validates fields based on a set of rules |
Checking films file with the Data Validator (Chapter 7) |
|
Database join |
Executes a database query using stream values as parameters |
Using a Database join step to create a list of suggested products to buy (Chapter 9) |
|
Database lookup |
Looks up values in a database table |
Using a Database lookup step to create a list of products to buy (Chapter 9), also in Chapter 12 |
|
Delay row |
For each incoming row, waits a given time before giving the row to the next step |
Generating custom files by executing a transformation for every input row (Chapter 11) |
|
Delete |
Delete data in a database table |
Deleting data about discontinued items (Chapter 8) |
|
Dimension lookup/update |
Updates or looks up a Type II SCD. Alternatively, it can be used to update Type I SCD or hybrid dimensions. |
Keeping a history of product changes with the Dimension lookup/update step (Chapter 9), also in Chapter 12 |
|
Dummy (do nothing) |
This step type doesn't do anything! However it is used often. |
Creating a hello world transformation (Chapter 1), also in Chapters 2, 3, 7, and 9 |
|
Excel Input |
Reads data from a Microsoft Excel ( |
Browsing PDI new features by copying a dataset (Chapter 4); also in Chapter 8 |
|
Excel Output |
Writes data to a Microsoft Excel ( |
Getting data from an XML file with information about countries (Chapter 2); also in Chapters 4 and10 |
|
Filter rows |
Splits the stream in two upon a given condition. Alternatively, it is used to let pass just the rows that meet the condition. |
Counting frequent words by filtering (Chapter 3); also in Chapters 4, 6, 7, 9, 11, and 12 |
|
Fixed file input |
Reads data from a fixed width file |
Calculating Scores with JavaScript (Chapter 5) |
|
Formula |
Creates new fields by using formulas. It uses Pentaho's libformula. |
Reviewing examination by using the Formula step (Chapter 3); also in Chapters 10 and 11 |
|
Generate Rows |
Generates a number of equal rows |
Creating a hello world transformation (Chapter 1); also in Chapters 6, 9, and 10 |
|
Get data from XML |
Gets data from XML files |
Getting data from an XML file with information about countries(Chapter 2); also in chapters 3 and 9 |
|
Get rows from result |
Reads rows from a previous entry in a job |
Splitting the generation of top scores by copying and getting rows (Chapter 11) |
|
Get System Info |
Gets information from the system like system date, arguments, etc. |
Updating a file with news about examination (Chapter 2) also in Chapters 7, 8, 10, 11, and12 |
|
Get Variables |
Takes the values of environment or Kettle variables and adds them as fields in the stream |
Creating the time dimension dataset(Chapter 6) |
|
Group by |
Builds aggregates in a group by fashion. This works only on a sorted input. If the input is not sorted, only double consecutive rows are handled correctly |
Calculating World Cup statistics by grouping data (Chapter 3); also in Chapters 4, 7, and 9 |
|
If field value is null |
If a field is null, it changes its value to a constant. It can be applied to all fields of a same data type, or to particular fields |
Enhancing a films file by converting rows to columns (Chapter 6) |
|
Insert / Update |
Updates or inserts rows in a database table |
Inserting new products or updating existent ones (Chapter 8) |
|
Mapping (sub-transformation) |
Runs a subtransformation |
Calculating the top scores with a subtransformation (Chapter 11) |
|
Mapping input specification |
Specifies the input interface of a sub-transformation |
Calculating the top scores with a subtransformation (Chapter 11) |
|
Mapping output specification |
Specifies the output interface of a sub-transformation |
Calculating the top scores with a subtransformation (Chapter 11) |
|
Modified Java Script Value |
Allows you to code Javascript to modify or create new fields. It's also possible to code Java |
Calculating Scores with JavaScript(Chapter 5); also in Chapters 6, 7, and 11 |
|
Number range |
Creates ranges based on a numeric field |
Capturing errors while calculating the age of a film (Chapter 7); also in Chapter 8 |
|
Regex Evaluation |
Evaluates a field with a regular expression |
Validating Genres with a Regex Evaluation step (Chapter 7); also in Chapter 12 |
|
Row denormaliser |
Denormalises rows by looking up key-value pairs |
Enhancing a films file by converting rows to columns (Chapter 6) |
|
Row Normaliser |
Normalises data de-normalised |
Enhancing the matches file by normalizing the dataset (Chapter 6) |
|
Select values |
Selects, reorders, or removes fields. Also allows you to change the metadata of fields |
Reading all your files at a time using a single Text file input step (Chapter 2); also in Chapters 3, 4, 6, 7, 8, 9, 11, and 12 |
|
Set Variables |
Sets Kettle variables based on a single input row |
Updating a file with news about examinations by setting a variable with the name of the file (Chapter 11); also in Chapter 12 |
|
Sort rows |
Sorts rows based upon field values, ascending or descending |
Reviewing examinations by using the Calculator step (Chapter 3); also in Chapters 4, 6, 7, 8, 9, and 11 |
|
Split field to rows |
Splits a single string field and creates a new row for each split term |
Counting frequent words by filtering (Chapter 3) |
|
Split Fields |
Splits a single field into more than one |
Calculating World Cup statistics by grouping data (Chapter 3); also in Chapters 6 and 11 |
|
Stream lookup |
Looks up values coming from another stream in the transformation |
Finding out which language people speak (Chapter 3); also in Chapter 6 |
|
Switch / Case |
Switches a row to a certain target step based on the value of a field |
Assigning tasks by filtering priorities with the Switch/ Case step (Chapter 4) |
|
Table input |
Reads data from a database table |
Getting data about shipped orders (Chapter 8); also in Chapters 9, 10, and 12 |
|
Table output |
Writes data to a database table |
Loading a table with a list of manufacturers (Chapter 8), also in Chapters 9 and 12 |
|
Text file input |
Reads data from a text file |
Reading all your files at a time using a single Text file input step (Chapter 2); also in Chapters 3, 5, 6, 7, 8, and 11 |
|
Text file output |
Writes data to a text file |
Sending the results of matches to a plain file (Chapter 2); also in Chapters 3, 7, 9, 10, and 11 |
|
Update |
Updates data in a database table |
Loading a region dimension with a Combination lookup/update step (Chapter 9) |
|
Value Mapper |
Maps values of a certain field from one value to another |
Browsing PDI new features by copying a dataset (Chapter 4) |
18.225.255.178