Appendix C. Quick Reference: Steps and Job Entries

This appendix summarizes the purpose of the steps and job entries used in the tutorials throughout the book. For each of them, you can see the name of the Time for action section where it was introduced and also a reference to the chapters where you can find more examples that use it.

Tip

How to use this reference

Suppose you are inside Spoon, editing a transformation. If the transformation uses a step that you don't know and you want to understand what it does or how to use it, double-click the step and take note of the title of the settings window; that title is the name of the step. Then search for that name in the transformation steps reference table. The steps are listed in alphabetical order so that you can find them quickly. The last column will take you to the place in the book where the step is explained.

The same applies to jobs. If you see an unknown entry in a job, double-click the entry and take note of the title of the settings window; that title is the name of the entry. Then search for that name in the job entries reference table. The job entries are also listed in alphabetical order.

Transformation steps

The following table includes all the transformation steps used in the book. For a full list of steps and their descriptions, select Help | Show step plug-in information in Spoon's main menu.

You can also visit http://wiki.pentaho.com/display/EAI/Pentaho+Data+Integration+v3.2.+Steps for a full step reference along with some examples.

Name	Purpose	Time for action
Abort	Aborts a transformation	Aborting when there are too many errors (Chapter 7); also in Chapters 11 and 12
Add constants	Adds one or more constant fields to the stream	Gathering progress and merging all together (Chapter 4); also in Chapters 7, 8, and 9
Add sequence	Gets the next value from a sequence	Assigning tasks by Distributing (Chapter 4); also in Chapters 6 and 11
Append streams	Appends two streams in an ordered way	Giving priority to Bouchard by using Append Stream (Chapter 4)
Calculator	Creates new fields by performing simple calculations	Reviewing examinations by using the Calculator step (Chapter 3); also in Chapters 6 and 8
Combination lookup/update	Updates a junk dimension. Alternatively, it can be used to update a Type I SCD.	Loading a region dimension with a Combination lookup/update step (Chapter 9); also in Chapter 12
Copy rows to result	Writes rows to the executing job. The information will then be passed to the next entry in the job.	Splitting the generation of top scores by copying and getting rows (Chapter 11)
Data Validator	Validates fields based on a set of rules	Checking films file with the Data Validator (Chapter 7)
Database join	Executes a database query using stream values as parameters	Using a Database join step to create a list of suggested products to buy (Chapter 9)
Database lookup	Looks up values in a database table	Using a Database lookup step to create a list of products to buy (Chapter 9); also in Chapter 12
Delay row	For each incoming row, waits a given time before giving the row to the next step	Generating custom files by executing a transformation for every input row (Chapter 11)
Delete	Deletes data in a database table	Deleting data about discontinued items (Chapter 8)
Dimension lookup/update	Updates or looks up a Type II SCD. Alternatively, it can be used to update a Type I SCD or hybrid dimensions.	Keeping a history of product changes with the Dimension lookup/update step (Chapter 9); also in Chapter 12
Dummy (do nothing)	This step type doesn't do anything! However, it is used often.	Creating a hello world transformation (Chapter 1); also in Chapters 2, 3, 7, and 9
Excel Input	Reads data from a Microsoft Excel (`.xls`) file	Browsing PDI new features by copying a dataset (Chapter 4); also in Chapter 8
Excel Output	Writes data to a Microsoft Excel (`.xls`) file	Getting data from an XML file with information about countries (Chapter 2); also in Chapters 4 and 10
Filter rows	Splits the stream in two based on a given condition. Alternatively, it can be used to let through only the rows that meet the condition.	Counting frequent words by filtering (Chapter 3); also in Chapters 4, 6, 7, 9, 11, and 12
Fixed file input	Reads data from a fixed width file	Calculating Scores with JavaScript (Chapter 5)
Formula	Creates new fields by using formulas. It uses Pentaho's libformula.	Reviewing examinations by using the Formula step (Chapter 3); also in Chapters 10 and 11
Generate Rows	Generates a number of equal rows	Creating a hello world transformation (Chapter 1); also in Chapters 6, 9, and 10
Get data from XML	Gets data from XML files	Getting data from an XML file with information about countries (Chapter 2); also in Chapters 3 and 9
Get rows from result	Reads rows from a previous entry in a job	Splitting the generation of top scores by copying and getting rows (Chapter 11)
Get System Info	Gets information from the system, such as the system date, arguments, etc.	Updating a file with news about examinations (Chapter 2); also in Chapters 7, 8, 10, 11, and 12
Get Variables	Takes the values of environment or Kettle variables and adds them as fields in the stream	Creating the time dimension dataset (Chapter 6)
Group by	Builds aggregates in a group by fashion. This works only on sorted input. If the input is not sorted, only consecutive rows with the same key are grouped together correctly.	Calculating World Cup statistics by grouping data (Chapter 3); also in Chapters 4, 7, and 9
If field value is null	If a field is null, it changes its value to a constant. It can be applied to all fields of the same data type, or to particular fields	Enhancing a films file by converting rows to columns (Chapter 6)
Insert / Update	Updates or inserts rows in a database table	Inserting new products or updating existent ones (Chapter 8)
Mapping (sub-transformation)	Runs a sub-transformation	Calculating the top scores with a subtransformation (Chapter 11)
Mapping input specification	Specifies the input interface of a sub-transformation	Calculating the top scores with a subtransformation (Chapter 11)
Mapping output specification	Specifies the output interface of a sub-transformation	Calculating the top scores with a subtransformation (Chapter 11)
Modified Java Script Value	Allows you to write JavaScript code to modify or create new fields. It is also possible to write Java code.	Calculating Scores with JavaScript (Chapter 5); also in Chapters 6, 7, and 11
Number range	Creates ranges based on a numeric field	Capturing errors while calculating the age of a film (Chapter 7); also in Chapter 8
Regex Evaluation	Evaluates a field with a regular expression	Validating Genres with a Regex Evaluation step (Chapter 7); also in Chapter 12
Row denormaliser	Denormalises rows by looking up key-value pairs	Enhancing a films file by converting rows to columns (Chapter 6)
Row Normaliser	Normalises de-normalised data	Enhancing the matches file by normalizing the dataset (Chapter 6)
Select values	Selects, reorders, or removes fields. Also allows you to change the metadata of fields	Reading all your files at a time using a single Text file input step (Chapter 2); also in Chapters 3, 4, 6, 7, 8, 9, 11, and 12
Set Variables	Sets Kettle variables based on a single input row	Updating a file with news about examinations by setting a variable with the name of the file (Chapter 11); also in Chapter 12
Sort rows	Sorts rows based upon field values, ascending or descending	Reviewing examinations by using the Calculator step (Chapter 3); also in Chapters 4, 6, 7, 8, 9, and 11
Split field to rows	Splits a single string field and creates a new row for each split term	Counting frequent words by filtering (Chapter 3)
Split Fields	Splits a single field into more than one	Calculating World Cup statistics by grouping data (Chapter 3); also in Chapters 6 and 11
Stream lookup	Looks up values coming from another stream in the transformation	Finding out which language people speak (Chapter 3); also in Chapter 6
Switch / Case	Switches a row to a certain target step based on the value of a field	Assigning tasks by filtering priorities with the Switch / Case step (Chapter 4)
Table input	Reads data from a database table	Getting data about shipped orders (Chapter 8); also in Chapters 9, 10, and 12
Table output	Writes data to a database table	Loading a table with a list of manufacturers (Chapter 8); also in Chapters 9 and 12
Text file input	Reads data from a text file	Reading all your files at a time using a single Text file input step (Chapter 2); also in Chapters 3, 5, 6, 7, 8, and 11
Text file output	Writes data to a text file	Sending the results of matches to a plain file (Chapter 2); also in Chapters 3, 7, 9, 10, and 11
Update	Updates data in a database table	Loading a region dimension with a Combination lookup/update step (Chapter 9)
Value Mapper	Maps values of a certain field from one value to another	Browsing PDI new features by copying a dataset (Chapter 4)
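
The note in the Group by row about sorted input is easy to overlook, so here is a small illustration. It is only an analogy written in Python, not PDI code: `itertools.groupby` also groups only consecutive rows that share a key, which is the behavior the table describes for the Group by step. The function name `group_consecutive`, the field names, and the sample rows are all made up for this sketch.

```python
from itertools import groupby


def group_consecutive(rows, key_field, value_field):
    """Sum value_field over each run of consecutive rows sharing key_field."""
    return [
        (key, sum(row[value_field] for row in run))
        for key, run in groupby(rows, key=lambda row: row[key_field])
    ]


# Hypothetical sample rows; note that the two "Italy" rows are not adjacent.
rows = [
    {"team": "Italy", "goals": 2},
    {"team": "Brazil", "goals": 3},
    {"team": "Italy", "goals": 1},
]

# Unsorted input: "Italy" appears twice in the output because its rows
# are separated by the "Brazil" row.
unsorted_result = group_consecutive(rows, "team", "goals")

# Sorting by the grouping key first yields one aggregate per team.
sorted_rows = sorted(rows, key=lambda row: row["team"])
sorted_result = group_consecutive(sorted_rows, "team", "goals")

print(unsorted_result)  # [('Italy', 2), ('Brazil', 3), ('Italy', 1)]
print(sorted_result)    # [('Brazil', 3), ('Italy', 3)]
```

In a transformation, the equivalent precaution is to place a Sort rows step, sorting by the grouping fields, before the Group by step.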