Appendix C. Quick Reference: Steps and Job Entries

This appendix summarizes the purpose of the steps and job entries used in the tutorials throughout the book. For each of them, you can see the name of the Time for action section where it was introduced and also a reference to the chapters where you can find more examples that use it.

Tip

How to use this reference

Suppose you are inside Spoon, editing a transformation. If the transformation uses a step that you don't know and you want to understand what it does or how to use it, double-click the step and take note of the title of the settings window; that title is the name of the step. Then search for that name in the transformation steps reference table. The steps are listed in alphabetical order so that you can find them quickly. The last column will take you to the place in the book where the step is explained.

The same applies to jobs. If you see an unknown entry in a job, double-click the entry and take note of the title of the settings window; that title is the name of the entry. Then search for that name in the job entries reference table. The job entries are also listed in alphabetical order.

Transformation steps

The following table includes all the transformation steps used in the book. For a full list of steps and their descriptions, select Help | Show step plug-in information in Spoon's main menu.

You can also visit http://wiki.pentaho.com/display/EAI/Pentaho+Data+Integration+v3.2.+Steps for a full step reference along with some examples.

| Name | Purpose | Time for action |
|---|---|---|
| Abort | Aborts a transformation | Aborting when there are too many errors (Chapter 7); also in Chapters 11 and 12 |
| Add constants | Adds one or more constant fields to the stream | Gathering progress and merging all together (Chapter 4); also in Chapters 7, 8, and 9 |
| Add sequence | Gets the next value from a sequence | Assigning tasks by Distributing (Chapter 4); also in Chapters 6 and 11 |
| Append streams | Appends two streams in an ordered way | Giving priority to Bouchard by using Append Stream (Chapter 4) |
| Calculator | Creates new fields by performing simple calculations | Reviewing examination by using the Calculator step (Chapter 3); also in Chapters 6 and 8 |
| Combination lookup/update | Updates a junk dimension. Alternatively, it can be used to update a Type I SCD. | Loading a region dimension with a Combination lookup/update step (Chapter 9); also in Chapter 12 |
| Copy rows to result | Writes rows to the executing job. The information is then passed to the next entry in the job. | Splitting the generation of top scores by copying and getting rows (Chapter 11) |
| Data Validator | Validates fields based on a set of rules | Checking films file with the Data Validator (Chapter 7) |
| Database join | Executes a database query using stream values as parameters | Using a Database join step to create a list of suggested products to buy (Chapter 9) |
| Database lookup | Looks up values in a database table | Using a Database lookup step to create a list of products to buy (Chapter 9); also in Chapter 12 |
| Delay row | For each incoming row, waits a given time before giving the row to the next step | Generating custom files by executing a transformation for every input row (Chapter 11) |
| Delete | Deletes data in a database table | Deleting data about discontinued items (Chapter 8) |
| Dimension lookup/update | Updates or looks up a Type II SCD. Alternatively, it can be used to update a Type I SCD or hybrid dimensions. | Keeping a history of product changes with the Dimension lookup/update step (Chapter 9); also in Chapter 12 |
| Dummy (do nothing) | This step type doesn't do anything! However, it is used often. | Creating a hello world transformation (Chapter 1); also in Chapters 2, 3, 7, and 9 |
| Excel Input | Reads data from a Microsoft Excel (.xls) file | Browsing PDI new features by copying a dataset (Chapter 4); also in Chapter 8 |
| Excel Output | Writes data to a Microsoft Excel (.xls) file | Getting data from an XML file with information about countries (Chapter 2); also in Chapters 4 and 10 |
| Filter rows | Splits the stream in two based on a given condition. Alternatively, it can be used to pass only the rows that meet the condition. | Counting frequent words by filtering (Chapter 3); also in Chapters 4, 6, 7, 9, 11, and 12 |
| Fixed file input | Reads data from a fixed-width file | Calculating Scores with JavaScript (Chapter 5) |
| Formula | Creates new fields by using formulas. It uses Pentaho's libformula. | Reviewing examination by using the Formula step (Chapter 3); also in Chapters 10 and 11 |
| Generate Rows | Generates a number of equal rows | Creating a hello world transformation (Chapter 1); also in Chapters 6, 9, and 10 |
| Get data from XML | Gets data from XML files | Getting data from an XML file with information about countries (Chapter 2); also in Chapters 3 and 9 |
| Get rows from result | Reads rows from a previous entry in a job | Splitting the generation of top scores by copying and getting rows (Chapter 11) |
| Get System Info | Gets information from the system, such as the system date and command-line arguments | Updating a file with news about examination (Chapter 2); also in Chapters 7, 8, 10, 11, and 12 |
| Get Variables | Takes the values of environment or Kettle variables and adds them as fields in the stream | Creating the time dimension dataset (Chapter 6) |
| Group by | Builds aggregates in a group-by fashion. This works only on sorted input. If the input is not sorted, only consecutive rows with the same grouping values are aggregated together. | Calculating World Cup statistics by grouping data (Chapter 3); also in Chapters 4, 7, and 9 |
| If field value is null | If a field is null, changes its value to a constant. It can be applied to all fields of the same data type, or to particular fields. | Enhancing a films file by converting rows to columns (Chapter 6) |
| Insert / Update | Updates or inserts rows in a database table | Inserting new products or updating existent ones (Chapter 8) |
| Mapping (sub-transformation) | Runs a subtransformation | Calculating the top scores with a subtransformation (Chapter 11) |
| Mapping input specification | Specifies the input interface of a sub-transformation | Calculating the top scores with a subtransformation (Chapter 11) |
| Mapping output specification | Specifies the output interface of a sub-transformation | Calculating the top scores with a subtransformation (Chapter 11) |
| Modified Java Script Value | Allows you to write JavaScript code to modify or create new fields. It is also possible to write Java code. | Calculating Scores with JavaScript (Chapter 5); also in Chapters 6, 7, and 11 |
| Number range | Creates ranges based on a numeric field | Capturing errors while calculating the age of a film (Chapter 7); also in Chapter 8 |
| Regex Evaluation | Evaluates a field with a regular expression | Validating Genres with a Regex Evaluation step (Chapter 7); also in Chapter 12 |
| Row denormaliser | Denormalises rows by looking up key-value pairs | Enhancing a films file by converting rows to columns (Chapter 6) |
| Row Normaliser | Normalises de-normalised data | Enhancing the matches file by normalizing the dataset (Chapter 6) |
| Select values | Selects, reorders, or removes fields. Also allows you to change the metadata of fields. | Reading all your files at a time using a single Text file input step (Chapter 2); also in Chapters 3, 4, 6, 7, 8, 9, 11, and 12 |
| Set Variables | Sets Kettle variables based on a single input row | Updating a file with news about examinations by setting a variable with the name of the file (Chapter 11); also in Chapter 12 |
| Sort rows | Sorts rows based upon field values, ascending or descending | Reviewing examinations by using the Calculator step (Chapter 3); also in Chapters 4, 6, 7, 8, 9, and 11 |
| Split field to rows | Splits a single string field and creates a new row for each split term | Counting frequent words by filtering (Chapter 3) |
| Split Fields | Splits a single field into more than one | Calculating World Cup statistics by grouping data (Chapter 3); also in Chapters 6 and 11 |
| Stream lookup | Looks up values coming from another stream in the transformation | Finding out which language people speak (Chapter 3); also in Chapter 6 |
| Switch / Case | Switches a row to a certain target step based on the value of a field | Assigning tasks by filtering priorities with the Switch / Case step (Chapter 4) |
| Table input | Reads data from a database table | Getting data about shipped orders (Chapter 8); also in Chapters 9, 10, and 12 |
| Table output | Writes data to a database table | Loading a table with a list of manufacturers (Chapter 8); also in Chapters 9 and 12 |
| Text file input | Reads data from a text file | Reading all your files at a time using a single Text file input step (Chapter 2); also in Chapters 3, 5, 6, 7, 8, and 11 |
| Text file output | Writes data to a text file | Sending the results of matches to a plain file (Chapter 2); also in Chapters 3, 7, 9, 10, and 11 |
| Update | Updates data in a database table | Loading a region dimension with a Combination lookup/update step (Chapter 9) |
| Value Mapper | Maps values of a certain field from one value to another | Browsing PDI new features by copying a dataset (Chapter 4) |
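
The warning attached to the Group by step (it expects sorted input) is easy to overlook, so here is a small sketch of why it matters. It is written in Python and is not PDI code: `itertools.groupby` is used only as an analogy, because it also aggregates consecutive rows that share a key. The helper `group_counts` and the sample country codes are hypothetical, introduced only for this illustration.

```python
from itertools import groupby


def group_counts(rows, key):
    """Count rows per key, merging only consecutive rows with the same key.

    This mimics a streaming group-by that never looks back at earlier rows.
    """
    return [(k, sum(1 for _ in grp)) for k, grp in groupby(rows, key=key)]


# Hypothetical input: one country code per row, arriving unsorted.
rows = ["ARG", "BRA", "ARG", "ARG", "BRA"]

# Unsorted input: the same key shows up in several separate groups.
unsorted_result = group_counts(rows, key=lambda r: r)

# Sorted input: each key forms exactly one group.
sorted_result = group_counts(sorted(rows), key=lambda r: r)

print(unsorted_result)  # [('ARG', 1), ('BRA', 1), ('ARG', 2), ('BRA', 1)]
print(sorted_result)    # [('ARG', 3), ('BRA', 2)]
```

In a transformation, the equivalent precaution is to place a Sort rows step before Group by, sorting on the same fields that are used for grouping.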
