Time for action: reading all your files at a time using a single Text file input step and regular expressions

You could do the same thing you did above by using a different notation. Follow these instructions:

  1. Open the transformation and edit the configuration window of the input step.
  2. Delete the lines with the names of the files.
  3. In the first row of the grid, type C:\pdi_files\input under the File/Directory column, and group[1-4]\.txt under the Wildcard (Reg.Exp.) column.
  4. Click the Show filename(s)... button. You'll see the list of files that match the expression.
  5. Close the tiny window and click Preview rows to confirm that the rows shown belong to the four files that match the expression you typed.

What just happened?

In this particular case, all filenames follow a pattern—group1.txt, group2.txt, and so on. In order to specify the names of the files, you used a regular expression. In the column File/Directory you put the static part of the names, while in the Wildcard (Reg.Exp.) column you put the regular expression with the pattern that a file must follow to be considered: the text group followed by a number between 1 and 4, and then .txt. Then, all files that matched the expression were considered as input files.
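The selection logic can be sketched outside Kettle. The following Python fragment is only an illustration, not Kettle's implementation: the helper name matching_files is hypothetical, the files are created in a temporary folder just for the demonstration, and the sketch assumes that the whole filename has to match the expression. The dot is escaped so that it matches a literal period.

```python
import re
import tempfile
from pathlib import Path


def matching_files(directory, wildcard):
    """Return the sorted names of the files in `directory` whose whole
    name matches the regular expression `wildcard`."""
    pattern = re.compile(wildcard)
    return sorted(
        entry.name
        for entry in Path(directory).iterdir()
        if entry.is_file() and pattern.fullmatch(entry.name)
    )


# Demonstration with empty files in a temporary folder (hypothetical names).
with tempfile.TemporaryDirectory() as folder:
    for name in ("group1.txt", "group4.txt", "group5.txt", "notes.txt"):
        Path(folder, name).touch()
    print(matching_files(folder, r"group[1-4]\.txt"))
    # ['group1.txt', 'group4.txt']
```

Only the names that fit the pattern are kept; group5.txt is left out because 5 is outside the range 1 to 4.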

Regular expressions

There are many places inside Kettle where you may, or must, provide a regular expression. A regular expression is far more expressive than the familiar wildcards ? and *.

Here are some examples of regular expressions you may use to specify filenames:

The following regular expression ...    Matches ...                             Examples
------------------------------------    ------------------------------------    -----------------------
.*\.txt                                 Any txt file                            thisisaValidExample.txt
test(19|20)\d\d-(0[1-9]|1[012])\.txt    Any txt file beginning with test        test2009-12.txt
                                        followed by a date using the format     test2009-01.txt
                                        yyyy-mm
(?i)test.+\.txt                         Any txt file beginning with test,       TeSTcaseinsensitive.tXt
                                        upper or lower case
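The expressions listed above can be checked quickly outside Kettle. Kettle is a Java application; the sketch below uses Python's re module instead, on the assumption that the constructs involved (character classes, \d for a digit, \. for a literal dot, alternation, and the (?i) flag) behave the same way in both. It also assumes that the whole filename must match. The non-matching filenames are made up for the example.

```python
import re

# Each entry: pattern, a name that should match, a name that should not.
CASES = [
    (r".*\.txt", "thisisaValidExample.txt", "report.csv"),
    (r"test(19|20)\d\d-(0[1-9]|1[012])\.txt", "test2009-12.txt", "test2009-13.txt"),
    (r"(?i)test.+\.txt", "TeSTcaseinsensitive.tXt", "mytest.txt"),
]


def matches(pattern, filename):
    """True when the whole filename matches the pattern."""
    return re.fullmatch(pattern, filename) is not None


for pattern, good, bad in CASES:
    # Prints the pattern followed by True False for every case.
    print(pattern, matches(pattern, good), matches(pattern, bad))
```

The second case rejects test2009-13.txt because the month part only accepts 01 to 12.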

Tip

Please note that the * wildcard doesn't work the same as it does on the command line. In a regular expression, * means zero or more repetitions of the element that precedes it, so to match any sequence of characters you have to write a dot before it: .*
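The difference between the two notations can be seen in a small Python sketch. It uses the standard fnmatch module to stand in for command-line wildcards and the re module to stand in for regular expressions; the filename is just an example, and how Kettle itself reports an invalid expression is not shown here.

```python
import fnmatch
import re

name = "group1.txt"

# Command-line style wildcard: * alone stands for any run of characters.
glob_result = fnmatch.fnmatchcase(name, "*.txt")

# Regular expression: * repeats the element before it, so it needs the dot.
regex_result = re.fullmatch(r".*\.txt", name) is not None

# A bare leading * has nothing to repeat, so it is rejected as a pattern.
try:
    re.compile("*.txt")
    bare_star_is_valid = True
except re.error:
    bare_star_is_valid = False

print(glob_result, regex_result, bare_star_is_valid)  # True True False
```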

Here are some useful links in case you want to know more about regular expressions:

Troubleshooting reading files

Despite the simplicity of reading files with PDI, obstacles and errors do appear. Often the solution is simple but hard to find if you are new to PDI. Here is a list of common problems and possible solutions to keep in mind while reading and previewing a file:

Problem: You get the message Sorry, no rows found to be previewed.
Diagnostic: This happens when the input file doesn't exist or is empty. It also may happen if you specified the input files with regular expressions and there is no file that matches the expression.
Possible solutions: Check the name of the input files. Verify the syntax used, check that you didn't put spaces or any strange character as part of the name. If you used regular expressions, check the syntax. Also verify that you put the filename in the grid. If you just put it in the File or directory textbox, Kettle will not read it.

Problem: When you preview the data you see a grid with blank lines.
Diagnostic: The file contains empty lines, or you forgot to get the fields.
Possible solutions: Check the content of the file. Also check that you got the fields in the Fields tab.

Problem: You see the whole line under the first defined field.
Diagnostic: You didn't set the proper separator and Kettle couldn't split the different fields.
Possible solutions: Check and fix the separator in the Content tab.

Problem: You see strange characters.
Diagnostic: You left the default content but your file has a different format or encoding.
Possible solutions: Check and fix the Format and Encoding in the Content tab. If you are not sure of the format, you can specify mixed.

Problem: You don't see all the lines you have in the file.
Diagnostic: You are previewing just a sample (100 lines by default). Or you put a limit to the number of rows to get. Another problem may be that you set the wrong number of header or footer lines.
Possible solutions: When you preview, you see just a sample. This is not a problem. If you raise the previewed number of rows and still have few lines, check the Header, Footer and Limit options in the Content tab.

Problem: Instead of rows of data, you get a window headed ERROR with an extract of the log.
Diagnostic: Different errors may happen, but the most common has to do with problems in the definition of the fields.
Possible solutions: You could try to understand the log and fix the definition accordingly. For example if you see:

    Couldn't parse field [Integer] with value [Italy].

The error is that PDI found the text Italy in a field that you defined as Integer. If you made a mistake, you could fix it. On the other hand, if the file has errors, you could read all fields as String and you will not get the error again. In chapter 7 you will learn how to overcome these situations.
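When the log reports a value that cannot be parsed, it can help to locate the offending rows before changing the field definition. The sketch below is an illustration in Python, outside Kettle: it reads every field as text and reports the rows whose value in a given column is not an integer. The function name non_integer_rows, the sample data, the column name, and the comma separator are all made up for the example.

```python
import csv
import io


def non_integer_rows(text, column, delimiter=","):
    """Return (line_number, value) pairs for the rows whose `column`
    cannot be converted to an integer. Line 1 is the header."""
    problems = []
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    for line_number, row in enumerate(reader, start=2):
        value = row[column]
        try:
            int(value)
        except (TypeError, ValueError):
            # TypeError covers a missing field (None), ValueError bad text.
            problems.append((line_number, value))
    return problems


# Hypothetical file content with one bad value in the numeric column.
sample = "country,population\nArgentina,40\nItaly,Italy\nSpain,46\n"
print(non_integer_rows(sample, "population"))  # [(3, 'Italy')]
```

Reading everything as text first mirrors the advice above: nothing fails while loading, and the bad values can be inspected afterwards.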

Grids

Grids are tables used in many Spoon places to enter or display information. You already saw grids in several configuration windows—Text file input, Text file output, and Select values.

Many grids contain field information. Examples of these grids are the Fields tab in the Text file input and Text file output steps, or the main configuration window of the Select values step. In these cases, the grids are usually accompanied by a Get Fields button. The Get Fields button saves you from typing: when you press it, Kettle fills the grid with all the available fields.

For example, when reading a file, the Get Fields button fills the grid with the columns of the incoming file. When using a Select values step or a Text file output step, the Get Fields button fills the grid with all the fields entering from a previous step.

Tip

Every time you see a Get Fields button, consider it as a shortcut to avoid typing. Kettle will bring the fields available to the grid; you will only have to check the information brought and make minimal changes.

There are many places in Spoon where the grid also serves to edit other kinds of information. One example is the grid where you specify the list of files in a Text file input step. No matter what kind of grid you are editing, there is always a contextual menu, which you can open by right-clicking a row. That menu offers editing options to copy, paste, or move rows of the grid.

Tip

When the grid has many rows, use shortcuts! Most of the editing options of a grid have shortcuts that make the editing work easier and quicker.

You'll find a full list of shortcuts for editing grids in Appendix E.

Have a go hero—explore your own files

Try to read your own text files from Kettle. You surely have several files with different kinds of data, different separators, and with or without a header or footer. You can also search the Internet; there are plenty of files there to download and play with. After configuring the input step, do a preview. If the data is not shown properly, fix the configuration and preview again until you are sure that the data is read as expected. If you have trouble reading the files, refer to the Troubleshooting reading files section seen earlier for diagnoses and possible solutions.
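Before configuring the input step for an unfamiliar file, a quick look at its first lines can suggest which separator to set. The following Python sketch is a rough heuristic and not part of Kettle: the function name guess_separator, the list of candidate separators, and the sample lines are assumptions made for the example, and quoted fields that contain the separator are not handled.

```python
def guess_separator(lines, candidates=(",", ";", "\t", "|")):
    """Return the candidate that appears the same nonzero number of times
    on every non-empty line, or None when no single candidate does."""
    rows = [line for line in lines if line.strip()]
    if not rows:
        return None
    consistent = [
        sep
        for sep in candidates
        if rows[0].count(sep) > 0
        and all(row.count(sep) == rows[0].count(sep) for row in rows)
    ]
    return consistent[0] if len(consistent) == 1 else None


# Hypothetical first lines of a file, header included.
sample = ["name;city;amount", "Ana;Rome;10", "Luis;Lima;25"]
print(guess_separator(sample))  # ;
```

When the function returns None, the lines are either not delimited by any of the candidates or are ambiguous, and the file has to be inspected by eye.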
