Time for action—updating a file with news about examinations by setting a variable with the name of the file

The transformation from the Time for action section in Chapter 2 that we just discussed reads a file provided by a professor, taking the name of the file from the command line, and appends it to the global file. Let's enhance that work.

  1. Copy the examination files you used in Chapter 2 to the input folder defined in your kettle.properties file. If you don't have them, download them from the Packt website.
  2. Open Spoon and create a new transformation.
  3. Use a Get System Info step to get the first command-line argument. Name the field filename.
  4. Add a Filter rows step and create a hop from the Get System Info step to this step.
  5. From the Flow category drag an Abort step to the canvas, and from the Job category of steps drag a Set Variables step.
  6. From the Filter rows step, create two hops—one to the Abort step and the other to the Set Variables step. Double-click the Abort step. As Abort message, put File name is mandatory.
  7. Double-click the Set Variables step and click on Get Fields. The window is filled with one row for the filename field, which sets the variable referred to later as ${FILENAME}.
  8. Click on OK.
  9. Double-click the Filter rows step. Add the following filter: filename IS NOT NULL. In the drop-down list to the right of Send 'true' data to step, select the Set Variables step, whereas in the drop-down list to the right of Send 'false' data to step, select the Abort step.
  10. In the final transformation, the Get System Info step feeds the Filter rows step, which branches to the Set Variables step and the Abort step.
  11. Save the transformation in the transformations folder under the name getting_filename.ktr.
  12. Open the transformation named examinations.ktr that was created in Chapter 2 or download it from the Packt website. Save it in the transformations folder under the name examinations_2.ktr.
  13. Delete the Get System Info step.
  14. Double-click the Text file input step.
  15. In the Accept filenames from previous steps frame, uncheck the Accept filenames from previous step option.
  16. Under File/Directory in the Selected files grid, type ${FILENAME}. Save the transformation.
  17. Create a new job.
  18. From the General category, drag a START entry and a Transformation entry to the canvas and link them.
  19. Save the job as examinations.kjb.
  20. Double-click the Transformation entry. As Transformation filename, put the name of the first transformation that you created: ${Internal.Job.Filename.Directory}/transformations/getting_filename.ktr.
  21. Click on OK.

    Note

    Remember that you can avoid typing that long variable name by pressing Ctrl+Space and selecting the variable from the list.

  22. From the Conditions category, drag a File Exists entry to the canvas and create a hop from the Transformation entry to this new one.
  23. Double-click the File Exists entry.
  24. Write ${FILENAME} in the File name textbox and click on OK.
  25. Add a new Transformation entry and create a hop from the File Exists entry to this one.
  26. Double-click the entry and, as Transformation filename, put the name of the second transformation you created: ${Internal.Job.Filename.Directory}/transformations/examinations_2.ktr.
  27. Add a Write To Log entry, and create a hop from the File Exists entry to it. The hop should be red, which means it is followed when the File Exists check fails. If it is not red, right-click the hop and change the evaluation condition to Follow when result is false.
  28. Double-click the entry and fill in all the textboxes, providing a message that reports that the file does not exist.
  29. Add two entries: an abort entry and a success entry. Create a hop from the Write To Log entry to the abort entry, and another from the second Transformation entry to the success entry.
  30. Save the job.
  31. Press F9 to run the job.
  32. Set the logging level to Minimal logging and click on Launch.
  33. The job fails. The Logging tab in the Execution results window shows the error, because no filename was provided.
  34. Press F9 again. This time set Basic logging as the logging level.
  35. In the arguments grid, write the name of a fictitious file—for example, c:/pdi_files/input/nofile.txt.
  36. Click on Launch. This time the Logging tab shows the message written by the Write To Log entry, because the file does not exist.
  37. Press F9 for the third time. Now provide a real examination filename such as c:/pdi_files/input/exam1.txt.
  38. Click on Launch. This time you see no errors. The examination file is appended to the global file.

What just happened?

You enhanced the transformation you created in Chapter 2 for appending an examination file to a global examination file. This time you embedded the transformation in a job. The first transformation checks that the argument is not null. If it is not null, it sets a variable with the name provided; otherwise it aborts. The main job then verifies that the file exists. If everything is all right, the second transformation performs the main task: it appends the given file to the global file.
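The control flow of the job can be summarised in ordinary code. The following Python sketch is only an analogy and is not how Kettle executes anything: the function name run_examinations_job, the variables dictionary, and the returned status strings are assumptions made for illustration.

```python
import os
import tempfile


def run_examinations_job(args, global_file):
    """Mimic the job: set FILENAME, check that the file exists, append it."""
    variables = {}

    # First transformation: abort when no argument is given,
    # otherwise set the FILENAME variable.
    filename = args[0] if args else None
    if not filename:
        return "aborted: File name is mandatory"
    variables["FILENAME"] = filename

    # File Exists entry: follow the "false" hop when the file is missing.
    if not os.path.isfile(variables["FILENAME"]):
        return "aborted: file does not exist"

    # Second transformation: append the given file to the global file.
    with open(variables["FILENAME"], encoding="utf-8") as source:
        content = source.read()
    with open(global_file, "a", encoding="utf-8") as target:
        target.write(content)
    return "success"


if __name__ == "__main__":
    # Three runs, like the three launches in the tutorial.
    with tempfile.TemporaryDirectory() as folder:
        exam = os.path.join(folder, "exam1.txt")
        global_file = os.path.join(folder, "examinations.txt")
        with open(exam, "w", encoding="utf-8") as handle:
            handle.write("sample line\n")
        print(run_examinations_job([], global_file))
        print(run_examinations_job([os.path.join(folder, "nofile.txt")], global_file))
        print(run_examinations_job([exam], global_file))
```

The three calls at the end correspond to the three launches: no argument, a fictitious file, and a real file.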

Note that you changed the logging level on each run according to what you needed to see in the log.

Note

You may choose any logging level you want, depending on the level of detail you want to see.

Setting variables inside a transformation

So far, you have defined variables only in the kettle.properties file or inside Spoon while you were designing a transformation. In this last exercise, you learned to define your own variables at run time. You set a variable with the name of the file provided as a command-line argument. You used that variable in the main job to check whether the file existed, and then again in the main transformation as the name of the file to read.

This example showed you how to set a variable with the value of a command-line argument. This is not always the case. The value you set in a variable can originate in different ways: it can be a value coming from a table in a database, a value defined with a Generate rows step, a value calculated with a Formula or a Calculator step, and so on.

The variables you define with a Set variables step can be used in the same way and in the same places as any other Kettle variable. Just take care not to use these variables in the same transformation where you set them.

Note

The variables defined in a transformation are not available for use until you leave that transformation.
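One way to picture this restriction is the following Python sketch. It is a simplified model, not Kettle's implementation: it assumes that text containing ${NAME} is resolved once, when a step starts, and the resolve function and the variables dictionary are hypothetical names used only for illustration.

```python
import re

# Matches ${NAME} and captures NAME (dots are allowed, as in Kettle names).
VARIABLE_PATTERN = re.compile(r"\$\{([^}]+)\}")


def resolve(text, variables):
    """Replace each ${NAME} with its value; leave unknown names untouched."""
    def substitute(match):
        return variables.get(match.group(1), match.group(0))
    return VARIABLE_PATTERN.sub(substitute, text)


variables = {}

# Text resolved before the variable has been set stays unresolved.
too_early = resolve("${FILENAME}", variables)

# The variable is set later (in the tutorial, by the Set Variables step).
variables["FILENAME"] = "c:/pdi_files/input/exam1.txt"

# A later transformation or job entry resolves the same text successfully.
in_time = resolve("${FILENAME}", variables)

print(too_early)  # ${FILENAME}
print(in_time)    # c:/pdi_files/input/exam1.txt
```

In the tutorial, this is why the variable is set in one transformation and used only afterwards, in the File Exists entry and in the second transformation.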

Have a go hero—enhancing the examination tutorial even more

Modify the job in the tutorial to avoid processing the same file twice. If the file is successfully appended to the global file, rename the original file by changing the extension to processed—for example, after processing the exam1.txt file rename it to exam1.processed.

After verifying that the file exists, also check whether the .processed version exists. If it exists, put a proper message in the log and abort. That way, if someone accidentally tries to process a file that has already been processed, it will be ignored.

Tip

Besides the variable with the filename, create a variable with the name for the processed file. To build this name, simply manipulate the given name with some PDI steps.
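The exercise asks you to build the name with PDI steps. The following Python sketch only illustrates the string manipulation involved: processed_name is a hypothetical helper, and the sketch assumes paths written with forward slashes, as in the tutorial.

```python
from pathlib import PurePosixPath


def processed_name(filename):
    """Return the given path with its extension replaced by .processed."""
    return str(PurePosixPath(filename).with_suffix(".processed"))


print(processed_name("c:/pdi_files/input/exam1.txt"))
# c:/pdi_files/input/exam1.processed
```

Whatever steps you choose in PDI, the result should be the same: everything up to the last dot is kept, and the extension is replaced.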

Have a go hero—enhancing the jigsaw database update process

In the Time for action - inserting new products or updating existent ones section in Chapter 8, you read a file with a list of products belonging to the manufacturer Classic DeLuxe. The list was expected as a named parameter. Enhance that process. Create a job that first validates the existence of the provided file. If the file doesn't exist, put the proper error message in the log. If it exists, process the list. Then move the processed file to a folder named processed.

Tip

You don't need to create a transformation to set a variable with the name of the file. As it is expected as a named parameter, it is already available as a variable.

Have a go hero—executing the proper jigsaw database update process

In the hero exercise in Chapter 8 that involves populating the products table, you created different transformations for updating the products—one for each manufacturer. Now you will put all that work together.

Create a job that accepts two arguments—the name of the file to process and the code of the manufacturer to which the file belongs.

Create a transformation that validates that the code provided belongs to an existing manufacturer. If the code exists, set a variable named TRANSFORMATION_FILE with the name of the transformation that knows how to process the file for that manufacturer.

The transformation must also check that the name provided is not null. If it is not null, set a variable named FILENAME with the name supplied.

Then, in the job, check that the file exists. If it exists and the manufacturer code is valid, run the proper transformation. In order to do so, put ${TRANSFORMATION_FILE} as the name of the transformation in the transformation job entry dialog window. Now test your job.
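The validation logic described above can be sketched in Python before you build it in Spoon. This is only an outline under stated assumptions: the manufacturer codes, the transformation file names, and the functions set_job_variables and can_run are invented for illustration and do not come from the jigsaw database.

```python
import os

# Hypothetical manufacturer codes and transformation files, for illustration only.
TRANSFORMATIONS = {
    "M01": "transformations/update_products_m01.ktr",
    "M02": "transformations/update_products_m02.ktr",
}


def set_job_variables(filename, manufacturer_code):
    """Return the variables the validating transformation would set."""
    variables = {}
    # Set TRANSFORMATION_FILE only when the code belongs to a known manufacturer.
    if manufacturer_code in TRANSFORMATIONS:
        variables["TRANSFORMATION_FILE"] = TRANSFORMATIONS[manufacturer_code]
    # Set FILENAME only when a name was supplied.
    if filename:
        variables["FILENAME"] = filename
    return variables


def can_run(variables):
    """Job-level check: both variables are set and the file exists."""
    return (
        "TRANSFORMATION_FILE" in variables
        and "FILENAME" in variables
        and os.path.isfile(variables["FILENAME"])
    )
```

In the job, the value that this sketch stores under TRANSFORMATION_FILE is what ${TRANSFORMATION_FILE} resolves to in the transformation job entry.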
