Chapter 13. Taking it Further

The lessons learned in previous chapters gave you the basics of PDI. If you liked working with PDI and intend to use it in your own projects, there is much more to explore, ranging from applying best practices to using PDI integrated with the Pentaho BI Suite.

This chapter points you in the right direction for taking it further. It begins with some advice to take into account in your daily work with PDI. After that, it introduces some advanced PDI concepts so that you know to what extent you can use the tool beyond the basics.

PDI best practices

If you intend to work seriously with PDI, knowing how to accomplish different tasks is not enough. Here are some guidelines that will help you go in the right direction.

  • Outline your ideas on paper before creating a transformation or a job:

    Don't drop steps randomly on the canvas trying to get things working. You could end up with a transformation or job that is difficult to understand or even useless.

  • Document your work:

    Write at least a simple description in the transformation and job settings windows. Replace the default names of steps and job entries with meaningful ones. Use notes to clarify the purpose of the transformations and jobs. If you do this, your work will be largely self-documenting.

  • Make your jobs and transformations clear to understand:

    Arrange the elements on the canvas so that it doesn't look like a puzzle to solve. Memorize the shortcuts for arrangement and alignment, and use them regularly. You'll find a full list in Appendix D, Spoon shortcuts.

  • Organize PDI elements in folders:

    Don't save all the transformations and jobs in the same folder. Organize them according to their purpose.

  • Make your work flexible and reusable:

    Make use of arguments, variables, and named parameters. If you identify tasks that are going to be used in several situations, create subtransformations.

  • Make your work portable (ready for deployment):

    This means making sure that even if you move your work to another machine or another folder, or the paths to source or destination files change, or the database connection properties change, everything still works with minimal or no changes. To ensure that, use variables instead of fixed names. If you know the values for the variables beforehand, define them in the kettle.properties file. For the names of transformations and jobs, use relative paths based on the ${Internal.Job.Filename.Directory} and ${Internal.Transformation.Filename.Directory} variables.

  • Avoid overloading your transformations:

    A transformation should do one precise task. If it doesn't, think of splitting it into two or more transformations, or of creating subtransformations. Doing this will make your transformation clearer and, in the case of subtransformations, also reusable.

  • Handle errors:

    Try to anticipate the kinds of errors that may occur and trap them by validating data and handling errors, taking appropriate actions such as fixing data, following alternative paths, sending friendly messages to the log files, and so on.

  • Do everything you can to optimize PDI performance:

    You can find a full checklist at http://wiki.pentaho.com/display/COM/PDI+Performance+tuning+check-list. Version 3.1.0 of PDI introduced a tool for tracking the performance of individual steps in a transformation. You can find more information at http://wiki.pentaho.com/display/EAI/Step+performance+monitoring.

  • Keep track of jobs and transformations history:

    You can use a version control system such as Subversion. Doing so lets you recover older versions of your jobs and transformations or examine how they have changed over time. For more on Subversion, visit http://subversion.tigris.org/.

    Note

    Bookmark the forum page and visit it frequently. The PDI forum is available at http://forums.pentaho.org/forumdisplay.php?f=135.

    [Figure: the main PDI forum page]

If you get stuck with something, search for a solution in the forum. If you don't find what you're looking for, create a new thread and describe your doubts or scenario clearly. You are likely to get a prompt answer, as the Pentaho community, and the PDI community in particular, is quite active.
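
To make the portability guideline from the list above more concrete, here is a minimal sketch of what a kettle.properties fragment might look like. The variable names (INPUT_DIR, OUTPUT_DIR, DB_HOST, DB_PORT) and their values are hypothetical, chosen only for illustration. Use whatever names suit your project.

```properties
# kettle.properties (sketch; every name and value below is hypothetical)

# Folders for source and destination files
INPUT_DIR=/data/pdi/input
OUTPUT_DIR=/data/pdi/output

# Database connection properties
DB_HOST=localhost
DB_PORT=5432
```

In a step or connection window you would then type, for example, ${INPUT_DIR}/sales.csv as a file name (sales.csv is again just an example) or ${DB_HOST} as the host name, instead of a fixed value. Under these assumptions, moving your work to another machine should only require editing kettle.properties, not the transformations and jobs themselves.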
