Deleting a custom list of files

Suppose you have to delete some files but you don't know their names beforehand. If you can describe that list with a regular expression, this is not a problem, but sometimes that is not possible. In these cases, you can use a helper transformation that builds the list of files to delete. This recipe shows you how to do it.

For this recipe, assume you want to delete from a source directory all the temporary files that meet two conditions: the files have a .tmp extension and a size of 0 bytes.

Getting ready

In order to create and test this recipe, you need a directory with a set of sample files; some of them should have the .tmp extension and zero size. Some example files are shown in the following screenshot:


In the preceding screenshot, the files that must be deleted are sample3.tmp, sample5.tmp, and sample7.tmp.
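If you do not have such a directory at hand, a short script can create one. The following is a minimal Python sketch, not part of the recipe itself. Only sample3.tmp, sample5.tmp, and sample7.tmp come from the text above; the other file names, their contents, and the create_sample_files helper are assumptions made for illustration.

```python
import tempfile
from pathlib import Path


def create_sample_files(base_dir):
    """Create sample_directory under base_dir with a mix of test files."""
    sample_dir = Path(base_dir) / "sample_directory"
    sample_dir.mkdir(parents=True, exist_ok=True)

    # File name -> content. Empty content produces a zero-byte file.
    files = {
        "sample1.txt": b"some text",  # hypothetical: wrong extension
        "sample2.tmp": b"not empty",  # hypothetical: .tmp but not empty
        "sample3.tmp": b"",           # should be deleted by the recipe
        "sample4.txt": b"",           # hypothetical: empty but not .tmp
        "sample5.tmp": b"",           # should be deleted by the recipe
        "sample6.tmp": b"data",       # hypothetical: .tmp but not empty
        "sample7.tmp": b"",           # should be deleted by the recipe
    }
    for name, content in files.items():
        (sample_dir / name).write_bytes(content)
    return sample_dir


if __name__ == "__main__":
    # A temporary directory is used here; pass any base directory you like.
    with tempfile.TemporaryDirectory() as tmp:
        created = create_sample_files(tmp)
        for entry in sorted(created.iterdir()):
            print(entry.name, entry.stat().st_size)
```

To use it with the recipe, call create_sample_files with the directory where you will save the transformation, so that sample_directory sits next to it.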

How to do it...

Carry out the following steps:

  1. Create the transformation that will build the list of files to delete.
  2. Drop a Get File Names step into the canvas.
  3. Under the File tab, fill the Selected files: grid. Under File/Directory, type ${Internal.Transformation.Filename.Directory}/sample_directory and under Wildcard (RegExp), type .*\.tmp (the backslash escapes the dot so that it matches a literal dot rather than any character).
  4. From the Flow category, add a Filter rows step.
  5. Use this step to filter the files with size equal to zero. In order to do that, add the condition size = 0.
  6. After the Filter rows step, add the Select values step. When asked for the kind of hop to create, select Main output of step. This will cause only those rows that meet the condition to pass the filter.
  7. Use the Select values step to select the fields path and short_filename.
  8. From the Job category of Steps, add a Copy rows to result step.
  9. Save the transformation.
  10. Create a new job and add a Start entry.
  11. Add a Transformation entry and configure it to run the transformation previously created.
  12. Add a Delete files entry from the File management category.
  13. Double-click on it and check the Copy previous results to args? option.
  14. Save the job and run it. The files with a .tmp extension and size 0 bytes will be deleted.

How it works...

In this recipe, you deleted a list of files by using the Delete files job entry. In the selected files grid of that entry, you have to provide the complete name of the files to delete or the directory and a regular expression. Instead of typing that information directly, here you built the rows for the grid in a separate transformation.

The first step used in the transformation is Get File Names. This step allows you to get information about a file or a set of files or folders. In this example, the step gets the list of .tmp files from the sample_directory folder.
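Note that the Wildcard (RegExp) column holds a regular expression, not a shell glob, so an unescaped dot matches any character. The following Python sketch compares the patterns .*.tmp and .*\.tmp. It assumes the pattern has to match the whole file name, and the test names other than sample3.tmp are made up for the comparison.

```python
import re

# Hypothetical file names used only to compare the two patterns.
names = ["sample3.tmp", "sample_tmp", "sample3.tmp.bak"]

loose = re.compile(r".*.tmp")    # the dot before "tmp" matches any character
strict = re.compile(r".*\.tmp")  # the escaped dot matches only a literal "."


def matches(pattern, name):
    """Return True when the pattern matches the whole file name."""
    return pattern.fullmatch(name) is not None


loose_hits = [name for name in names if matches(loose, name)]
strict_hits = [name for name in names if matches(strict, name)]

print(loose_hits)   # ['sample3.tmp', 'sample_tmp']
print(strict_hits)  # ['sample3.tmp']
```

The unescaped pattern also accepts sample_tmp, which has no .tmp extension at all, so the escaped form is the safer choice when the result feeds a delete operation.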

The following screenshot shows all of the information that you obtain with this step:


You can see these field names by pressing the space bar while the Get File Names step is selected.

After that step, you used a Filter rows step to keep just the files with a size of 0 bytes.

If you do a preview on this step, you will see a dataset with the list of the desired files, that is, those that meet the two conditions: having the .tmp extension and size equal to 0 bytes.
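The logic of these two steps can be approximated outside the tool. The following Python sketch is only an analogy, not how the steps are implemented: get_file_names and filter_rows are hypothetical helpers, and only three of the fields that Get File Names produces (path, short_filename, and size) are reproduced.

```python
import re
import tempfile
from pathlib import Path


def get_file_names(directory, wildcard):
    """List the files in directory whose names fully match the regex."""
    pattern = re.compile(wildcard)
    rows = []
    for entry in sorted(Path(directory).iterdir()):
        if entry.is_file() and pattern.fullmatch(entry.name):
            rows.append({
                "path": str(entry.parent),
                "short_filename": entry.name,
                "size": entry.stat().st_size,
            })
    return rows


def filter_rows(rows):
    """Keep only the rows that meet the condition size = 0."""
    return [row for row in rows if row["size"] == 0]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical sample files: one empty .tmp, one non-empty .tmp,
        # and one empty file with another extension.
        (Path(tmp) / "sample3.tmp").write_bytes(b"")
        (Path(tmp) / "sample2.tmp").write_bytes(b"data")
        (Path(tmp) / "sample4.txt").write_bytes(b"")

        for row in filter_rows(get_file_names(tmp, r".*\.tmp")):
            print(row["short_filename"], row["size"])
```

Only files that pass both checks, the extension and the zero size, remain in the output, which is what the preview of the Filter rows step should show.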

After that, you selected just the fields holding the path and the short_filename and copied these rows to memory. You did that with the Copy rows to result step.

Now, let's go back to the job. The Copy previous results to args? option checked in the Delete files entry causes the job to read the rows coming from the transformation and copy them to the grid. In other words, each row coming out of the transformation (a data pair: path, short_filename) becomes a row in the Files/Folders: grid.

With that information, the job is finally able to delete the specified files.
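As a rough analogy, the last stage can be pictured as a loop over the result rows. The following Python sketch is an illustration under stated assumptions, not the job entry's implementation: delete_from_result_rows is a hypothetical helper, and it treats short_filename as a literal file name instead of evaluating it as a regular expression.

```python
import tempfile
from pathlib import Path


def delete_from_result_rows(rows):
    """Delete the file named by each (path, short_filename) pair.

    Returns the names of the files that were actually deleted.
    """
    deleted = []
    for path, short_filename in rows:
        target = Path(path) / short_filename
        if target.is_file():
            target.unlink()
            deleted.append(target.name)
    return deleted


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical setup: one file to delete and one file to keep.
        (Path(tmp) / "sample3.tmp").write_bytes(b"")
        (Path(tmp) / "sample1.txt").write_bytes(b"keep me")

        rows = [(tmp, "sample3.tmp")]
        print(delete_from_result_rows(rows))
        print(sorted(p.name for p in Path(tmp).iterdir()))
```

Files that are not named in the result rows are left untouched, which is why the helper transformation must produce exactly the rows you intend to delete.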

See also

  • The recipe named Deleting one or more files in this chapter. See this recipe if the list of files to delete (or at least a regular expression specifying that list) is known in advance.
  • The recipe named Discarding rows in a stream based on a condition in Chapter 6, Understanding Flows of Data to understand the use of the Filter rows step.