Using union

In this recipe, we will combine a number of comma separated value (CSV) files into a single data set in Tableau.

Getting ready

To follow this recipe, download the business license files from the City of Vancouver's website from http://data.vancouver.ca/datacatalogue/businessLicence.htm:

Getting ready

Download the CSV version, and save all the files in a local directory in your computer:

Getting ready

How to do it...

Here are the steps to prepare the data files:

  1. Click on New Data Source icon and connect to business_licenses.csv which contains the most recent year's records:
    How to do it...
  2. Drag New Union to just underneath the business_licenses.csv until you see the Drag table to union message:
    How to do it...
  3. Select all other CSV files from the Files pane and drag them to the Union window:
    How to do it...
  4. Click on OK after you confirm that all the files have been added to the Union window:
    How to do it...
  5. Add a new sheet and create your visualization using this data set.

How it works...

A union operation allows multiple sets of data to be appended to each other, that is, new records will be added to the end of the existing set of records.

To perform a union, we can simply drag the New Union operator to the data connection window and add all the files (or Excel worksheets from the same workbook) to the Union window.

Alternatively, we can skip dragging the New Union operator and simply drag the files (or worksheets) directly underneath the first text file or worksheet:

How it works...

A union in relational databases requires what is called union compatibility. This means the two sets of records need to have the same number of columns and similar data types.

In Tableau, the union operation does not necessarily require union compatibility. If some of the incoming fields do not match the existing fields, the mismatched fields will simply have null values.

For example, if in some of our files the Business Name field was called Business Trade Name instead, we can use Tableau's Merge Mismatched Field operation:

How it works...

What this operation does is combine the fields into a single field in the resulting data set. It will take the first non-null value for this new combined field. Thus, we have to take care to ensure that the fields are indeed supposed to be the same but just named differently; otherwise, we risk losing information.

Should you need to undo the merge, Tableau also provides a way to remove the merge:

How it works...

When we union files or worksheets, Tableau adds metadata fields in the resulting data set. Tableau has the Table Name dimension for text files, which uses the original file name as the value:

How it works...

After we union our files, we can do our analysis. One possibility is a heat map. In the view below, we have a heat map of issued business licenses in downtown Vancouver. This type of visualization can indicate how long businesses have been operating:

How it works...

There's more...

In the past, a union with multiple worksheets in the same Excel workbook can be done using a custom SQL in Excel, if using the legacy Jet connection. There is a recipe in this chapter called Using the legacy Jet driver that discusses this option.

In Tableau 10, the union operator is baked into the product. In this version, union works with text files (including the .csv and .txt file extensions) and multiple worksheets in Excel if saved in a single workbook. What if you need to combine multiple Excel files?

One improvement that is being promised in the future, and was showcased in the 2015 Tableau conference, is a wildcard union. This allows the union to operate on multiple files based on specific patterns on the filename. While not available in the initial release of Tableau 10, this will for sure be a much-awaited feature improvement for this operator.

A possible alternative to adding multiple Excel files is using data extracts. When you create an extract, you can append additional records from another file:

There's more...

This is more restrictive than the union operator because you need to ensure the worksheet names are the same. You also need to ensure union compatibility; otherwise, you may encounter errors during the extract process. The following error is produced by the field name mismatch between the original file in the extraction and the incoming field names in the file being appended:

There's more...

This field mismatch issue can be resolved in the new Tableau 10 Merge Mismatched Fields feature.

Note

Learn more about the union operator from the Tableau online documentation:

https://onlinehelp.tableau.com/current/pro/desktop/en-us/union.html.

See also

Please refer to the Creating a heat map recipe in Chapter 1, Basic Charts

Please refer to the Using the legacy Jet driver recipe in this chapter

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.144.94.190