In this recipe, we will combine a number of comma separated value (CSV) files into a single data set in Tableau.
To follow this recipe, download the business license files from the City of Vancouver's website from http://data.vancouver.ca/datacatalogue/businessLicence.htm:
Download the CSV version, and save all the files in a local directory in your computer:
Here are the steps to prepare the data files:
business_licenses.csv
which contains the most recent year's records:business_licenses.csv
until you see the Drag table to union message:A union operation allows multiple sets of data to be appended to each other, that is, new records will be added to the end of the existing set of records.
To perform a union, we can simply drag the New Union operator to the data connection window and add all the files (or Excel worksheets from the same workbook) to the Union window.
Alternatively, we can skip dragging the New Union operator and simply drag the files (or worksheets) directly underneath the first text file or worksheet:
A union in relational databases requires what is called union compatibility. This means the two sets of records need to have the same number of columns and similar data types.
In Tableau, the union operation does not necessarily require union compatibility. If some of the incoming fields do not match the existing fields, the mismatched fields will simply have null values.
For example, if in some of our files the Business Name field was called Business Trade Name instead, we can use Tableau's Merge Mismatched Field operation:
What this operation does is combine the fields into a single field in the resulting data set. It will take the first non-null value for this new combined field. Thus, we have to take care to ensure that the fields are indeed supposed to be the same but just named differently; otherwise, we risk losing information.
Should you need to undo the merge, Tableau also provides a way to remove the merge:
When we union files or worksheets, Tableau adds metadata fields in the resulting data set. Tableau has the Table Name dimension for text files, which uses the original file name as the value:
After we union our files, we can do our analysis. One possibility is a heat map. In the view below, we have a heat map of issued business licenses in downtown Vancouver. This type of visualization can indicate how long businesses have been operating:
In the past, a union with multiple worksheets in the same Excel workbook can be done using a custom SQL in Excel, if using the legacy Jet connection. There is a recipe in this chapter called Using the legacy Jet driver that discusses this option.
In Tableau 10, the union operator is baked into the product. In this version, union works with text files (including the .csv
and .txt
file extensions) and multiple worksheets in Excel if saved in a single workbook. What if you need to combine multiple Excel files?
One improvement that is being promised in the future, and was showcased in the 2015 Tableau conference, is a wildcard union. This allows the union to operate on multiple files based on specific patterns on the filename. While not available in the initial release of Tableau 10, this will for sure be a much-awaited feature improvement for this operator.
A possible alternative to adding multiple Excel files is using data extracts. When you create an extract, you can append additional records from another file:
This is more restrictive than the union operator because you need to ensure the worksheet names are the same. You also need to ensure union compatibility; otherwise, you may encounter errors during the extract process. The following error is produced by the field name mismatch between the original file in the extraction and the incoming field names in the file being appended:
This field mismatch issue can be resolved in the new Tableau 10 Merge Mismatched Fields feature.
Learn more about the union operator from the Tableau online documentation:
https://onlinehelp.tableau.com/current/pro/desktop/en-us/union.html.
Please refer to the Creating a heat map recipe in Chapter 1, Basic Charts
Please refer to the Using the legacy Jet driver recipe in this chapter
3.144.94.190