In this recipe, we will clean up the following spreadsheet on Canada international student permits and ready it for Tableau:
To follow this recipe, download the file from the Citizenship and Immigration Canada website using the following URL:
http://www.cic.gc.ca/opendata-donneesouvertes/data/IRCC_IS_0004_E.xls
Here are the steps to clean up the file:
Total
and check the Exclude checkbox:Total
and check the Exclude checkbox.Tableau works best with clean, tall, and narrow data instead of short and wide data. The same measures should ideally be provided in a single column instead of spread out.
The original Excel file is a common type of file that many data professionals have to work with. The Excel file has a header, a footer, and the measures are spread across the columns. The number of international students—a measure—is spread out across 13 columns.
When you first connect to this Excel file, this is what you will see:
This file needs to be cleaned up:
When we use the Tableau Data Interpreter, it will clean up the headers and footers, but will not clean up the year dimension and the measure for number of international students. When we run the Data Interpreter, we can also choose to review the results by clicking on the provided link. The first tab, presented here, provides the key to what the Data Interpreter does:
To further clean our data source, we need to pivot the remaining year so the values and number of international students are stored in single columns.
The original Excel file has some total fields, which we excluded, so that we can keep the granularity of the measure consistent:
For example, we would not want to sum all the measures and the field for total unique persons.
From here, we can create visualizations that are easier to work with in Tableau. The following screenshot depicts the number of students per period:
Since we have geocoded the Destination and assigned it the State/Province geographic role, we can also create a filled map to see where students are going:
Although the Destination field is geocoded to State/Province, we will still need the Country information before we can successfully create a map. For this data set, we can simply set the country manually by going to the Map menu item, and selecting Edit Locations. We can set this to Canada:
Alternatively, we can create a field for Country and use that in the geocoding.
You can probably see that there is additional cleanup and transformation that can be done. Period, for example, can be split further into year and quarter. We can even go as far as creating a date for the start of the period. This can be done using a calculated field:
18.226.251.70