Using the Data Interpreter and pivot

In this recipe, we will clean up the following spreadsheet on Canada international student permits and ready it for Tableau:

Using the Data Interpreter and pivot

Getting ready

To follow this recipe, download the file from the Citizenship and Immigration Canada website using the following URL:

http://www.cic.gc.ca/opendata-donneesouvertes/data/IRCC_IS_0004_E.xls

How to do it...

Here are the steps to clean up the file:

  1. Connect to the Excel file in this recipe. Make sure you choose Excel from the To a File section:
    How to do it...
  2. Check the checkbox beside Use Data Interpreter. Note that when this checkbox is checked, the label changes to Cleaned with Data Interpreter:
    How to do it...
  3. Select all fields except for Destination.
  4. While the fields are selected, right-click and choose Pivot:
    How to do it...
  5. Right-click the new fields to rename them:
    • Change Pivot Field Names to Period
    • Change Pivot Field Values to International Students
    How to do it...
  6. Click on Add underneath Filters:
    How to do it...
  7. In the Select a field: option, choose Period:
    How to do it...
  8. In the filter window for Period, under the Wildcard tab, type Total and check the Exclude checkbox:
    How to do it...
  9. Once you click OK, you should see the following in the Edit Data Source Filters box:
    How to do it...
  10. Click on OK when done.
  11. Under the Filters, click on Edit to add one more filter:
    How to do it...
  12. This time, choose the Destination field:
  13. In the filter window for Destination, under the Wildcard tab, type Total and check the Exclude checkbox.
  14. Once you click OK, you should see the following in the Edit Data Source Filters box:
    How to do it...
  15. Click OK when done.
  16. In the preview pane, click on the Abc symbol above Destination and change the Geographic Role to State/Province:
    How to do it...
  17. Create a new sheet and create your visualization.

How it works...

Tableau works best with clean, tall, and narrow data instead of short and wide data. The same measures should ideally be provided in a single column instead of spread out.

The original Excel file is a common type of file that many data professionals have to work with. The Excel file has a header, a footer, and the measures are spread across the columns. The number of international students—a measure—is spread out across 13 columns.

When you first connect to this Excel file, this is what you will see:

How it works...

This file needs to be cleaned up:

  • Header and footer needs to be removed
  • Year values need to be a dimension, since these are descriptors for the measure
  • The measure, which is the number of international students, need to be placed in a single column

When we use the Tableau Data Interpreter, it will clean up the headers and footers, but will not clean up the year dimension and the measure for number of international students. When we run the Data Interpreter, we can also choose to review the results by clicking on the provided link. The first tab, presented here, provides the key to what the Data Interpreter does:

How it works...

To further clean our data source, we need to pivot the remaining year so the values and number of international students are stored in single columns.

The original Excel file has some total fields, which we excluded, so that we can keep the granularity of the measure consistent:

How it works...

For example, we would not want to sum all the measures and the field for total unique persons.

From here, we can create visualizations that are easier to work with in Tableau. The following screenshot depicts the number of students per period:

How it works...

Since we have geocoded the Destination and assigned it the State/Province geographic role, we can also create a filled map to see where students are going:

How it works...

Although the Destination field is geocoded to State/Province, we will still need the Country information before we can successfully create a map. For this data set, we can simply set the country manually by going to the Map menu item, and selecting Edit Locations. We can set this to Canada:

How it works...

Alternatively, we can create a field for Country and use that in the geocoding.

There's more...

You can probably see that there is additional cleanup and transformation that can be done. Period, for example, can be split further into year and quarter. We can even go as far as creating a date for the start of the period. This can be done using a calculated field:

There's more...

See also

Please refer to the Creating a bar chart recipe in Chapter 1, Basic Charts.

Please refer to the Adding data layers to default map recipe in Chapter 5, Maps and Geospatial Visualization.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.133.158.36