Transforming the data for analysis

At this point, we have a very useful dataset for analysis in Tableau Desktop, but there's a bit of additional transformation we might want to employ. What if we wanted to render the flight paths in this dataset on a map? One approach in Tableau Desktop requires a dataset with two records per path: a record with latitude and longitude for an origin point and a separate record with latitude and longitude for a destination point.

Our dataset currently contains a single record for the path. We have a field named Route that contains values such as DAL-PHX and DFW-JFK. These pairs of values give us the origin airport code and destination airport code. With a bit of supplemental data and a transformation of the data, we can end up with data in a shape that allows us to visualize the path between the airports.

Conclude this chapter's example with the following steps:

  1. Locate the Route field in the Clean 5 step and select it.
  2. With the Route field selected, click Automatic Split on the pane's toolbar. The results are shown here:

The Automatic Split evaluates the field and attempts to split based on a common delimiter. In this case, the dash was identified as a likely delimiter of values and the Automatic Split results in two additional fields.

The Search box in the upper right of the profile pane allows you to quickly find fields in the profile pane. Typing route in the preceding example finds the original Route field as well as the two split fields.
  1. Rename Route - Split 1 and Route - Split 2 as Origin and Destination, respectively.

We still have both the origin and destination in the same record, and our goal is to have a record for the origin and another record for the destination. Fortunately, Tableau Prep makes it easy to reshape the data from columns into rows (or, if needed, rows into columns). You'll accomplish this with a pivot.

  1. Using the + icon on the Clean 5 step, select Add Pivot.
  2. The pivot pane gives you options for transforming rows into columns or columns into rows. We'll keep the default option. Drag both the Origin and Destination fields into the Pivot 1 Values area of the pane:
For columns to rows, the Pivot Values will be a column containing all of the values for the pivoted fields. The Pivot Names will be a column containing the name of the original column.
  1. Double-click the text for Pivot 1 Values and rename the field to Airport Code. This field will contain all the airport codes for both origin and destination records.
  2. Double-click the text for Pivot 1 Names and rename the field to Route Point. This field will label each record as either an Origin or Destination.

At this point, we have a dataset that contains a single record for each endpoint of the trip (either an origin or destination).

Notice that the pivot resulted in duplicate data. What was once one field (origin and destination together) is now two fields. The record count has doubled, so we can no longer count the number of records to determine the number of trips. We also cannot SUM the cost of a ticket as it will double count the ticket. We'll need to use MIN/MAX/AVG or some kind of level of detail expression or filter to look at only origins or destinations. While many transformations allow us to accomplish certain goals, we have to be aware that they may introduce other complications.

The only location information we currently have is an airport code. While Tableau Desktop contains geocoding for airport codes, we'll supplement our data with actual latitude and longitude values for each point:

  1. Add a new data connection, selecting Tableau Extract as the type.
  2. Locate and select US Airports.hyper in the Learning TableauChapter 10 directory. This file contains geocoding for many US airport codes.
  3. The input step should be added automatically as the extract contains only one table of data. Drag this new input into the Pivot 1 step and drop it on the Join option. Tableau Prep will create a new join step and should automatically detect Airport Code as the field that matches between the two sides of the join.
  4. Our dataset now contains all of the data that's required. Complete the flow by adding a final output. Set the type to .csv, browse to the Learning TableauChapter 10 directory, and give the file the name Employee Flights.csv.
The output steps may be configured to publish the resulting data as a Tableau Server data connection or to output this to a file of type .csv, .tde (Tableau Data Extract), or .hyper (Hyper data extract).

Now, you can run the flow by using the run button at the top of the toolbar or by clicking the run button on the output step. Once the flow has been executed, open the Employee Travel.twb workbook in the Learning TableauChapter 10 directory to see how the data might be used and to explore it on your own:

Unlike .tde or .hyper files, .csv files may be written to, even if they are open as a data source in Tableau Desktop. You will receive an error if you run a flow that attempts to overwrite a .tde or .hyper file that is in use. Additionally, you may rearrange the field order for a .csv file by dragging and dropping fields into the profile pane of a clean step prior to the output.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.147.72.74