We considered some options for structuring data in Tableau Desktop in the previous chapter. Many of the concepts around well-structured data will apply here as we now turn our attention to another product from Tableau: Tableau Prep. Tableau Prep extends the Tableau platform with robust options for cleaning and structuring data for analysis in Tableau. In the same way that Tableau Desktop provides a hands-on, visual experience for visualizing and analyzing data, Tableau Prep provides a hands-on, visual experience for cleaning and shaping data.
Tableau Prep is on an accelerated, monthly release cycle and while the platform continues to grow and expand, there is an underlying paradigm that sets a foundation for cleaning and shaping data. We'll cover a lot of ground in this chapter, but our goal is not to cover every possible feature—and indeed, we won't. Instead, we will seek to understand the underlying paradigm and flow of thought that will enable you to tackle a multitude of data challenges in Tableau Prep.
In this chapter, we'll work through a couple of practical examples as we explore the paradigm of Tableau Prep, understand the fundamental transformations, and see many of the features and functions of Tableau Prep.
We'll cover quite a few topics in this chapter, including the following:
In this chapter, we'll use the term Tableau Prep broadly to speak of the entire platform that Tableau has developed for data prep and sometimes as shorthand for Tableau Prep Builder, the client application that's used to connect to data, create data flows, and define output. Where needed for clarity, we'll use these specific names:
Let's start by understanding how to get started with Tableau Prep.
Tableau Prep Builder is available for Windows and Mac. If you do not currently have Tableau Prep Builder installed on your machine, please take a moment to download the application from https://www.tableau.com/products/prep/download. Licenses for Tableau Prep Builder are included with Tableau Creator licensing. If you do not currently have a license, you may trial the application for 14 days. Please speak with your Tableau representative to confirm licensing and trial periods.
The examples in this chapter use files located in the Learning TableauChapter 15
directory. Specific instructions will guide you on when and how to use the various files.
Once you've downloaded and installed Tableau Prep Builder, you will be able to launch the application. Once you do, you'll find a welcome screen that we'll detail as we cover the interface in the next section.
You'll find a lot of similarities in the interfaces of Tableau Prep Builder and Tableau Desktop. The home screen of Tableau Prep Builder will look similar to this:
Figure 15.1: The Tableau Prep Builder welcome screen with numbering to identify key components of the UI
The following components have been numbered in Figure 15.1:
Once you have opened or started a new flow, the home screen will be replaced with a new interface, which will facilitate the designing and running of flows:
Figure 15.2: When designing a flow, you'll find an interface like this one. The major components are numbered and described as follows
This interface consists of the following, which are numbered in the preceding screenshot:
Order Date
field in the profile pane will filter the data grid to show only records represented by that bar. This allows you to explore the data, but doesn't alter the data until you perform a specific action that does result in a change.You will also notice the toolbar that allows you to undo or redo actions, refresh data, or run the flow. Additionally, there will be other options or controls that appear based on the type of step or field that's selected. We'll consider those details as we dive into the paradigm of Tableau Prep, and a practical example later in the chapter.
The overall paradigm of Tableau Prep is a hands-on, visual experience of discovering, cleaning, and shaping data through a flow. A flow (sometimes also called a data flow) is a logical series of steps and changes that are applied to data from input(s) to output(s).
Here is an example of what a flow looks like in the flow pane of Tableau Prep:
Figure 15.3: An example flow in Tableau Prep
Each of the individual components of the flow are called steps, which are connected by lines that indicate the logical flow of data from left to right. The lines are sometimes called connectors or branches of the flow. Notice that the Aggregate Step here has one line coming in from the left and three branches extending to the right. Any step can have multiple output branches that represent logical copies of the data at that point in the flow.
One important thing to notice is that four of the step types represent the four major transformations of data we discussed in Chapter 14, Structuring Messy Data to Work Well in Tableau. The step types of Pivot, Union, Join, and Aggregate exactly match those transformations, while the Clean Step allows various other operations involved in cleaning and calculating. You may wish to refresh your memory on the basic transformations in the previous chapter.
As we work through an example of a flow throughout this chapter, we'll examine each type of step more closely. For now, consider these preliminary definitions of the primary steps in Tableau Prep:
MIN
, MAX
, SUM
, AVG
) at a level of detail you specify.Both the Union Step and Join Step in this example have an error icon, indicating that something has not been configured correctly in the flow. Hovering over the icon gives a tooltip description of the error. In this case, the error is due to only having one input connection, while both the union and join require at least two inputs. Often, selecting a step with an error icon may reveal details about the error in the changes pane or elsewhere in the configuration steps.
.csv
), extract (.hyper
or .tde
), or published extracted data source to Tableau Server. The ability to output to a database has been announced, although is not available at the time of writing. You'll have options to select the type of output, along with the path and filename or Tableau Server and project.Right-clicking a step or connector reveals various options. You may also drag and drop steps onto other steps to reveal options such as joining or unioning the steps together. If you want to replace an early part of the flow to swap out an input step, you can right-click the connector and select Remove, and then drag the new input step over the desired next step in the flow to add it as the new input.
In addition to using the term flow to refer to the steps and connections that define the logical flow and transformation of the data, we'll also use the term flow to refer to the file that Tableau Prep uses to store the definition of the steps and changes of a flow. Tableau Prep flow files have the .tfl
(unpackaged flow) or .tflx
(packaged flow) extension.
The paradigm of Tableau Prep goes far beyond the features and capabilities of any single step. As you build and modify flows, you'll receive instant feedback so that you can see the impact of each step and change. This makes it relatively easy (and fun!) to iteratively discover your data and make the necessary changes.
When you are building flows, adding steps, making changes, and interacting with data, you are in design mode. Tableau Prep uses a combination of the Hyper engine's cache, along with direct queries of the database, to provide near-instant feedback as you make changes. When you run a flow, you are using batch mode or execution mode. Tableau Prep will run optimized queries and operations that may be slightly different than the queries that are run in design mode.
We'll consider an example in the remainder of this chapter to aid in our discussion of the Tableau Prep paradigm and highlight some important features and considerations. The example will unfold organically, which will allow us to see how Tableau Prep gives you incredible flexibility to address data challenges as they arise and make changes as you discover new aspects of your data.
We'll put you in the role of an analyst at your organization, with the task of analyzing employee air travel. This will include ticket prices, airlines, and even a bit of geospatial analysis of the trips themselves. The data needs to be consolidated from multiple systems and will require some cleaning and shaping to enable the analysis.
To follow along, open Tableau Prep Builder, which will start on the home screen (there is not a starter flow for this chapter). The sample data is in the Chapter 15
directory, along with the Complete
flow if you want to check your work. The Complete (clean)
flow contains a sample of how a flow might be self-documented—it will not match screenshots precisely.
When you open the Complete
flow file, you'll likely receive errors and warnings that input paths and output paths are not valid. This is expected because your machine will almost certainly have a different drive and directory structure than the one on which the examples were prepared. You'll also run into this behavior when you share flow files with others. To resolve the issues, simply work through the connections in the Connections pane (expanded in Figure 15.4) on the left to reconnect to the files and set output steps to appropriate directories on your machine.
We'll start by connecting to some data!
Connecting to data in Tableau Prep is very similar to connecting to data in Tableau Desktop. From the home screen, you may click either Connect to Data or the + button on the expanded Connections pane:
Figure 15.4: You can make a new data connection by clicking the + button or the Connect to Data button
Either UI element will bring up a list of data source types to select.
As with Tableau Desktop, for file-based data sources, you may drag the file from Windows Explorer or Finder onto the Tableau Prep window to quickly create a connection.
Tableau Prep supports dozens of file types and databases, and the list continues to grow. You'll recognize many of the same types of connection possibilities that exist in Tableau Desktop. However, at the time of writing this book, Tableau Prep does not support all the connections that are available in Tableau Desktop.
You may create as many connections as you like and the Connections pane will list each connection separately with any associated files, tables, views, and stored procedures, or other options that are applicable to that data source. You will be able to use any combination of data sources in the flow.
For now, let's start our example with the following steps:
Figure 15.5: The input preview allows you to select input fields to include in the flow, rename fields, and change data types
The input step displays a grid of fields and options for those fields. You'll notice that many of the fields in the Employee Flights table are named F2
, F3
, F4
, and so on. This is due to the format of the Excel file, which has merged cells and a summary sub-table. Continue the exercise with the following steps:
Figure 15.6: The data interpreter parses the file to fix common issues such as merged cells, empty headers, and sub-total lines
When you select an input step, Tableau Prep will display a grid of fields in the data. You may use the grid to uncheck any fields you do not wish to include, edit the Type of data by clicking the associated symbol (for example, change a string to a date), and edit the Field Name itself by double-clicking the field name value.
If Tableau Prep Builder detects that the data source contains a large number of records, it may turn on data sampling. Data Sampling uses a smaller subset of records for giving rapid feedback and profiling in design mode. However, it will use the full set of data when you run the entire flow in batch mode. You can control the data sampling options by clicking Data Sample on the input pane. While you can set the sample size for the source, subsequent steps, such as joins, that result in large numbers of records may turn on sampling that cannot be disabled. You'll receive an indicator of Data Sampling if it occurs anywhere in the flow.
Figure 15.7: Adding a step extends the flow. Here, adding a clean step adds Clean 1
Ticket
Type
:
Figure 15.8: Selecting a value for a field in the profile pane highlights which values (and what proportion of those values) relate to the selected value
Highlighting the bar segments across fields in the profile pane, which results from selecting a field value, is called brushing. You can also take action on selected values via the toolbar at the top of the profile pane or by right-clicking a field value. These actions include filtering, editing values, or replacing with NULL
. However, before making any changes or cleaning any of the data, let's connect to some additional data.
It turns out that most of the airline ticket booking data is in one database that's represented by the Excel file, but another airline's booking data is stored in files that are periodically added to a directory. These files are in the Learning TableauChapter 15
directory. The files are named with the convention Southwest YYYY.csv
(where YYYY
represents the year
).
We'll connect to all the existing files and ensure that we are prepared for additional future files:
Learning TableauChapter 15
directory and select any of the Southwest YYYY.csv
files to start the connection. Looking at the Input settings, you should see that Tableau Prep correctly identifies the field separators, field names, and types:
Figure 15.9: A text file includes options for headers, field separators, text qualifiers, character sets, and more. Notice also the tabs such as Multiple Files and Data Sample giving other options for the text input
Southwest*
and click Apply. This tells Tableau Prep to union all of the text files in the directory that begin with Southwest
together:Figure 15.10: Using Matching Pattern tells Tableau Prep which files to union together. That way, when Southwest 2020.txt and future files are dropped into the directory, they will be automatically included
With our input steps defined, let's move on to consider how to clean up some of the data to get it ready for analysis.
The process of building out the flow is quite iterative, and you'll often make discoveries about the data that will aid you in cleaning and transforming it. We'll break this example into sections for the sake of reference, but don't let this detract from the idea that building a flow should be a flow of thought. The example is meant to be seamless!
We'll take a look at quite a few possibilities for prepping the data in this section, including merging and grouping. Let's start with seeing how to union together branches in the flow.
We know that we want to bring together the booking data for all the airlines, so we'll union together the two paths in the flow:
Figure 15.11: Dragging one step onto another in the flow reveals options for bringing the datasets together in the flow. Here, for example, there are options for creating a Union or Join
Fare
Type
and Ticket
Type
are named differently between the Excel file and the text files, but in fact represent the same data. Hold down the Ctrl key and select both fields. Then, select Merge Fields from the toolbar at the top of the pane or from the right-click menu:
Figure 15.12: When you select a single field, Tableau Prep will highlight fields that are potentially the same data. Selecting both reveals the Merge Fields option
Row
ID
and Row_ID
.Southwest
files, which were unioned together in the Input step. While this auto-generated field can be very useful at times, it does not add anything to the data in this example. Select the field, then click the ellipses menu button and select Remove Field.Travel
Insurance?
and Passenger
ID
apply to only one of the inputs and will be of little use in our analysis. Remove those fields as well.Airline
, is useful. Leave it for now and click the + icon on the Union 1 step in the flow pane and extend the flow by selecting Clean Step. At this point, your flow should look like this:
Figure 15.13: Your flow should look similar to this. You may notice some variation in the exact location of steps or color (you can change a step's color by right-clicking a step)
There is an icon above the Union 1 step in the flow, indicating changes that were made within this step. In this case, the changes are the removal of several of the fields. Each step with changes will have similar icons, which will reveal tooltip details when you hover over them and also allow you to interact with the changes. You can see a complete list of changes, edit them, reorder them, and remove them by clicking the step and opening the changes pane. Depending on the step type, this is available by either expanding it or selecting the changes tab.
Next, we'll continue building the flow and consider some options for grouping and cleaning.
Now, we'll spend some time cleaning up the data that came from both input sources. With the Clean 3 step selected, use the profile pane to examine the data and continue our flow. The first two fields indicate some issues that need to be addressed:
Figure 15.14: Every null value in the Airline field comes from the Southwest files. Fortunately, in this case, the source of the data indicates the airline
The Table
Names
field was generated by Tableau Prep as part of Union 1 to indicate the source of the records. The Airline
field came only from the Excel files (you can confirm this by selecting it in the profile pane and observing the highlighted path of the field in the flow pane). Click the null value for Airline
and observe the brushing: this is proof that the null
values in Airline
all come from the Southwest files since those files did not contain a field to indicate the airline. We'll address the null
values and do some additional cleanup:
Southwest
to replace NULL
with the value you know represents the correct airline. Tableau Prep will indicate that a Group and Replace operation has occurred with a paperclip icon.Airline
field, select Group Values | Pronunciation:
Figure 15.15: The ellipses button on a field will reveal a plethora of options, from cleaning to filtering, to grouping, to creating calculations
Nearly all the variations are grouped into the American value. Only AA remains.
Figure 15.16: When grouping by pronunciation, you'll notice a slider allowing you control over the sensitivity of the grouping. You can also manually adjust groupings by selecting a field
Table
Names
field, which is no longer needed. Using either the toolbar option, the menu from a right-click for the field, or the options button, select Remove Field.Passenger
Email
field and then Apply the recommendation to assign a data role of Email:
Figure 15.17: Recommendations will show when Tableau Prep has suggestions for cleaning a field
Data Roles allow you to quickly identify valid or invalid values according to what pattern or domain of values is expected. Once you have assigned a data role, you may receive additional recommendations to either filter or replace invalid values.
After applying the recommendation, you'll see an indication in the profile pane for invalid values. As you continue following the example, we'll consider some options for quickly dealing with those invalid values.
Passenger
Email
field again. You'll see two new options presented. Apply the option to Group and Replace invalid values with null
:
Figure 15.18: Here, Tableau Prep suggests either filtering out records with invalid values or replacing the invalid values with null. In this case, we don't want to filter out the entire record, but the invalid values themselves are useless and are best represented by null
Fare
Type
(or possibly Ticket
Type
, depending on which name was kept when merging the fields previously). This field contains the values 1st Class and First Class. Select both of these values by clicking each while holding down the Ctrl key and then Group them together with the First Class value. Two interface options for grouping the values are indicated here:Figure 15.19: After selecting two or more values, you can group them together with the toolbar option or the right-click menu
Figure 15.20: You may preview the data represented by any step in Tableau Desktop by selecting the option from the right-click menu for that step
A new data connection will be made and opened in Tableau Desktop. You can preview the data for any step in the flow. Take a few moments to explore the data in Tableau Desktop and then return to Tableau Prep. Now, we'll turn our attention to extending the dataset with some calculations, supplemental data, and a little restructuring.
Let's look at how to create calculations and some options for aggregations in Tableau Prep.
Calculations in Tableau Prep follow a syntax that's nearly identical to Tableau Desktop. However, you'll notice that only row-level and FIXED
level of detail functions are available. This is because all calculations in Tableau Prep will apply to the row level. Aggregations are performed using an Aggregate Step, which we'll consider shortly.
Calculations and aggregations can greatly extend our analytic capabilities. In our current example, there is an opportunity to analyze the length of time between ticket purchase and actual travel. We may also want to mark each record with an indicator of how frequently a passenger travels overall. Let's dive into these calculations as we continue our example with the following steps:
DATEDIFF('day', [Purchase Date], [Travel Date])
.Figure 15.21: The calculated field shows up in the profile pane
The default view here (as in many cases with numeric fields) is a summary binned histogram. You can change the view to see its details by selecting the ellipses button in the upper right of the field and switching to Detail, which will show every value of the field:
Figure 15.22: Numeric and date fields can be viewed in Summary or in Detail
The shape of the data that's indicated by the default summary histogram is close to what we might have expected with most people purchasing tickets closer (but not immediately prior) to the date of travel. There might be some opportunity for getting better deals by purchasing farther in advance, so identifying this pattern (and then exploring it more fully in Tableau Desktop) will be key for this kind of analysis.
There are a few other types of analysis we may wish to pursue. Let's consider how we might create segments of passengers based on how often they travel.
We'll accomplish this using a FIXED
level of detail (LOD) expression. We could create the calculation from scratch, matching the syntax we learned for Tableau Desktop to write the calculation like this:
{FIXED [Person] : COUNTD([Row_ID])}
The preceding calculation would count the distinct rows per person. Knowing that each row represents a trip, we could alternately use the code {FIXED [Person] : SUM(1)}
, which would potentially be more performant, depending on the exact data source.
In this example, though, we'll leverage the interface to visually create the calculation:
Person
field and select Create Calculated Field | Fixed LOD:
Figure 15.23: To create a Fixed LOD calculation, use the menu and select Create Calculated Field | Fixed LOD
Notice also the options to create Custom Calculation (to write code) and Rank (to compute rank for the selected field).
Person
(as we started the calculation from that field), but we'll need to configure Compute using to perform the distinct count of rows and rename the field as Trips
per
Person
, as shown here:
Figure 15.24: The Fixed LOD pane allows you to configure the LOD expression visually and get instant visual feedback concerning results
Trips per Person
field to create segments of customers. We'll accomplish this with another calculated field, so click Create Calculated Field… to bring up the code editor. Name the field Frequency Segment
and enter the following code:
IF [Trips per Person] <= 2 THEN "Rarely"
ELSEIF [Trips per Person] <= 5 THEN "Occasionally"
ELSE "Frequently"
END
The code uses the Trips per Person
field in an If
Then
Else
construction to create three segments. You can visually see the correspondence between the fields in the preview pane:
Figure 15.25: You can easily visualize how calculations relate to each other and other fields using the Profile pane
The Frequency
Segment
field can be used to accomplish all kinds of useful analysis. For example, you might want to understand whether frequent travelers typically get better ticket prices.
We've seen row-level and FIXED LOD calculations, and noted the option for Rank. Let's now turn our attention to aggregations.
Aggregations in Tableau Prep are accomplished using an aggregate step. We'll continue our flow with the idea that we want to better understand our frequency of travel segment:
Figure 15.26: Adding an Aggregate step to the flow using the + symbol
Give steps meaningful names to self-document the flow. This will greatly help you and others when you return to edit the flow in the future. Additionally, when you are editing the name of a step, the Add a description text will appear below the name, as shown in Figure 15.27.
Figure 15.27: When editing the name of a step, you may also add a more verbose description to help document the steps purpose
Selecting the aggregate step reveals a pane with options for grouping and aggregating fields in the flow:
Figure 15.28: Adding an Aggregate step to the flow using the + symbol
You may drag and drop fields from the left to the Grouped Fields or Aggregated Fields sections and you may change the type of aggregation by clicking on the aggregation text (examples indicated by arrows in Figure 15.28: SUM
next to Trips
per
Person
or AVG
above Ticket
Price
) and selecting a different aggregation from the resulting dropdown.
In Figure 15.28, notice that we've added Frequency Segment to the GROUP and Ticket
Price
to the Aggregated Fields as an AVG
. Notice also the Number
of
Rows
(Aggregated) that appears at the bottom of the list of fields on the left. This is a special field that's available in the aggregation step.
Figure 15.29 Adding an Output step to the flow using the + symbol
Figure 15.30: This output will contain exactly three rows of data
We'll extend our flow in the next few sections to additionally output detailed data. The detailed data as well as the output file of aggregate data gives us some nice options for leveraging Tableau's Data Model in Tableau Desktop to accomplish some complex analysis.
Let's continue by thinking about filtering data in Tableau Prep.
There are a couple of ways to filter data in Tableau Prep:
Filtering an input can be efficient because the query that's sent to the data source will return fewer records. To filter an input, select the input step and then click the Filter Values... button on the input pane:
Figure 15.31: The Filter Values... option allows you to filter values on the input step. This could improve performance on large datasets or relational databases
The Add Filter dialog that pops up allows you to write a calculation with a Boolean
(true/false) result. Only true values will be retained.
Filtering may also be done within a clean step anywhere in the flow. There are several ways to apply a filter:
Figure 15.32: Filter options for a field include filtering by Calculation, Range of Dates, and Relative Dates, and keeping or excluding Null Values
[Travel Date] > MAKEDATE(2016, 1, 1)
While no filtering is required for the dataset in our example, you may wish to experiment with various filtering techniques.
At this point, your flow should look something like this:
Figure 15.33: Your flow should look similar to this (exact placement and colors of steps may vary)
Let's conclude the Tableau Prep flow with some final transformations to make the data even easier to use in Tableau.
Let's create a new branch in the flow to work once again with the detailed data. Click on the Clean 3 step and examine the preview pane. In particular, consider the Route
field:
Figure 15.34: Route uses airport codes for origin and destination separated by a dash
Tableau Desktop (and Server) contain built-in geocoding for airport codes. But to accomplish our specific analysis goal (and open other possibilities for geospatial functions in Tableau Desktop), we'll supplement our data with our own geocoding. We'll also need to consider the shape of the data. Origins and destinations will be most useful split into separate fields, and if we want to connect them visually, we'll also want to consider splitting them into separate rows (a row for the origin and another row for the destination).
There are quite a few possibilities for visualizing this data. For example, we could keep origin and destination on the same row and use a dual-axis map. If we want to connect origins with destinations with a line, we could keep them in the same row of data and use Tableau's MAKELINE()
function. The example you'll follow here will direct you to split the data into separate rows.
If you are following along, here are the steps we'll take:
Figure 15.35: Adding to a step that already has an output adds a new branch to the flow
Route
field and select Split Values | Automatic Split:
Figure 15.36: Split Values allows you to divide delimited strings into separate fields. Automatic Split attempts to determine the delimiter, while Custom Split… allows you greater options and flexibility
You'll now see two new fields added to the step:
Figure 15.37: The results of the split will be new fields in the flow
Origin
and Destination
.US Airports.hyper
file in the Chapter 15
directory. This file contains each Airport
Code
along with the Airport
Name
and Latitude
and Longitude
:
Figure 15.38: The hyper extract contains the data we'll need to supplement the flow with our own geospatial data
Figure 15.39: The input pane for the Airport Codes file
Latitude
and Longitude
, but before we do, we'll need to account for the fact that Origin
and Destination
in Clean 4 are currently both on the same row of data. One option is to pivot the data. Use the + button on the Clean 4 step to add a Pivot step:
Figure 15.40: Adding a Pivot step from Clean 4
Origin
and Destination
fields into the Pivot1 Values area of the pane:
Figure 15.41: Pivot1 Names keeps values from the original column names, while Pivot1 Values contains the actual values from Origin and Destination
Origin
and Destination
fields in the Clean 4 step, and selected Pivot Columns to Rows:Figure 15.42: A shortcut for pivoting columns to rows
Continue with the following steps:
Airport Code
. This field will contain all the airport codes for both the origin and destination records.Route Point
. This field will label each record as either an Origin or Destination.At this point, we have a dataset that contains a single record for each endpoint of the trip (either an origin or destination).
Notice that the pivot resulted in duplicate data. What was once one row (origin and destination together) is now two rows. The record count has doubled, so we can no longer count the number of records to determine the number of trips. We also cannot SUM
the cost of a ticket as it will double count the ticket. We'll need to use MIN
/MAX
/AVG
or some kind of level of detail expression or filter to look at only origins or destinations. While many transformations allow us to accomplish certain goals, we have to be aware that they may introduce other complications.
The only location information we currently have in our main flow is the airport code. However, we already made a connection to Airports.hyper
and renamed the input step as Airport Codes.
Figure 15.43: Dragging Airport Codes to the Join area of the Pivot step
After dropping the Airport Codes input step onto the Join area, a new join step will be created, named Join 1. Take a moment to examine the join pane:
Figure 15.44: The join pane gives a lot of information and options for configuring the join and understanding the results. Important sections of the interface are numbered with descriptions below
You'll notice the following features in Figure 15.44:
We do not need to configure anything in this example. The default of an Inner join on the Airport Code
fields works well. We can confirm that all 9,158
records from the Pivot step are kept. Only 32
records from the Airport Codes hyper file are actual matches (1,301
records didn't match). That is not concerning. It just means we had a lot of extra codes that could have possibly supplemented our data but weren't actually needed. Now, continuing from our previous example:
.csv
file named Airline Travel.csv
.Figure 15.45: The toolbar allows you to run the flow for all outputs or a single output, while the button on the output step will run the flow only for that output
Your final flow will look something like this:
Figure 15.46: Your final flow will resemble this, but may be slightly different in appearance
The Chapter 15 Complete (clean).tfl
file is a bit cleaned up with appropriate step labels and descriptions. As a good practice, try to rename your steps and include descriptions so that your flow is easier to understand. Here is how the cleaned version looks:
Figure 15.47: This flow is cleaned up and contains "self-documentation"
Once the flow has been executed, open the Airline Travel.twb
workbook in the Learning TableauChapter 15
directory to see how the data might be used and to explore it on your own:
Figure 15.48: Exploring the data in the Airline Travel.twb workbook
Unlike .tde
or .hyper
files, .csv
files may be written to, even if they are open as a data source in Tableau Desktop. You will receive an error if you run a flow that attempts to overwrite a .tde
or .hyper
file that is in use. Additionally, you may rearrange the field order for a .csv
file by dragging and dropping fields into the profile pane of a clean step prior to the output.
With our example concluded, let's wrap up by considering some options for automating Tableau Prep flows.
Tableau Prep Builder allows you to design and run flows using the application. Sometimes, data cleansing and prepping will be a one-time operation to support an ad hoc analysis. However, you will often want to run a flow subsequently to capture new or changed data and to cleanse and shape it according to the same pattern. In these cases, you'll want to consider some options for automating the flow:
Tableau Prep's innovative paradigm of hands-on data cleansing and shaping with instant feedback greatly extends the Tableau platform and gives you incredible control over your data. In this chapter, we considered the overall interface and how it allows you to iteratively and rapidly build out a logical flow to clean and shape data for the desired analysis or visualization.
Through a detailed, yet practical, example that was woven throughout this chapter, we explored every major transformation in Tableau Prep, from inputs to unions, joins, aggregates and pivots, to outputs. Along the way, we also examined other transformations and capabilities, including calculations, splits, merges, and the grouping of values. This gives you a foundation for molding and shaping data in any way you need.
In the next chapter, we'll conclude with some final thoughts on how you can leverage Tableau's platform to share your analysis and data stories!
18.117.99.152