Flowing with the fundamental paradigm

The overall paradigm of Tableau Prep is a hands-on, visual experience of discovering, cleaning, and shaping data through a flow. A flow (sometimes also called a data flow) is a logical series of steps and changes that are applied to data from input(s) to output(s). Here is what a flow looks like in the flow pane of Tableau Prep:

Each individual component of the flow is called a step. Steps are connected by lines that indicate the logical flow of data (left to right). These lines are called connectors or branches of the flow. Notice that the Aggregate Step here has one connector coming in from the left and three branches extending to the right. Any step can have multiple output branches, and each branch of a flow may end in a separate output or may be subsequently joined or unioned back into another part of the flow.
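
One way to picture steps, connectors, and branches is as a small directed graph. The following is a minimal sketch in Python, not anything Tableau Prep exposes: the step names and the connector list are assumptions made up for illustration, chosen so that the Aggregate step has one incoming connector and three outgoing branches, as described above.

```python
from collections import Counter

# Hypothetical flow: each connector is a (from_step, to_step) pair,
# read left to right. These names are illustrative assumptions only.
connectors = [
    ("Input", "Clean"),
    ("Clean", "Aggregate"),
    ("Aggregate", "Output"),
    ("Aggregate", "Union"),
    ("Aggregate", "Join"),
]

def count_connectors(connectors):
    """Return (incoming, outgoing) connector counts per step."""
    incoming = Counter(dst for _, dst in connectors)
    outgoing = Counter(src for src, _ in connectors)
    return incoming, outgoing

incoming, outgoing = count_connectors(connectors)

# The Aggregate step has one connector coming in and three branches going out.
print("Aggregate in:", incoming["Aggregate"], "out:", outgoing["Aggregate"])
```

A `Counter` returns 0 for a step that never appears on one side of a connector, so an input step naturally shows zero incoming connectors.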

As we work through an example of a flow throughout this chapter, we'll examine each type of step more closely. For now, consider these preliminary definitions of the primary steps in Tableau Prep:

  • Input step: An input step starts the flow with data from file(s), table(s), view(s), or custom SQL. It gives options for defining file delimiters, unions of multiple tables or files, and how much data to sample (for larger record sets).
  • Clean step: A clean step allows you to perform a wide variety of functions on the data, including calculations, filtering, adjusting data types, removing and merging fields, grouping and cleaning, and much more.
  • Aggregate step: An aggregate step allows you to aggregate values (for example, MIN, MAX, SUM, or AVG) at a level of detail you specify.
  • Join step: A join step allows you to bring together two branches of the flow representing sets of data that can be joined on one or more key fields. You will have options for selecting the kind of join as well as the join fields.
  • Union step: A union step allows you to bring together two or more branches representing sets of data to be unioned together. You will have options for merging or removing mismatched fields.
    Both the Union Step and Join Step in this example have an error icon, indicating that something has not been configured correctly in the flow. Hovering over the icon gives a tooltip description of the error. In this case, the error is due to only having one input connection, while both the union and join require at least two inputs. Often, selecting a step with an error icon may reveal details about the error in the Changes pane or elsewhere in the configuration steps.
  • Pivot step: A pivot step allows you to transform columns of data into rows or rows of data into columns. You'll have options to select the type of pivot as well as the fields themselves. Sometimes, you may hear the term transpose in place of pivot.
  • Output step: The output step defines the ultimate destination for the cleaned and transformed data. This could be a text file (.csv), an extract (.hyper or .tde), or an extract published as a data source to Tableau Server. You'll have options to select the type of output, along with either the path and filename or the Tableau Server and project.
Right-clicking a step or connector reveals various options. You may also drag and drop steps onto other steps to reveal options such as joining or unioning the steps together. If you want to replace an early part of the flow to swap out an input step, you can right-click the connector and select Remove, and then drag the new input step over the desired next step in the flow to add it as the new input.
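
To make the step definitions above more concrete, here is a rough sketch of what the aggregate, join, union, and pivot steps do to rows of data, written in plain Python. This is only an analogy under stated assumptions: Tableau Prep is configured visually, not through code like this, and every function name, field name, and data row below is hypothetical.

```python
from collections import defaultdict

# Made-up rows for illustration; not the data used in this chapter's example.
tickets = [
    {"airline": "Alpha", "price": 200},
    {"airline": "Alpha", "price": 300},
    {"airline": "Beta", "price": 150},
]
airlines = [
    {"airline": "Alpha", "country": "US"},
    {"airline": "Beta", "country": "CA"},
]

def aggregate(rows, group_field, value_field):
    """Aggregate: MIN, MAX, SUM, and AVG of value_field per group_field."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_field]].append(row[value_field])
    return [
        {group_field: key, "min": min(v), "max": max(v),
         "sum": sum(v), "avg": sum(v) / len(v)}
        for key, v in groups.items()
    ]

def inner_join(left, right, key):
    """Join: combine rows from two branches that share a key value."""
    index = defaultdict(list)
    for r in right:
        index[r[key]].append(r)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]

def union(*branches):
    """Union: stack rows from several branches; mismatched fields become None."""
    fields = []
    for rows in branches:
        for row in rows:
            for f in row:
                if f not in fields:
                    fields.append(f)
    return [{f: row.get(f) for f in fields} for rows in branches for row in rows]

def pivot_columns_to_rows(rows, id_field, value_fields):
    """Pivot: turn each listed column into its own (name, value) row."""
    return [
        {id_field: row[id_field], "name": f, "value": row[f]}
        for row in rows
        for f in value_fields
    ]

by_airline = aggregate(tickets, "airline", "price")
joined = inner_join(tickets, airlines, "airline")
```

Only an inner join and a columns-to-rows pivot are sketched here. The join and pivot steps offer other types as well, as the definitions above note.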

In addition to using the term flow to refer to the steps and connections that define the logical flow and transformation of the data, we'll also use the term flow to refer to the file that Tableau Prep uses to store the definition of the steps and changes of a flow. Tableau Prep flow files have the .tfl (unpackaged flow) or .tflx (packaged flow) extension.

The paradigm of Tableau Prep goes far beyond the features and capabilities of any single step. As you build and modify flows, you'll receive instant feedback so that you can see the impact of each step and change. This makes it relatively easy (and fun!) to iteratively discover your data and make the necessary changes.

When you are building flows, adding steps, making changes, and interacting with data, you are in design mode. Tableau Prep uses a combination of the Hyper engine's cache, along with direct queries of the database, to provide near-instant feedback as you make changes. When you run a flow, you are in batch mode. Tableau Prep will run optimized queries and operations that may be slightly different from the queries that are run in design mode.

We'll consider an example in the remainder of this chapter to aid in our discussion of the Tableau Prep paradigm and highlight some important features and considerations. The example will unfold organically, which will allow us to see how Tableau Prep gives you incredible flexibility to address data challenges as they arise and make changes as you discover new aspects of your data.

We'll put you in the role of an analyst at your organization, with the task of analyzing employee air travel. This will include ticket prices, airlines, and even a bit of geospatial analysis of the trips themselves. The data needs to be consolidated from multiple systems and will require some cleaning and shaping to enable the analysis.

Open Tableau Prep Builder and go to the home screen—we'll start by connecting to some data!
