Calculations and aggregations in Tableau Prep

Calculations in Tableau Prep follow a syntax that's nearly identical to Tableau Desktop. However, you'll notice that only row-level functions are available. This is because all of the calculations in Tableau Prep are done at a row level. Aggregations are performed using an aggregate step, which we'll consider shortly.

Calculations and aggregations can greatly extend our analytic capabilities. In our current example, there is an opportunity to analyze the length of time between ticket purchase and actual travel. We may also want to mark each record with an indicator of how frequently a passenger travels overall. Let's dive into these calculations as we continue our example with the following steps:

  1. We'll start with a calculation to determine the length of time between purchase of tickets and the day of travel. Select the Clean 3 step and then click Create Calculated Field. Name the calculation Days from Purchase to Travel and enter
    DATEDIFF('day', [Purchase Date], [Travel Date]).
  2. Examine the results in the Profile pane. The new field should look like this:

The default view here (as in many cases with numeric fields) is a summary binned histogram. You can change the view to see its details by selecting the Options button in the upper right of the field and switching to Detail, which will show every value of the field:

The shape of the data that's indicated by the default Summary histogram is close to what we might have expected with most people purchasing tickets closer (but not immediately before) to the date of travel. There might be some opportunity for getting better deals by purchasing more in advance, so identifying this pattern (and then exploring it more fully in Tableau Desktop) will be key to our analysis.

We may also want to be able to group passengers based on how frequently they travel. We'll use some aggregations and calculations to accomplish this.

  1. Click the + icon that appears when you hover over the Clean 3 step and select Add Aggregate. A new step named Aggregate 1 will be added to the flow.
  2. Double-click the text Aggregate 1 under the new step. This allows you to edit the name. Change the name from Aggregate 1 to Trips per person.
Give steps meaningful names to self-document the flow. This will greatly help you and others when you return to edit the flow in the future. Additionally, when you are editing the name of a step, the Add a description text will appear below the name.

Referring to the preceding TIP you can double-click this area to add a longer description to provide further documentation:

Selecting the aggregate step reveals a pane with options for grouping and aggregating fields in the flow:

You may drag and drop fields from the left to the Grouped Fields or Aggregated Fields sections and you may change the type of aggregation by clicking on the aggregation text (for example, SUM) and selecting a different aggregation from the resulting dropdown.

In the preceding screenshot, you can see that we've grouped by Person and added Number of Rows to the Aggregated Fields as a SUM. Number of Rows is a special field that's available in the Aggregation step. We are using it here because every record indicates an individual trip, so counting the records per person lets us know how many trips they've made. We'll use that information as we continue our example.

  1. Click the + icon that appears when you hover over the Aggregate step Trips per person and select Add Step. A new step named Clean 4 to the flow.
  2. Select this new Clean step and in the Profile pane, double-click the name of the Number of Rows field to rename it to Trips.
  3. Create a new calculated field named Type of Traveler with the following code:
IF [Trips] > 15 THEN "Extreme" 
ELSEIF [Trips] > 10 THEN "Frequent" 
ELSEIF [Trips] > 4 THEN "Casual" 
ELSE "Infrequent" 
END 

This calculated field will now label each person as either an Extreme, Frequent, Casual, or Infrequent traveler, based on how many aggregate trips they've made. At this point, you may notice that the flow only contains fields from the Aggregate step (plus the calculated field we just created):

This is to be expected as Aggregate steps only retain either grouped or aggregated fields. At times, the results of aggregation are exactly what you want for analysis.

However, in this case, we wish to supplement the original dataset so that we can label each person as a certain type of traveler. We'll accomplish that next.

  1. Drag the Clean 4 step and hover over the Clean 3 step.
  2. Drop the Clean 4 step onto the Join option that appears. The settings pane for a Join step will appear, allowing you to configure the join if needed. Note that Tableau Prep has automatically selected the most likely field for the join based on name and type:

Take a moment to examine the unique features of the Join pane:

  • Applied Join Clauses: Here, you have the option to add conditions to the Join clause and decide which fields should be used as keys to define the join. You may add as many clauses as you need.
  • Join Type: Here, you may define the type of join (inner, left, left inner, right, right inner, or outer). Accomplish this by clicking sections of the Venn diagram to select or deselect the parts of the join you wish to retain.
  • Summary of Join Results: The bar chart here indicates how many records come from each input of the flow and how many matched or did not match and whether they will be kept in the resulting dataset based on the type of join that's been selected. You may click a bar segment to see the filtered results in the data grid.
  • Join Clause Recommendations: If applicable, Tableau Prep will display probable Join Clauses which you can then add with a single click.
  • Join Clause: Here, Tableau Prep displays the fields that are used in the Join Clauses and the corresponding values. Any unmatched values will have a red font color. You may edit values by double-clicking them. This enables you to fix individual mismatched values as needed.

In our example, Tableau Prep automatically determined Person as a matching field between both join inputs and all values to find a match, as expected. No further action is required for this example, but in some cases you may need to adjust the Join Clauses or fix mismatched values.

Watch out for duplicate records that can be created as a result of an incorrect join. For example, you might expect a left join to retain all of the records on the left side and only match some of the records on the right. While that is true, records from the left side can be duplicated if they match more than one record on the right. Use the Summary of Join Results to watch out for unexpected results. Be aware that data sampling may also greatly impact the summary numbers.

Extend the flow by adding another clean step, which should be named Clean 5 automatically. At this point, the flow should look similar to this:

The results of joining an aggregation back to the original dataset is very similar to creating a FIXED level of detail expression in Tableau Desktop. For example, in Tableau Desktop, we might have written an expression such as the following:

{FIXED [Person]: SUM(1)}

In desktop, this expression gives the number of records at the level of Person and returns that value to every row. While LOD calculations cannot be written in the same way in Tableau Prep, we accomplished the same thing by aggregating and then joining back into the flow. We'll conclude our example by considering filtering and then reshaping the data for a very specific visual analysis.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.191.243.83