Tableau Prep is designed with data preparation for analytics in mind. When it comes to reporting and analytics, more data is not always better, especially if you have a particular report in mind that you want to create. Pre-aggregating your data in a data preparation tool such as Tableau Prep instead of your business intelligence tool may result in significant performance gains when it comes to rendering your report.
In this chapter, you'll find recipes to help you prepare your data for analytics. Aggregation is a key part of data preparation. Aggregating your data appropriately in a Tableau Prep workflow can significantly reduce the output size. A smaller dataset will be more performant when connecting any analytics application, including Tableau Desktop.
In this chapter, we'll cover the following recipes:
To follow along with the recipes in this chapter, you will require Tableau Prep Builder and Tableau Desktop. We'll use the sample data supplied in the book's GitHub repository.
One key consideration that is often overlooked is determining the granularity of the data that's needed. For example, when working with geographic data, you may have values for continent, region, country, state, city, ZIP code, street, and so on. But if you're only going to report on country data, you may not need all those other dimensions. Or perhaps you are processing order data; you may want to consider whether you need the details for each individual line item in each individual order – maybe your analysis will be fine with just the total order amount per day. In this recipe, we'll look at a quick method to help reveal the data actually in use in a Tableau Desktop visualization.
To follow along with this recipe, download the Sample Files 4.1 folder from this book's GitHub repository.
Start by opening the Superstore.tflx flow from the Sample Files 4.1 folder in Tableau Prep, then follow these steps:
In this example, we've determined that our data output from Tableau Prep contained many more fields than required to create the visualization. Furthermore, the fields that were required were more granular than required to render the visualization.
In this recipe, we learned how to aggregate data in order to achieve a desired output with the minimum necessary data. Instead of an output with 26 fields and 40,889 rows, the visualization we created in Step 4 could have been created with an output of 2 fields and 3 rows.
What this demonstrates is that sometimes you may benefit from designing backward. First, determine the output, then determine what data and level of granularity is essential to produce that output. In this recipe, we've seen, using a very basic example, how we might want to start optimizing our flow output. As your reporting output becomes more complex, by adding more visuals, dashboards, filters, parameters, and so forth, designing backward becomes more challenging.
However, typically, the more complex and larger the dataset and output, the more benefits you may gain from choosing the right granularity and preparing your data accordingly in Tableau Prep.
In the following recipes in this chapter, we'll see methods for grouping and aggregating data in Tableau Prep.
There are several methods to pre-aggregate your data in your Tableau Prep pipeline. Ideally, your data will be aggregated in your data connection. For example, when connecting to a database, you may be able to write a query that includes a GROUP BY statement so that the data is aggregated before being ingested into Tableau Prep.
Often, such an ideal scenario is not available for a variety of reasons, and sometimes it is simply not possible, for example, when connecting to files such as Excel or CSV files.
In this recipe, we'll look at the preferred methods for most users when aggregating data in Tableau Prep, using the aptly named Aggregate step.
To follow along with this recipe, download the Sample Files 4.2 folder from this book's GitHub repository. In this flow, you'll find a slimmed-down version of the sample Superstore flow provided by Tableau.
The last step in this flow contains more than 20 fields and outputs more than 40,000 rows. However, let's assume we are interested only in the total Sales amount by Segment. In this recipe, we'll achieve that output using the Aggregate step.
Start by opening the Superstore.tflx flow from the Sample Files 4.2 folder in Tableau Prep, then follow these steps:
By default, aggregating a numeric value will be done by summarizing its value across all rows. However, you can select from a range of basic mathematic functions, as well as several statistical functions, to use for aggregation. To do so, click the SUM function on top of the field in the Aggregated Fields section, and select the desired function:
And with that, we've aggregated and grouped our data in just a few clicks. You can add other fields to the Aggregate step by simply dragging them into their respective sections. When doing so, you can always select a different aggregation function for each field. For example, you can summarize Sales but add the average Discount amount.
Aggregating data in Tableau Prep performs the calculations in the data preparation flow rather than outputting all data just for it to be aggregated later in a data visualization tool. It's a process that requires careful thought as you'll want to provide the optimal dataset for the intended downstream analysis: not too broad and not too narrow. In this recipe, you've successfully performed data aggregation. In the process of aggregation, you've created groups. Furthermore, the data preview in Tableau Prep itself might have given you the answer you needed without having to perform additional aggregation steps in a data analysis tool!
Level of Detail or LOD calculations are calculation expressions that have been available in Tableau Desktop for some time. An LOD calculation allows you to aggregate your data at different levels of granularity within a single dataset.
For example, you might have a dataset with customer orders, where each row represents a single line item in an order. You might want to aggregate revenue by order, or by customer, without losing the granularity of your data. This is where LOD calculations come into play. In this recipe, you'll create an LOD calculation. In doing so, you'll group your data into distinct buckets and aggregate values in a single step.
To follow along with this recipe, download the Sample Files 4.3 folder from this book's GitHub repository. You must have Tableau Prep version 2020.1 or greater to leverage the LOD functionality.
Start by opening Tableau Prep and connect to the December 2016 Sales.csv file from the Sample Files 4.3 folder in Tableau Prep, then follow these steps:
This step will cause an immediate error in your flow. That's expected, as we need to take another step to configure the LOD calculation:
Tip
Ensure that whoever is using your output understands the various levels of detail in your output. There are certain calculations that need to be avoided with the LOD field. For example, a graph plotting the Department and Total Transaction Amount by Department fields will have inflated results as it will summarize values that have already been aggregated.
Tableau Prep uses the same calculation expressions as Tableau Desktop. In this recipe, you've created an LOD calculation that resulted in the creation of a group and value aggregation in one powerful move.
In this recipe, Tableau Prep has done the hard work behind the scenes and created the appropriate LOD calculation based on our selections. You can view the calculation by opening the Changes pane:
You're now able to leverage LOD calculations to perform quick data preparation, and even data analysis, in Tableau Prep!
To find out more about LOD calculations, you can download the Tableau Desktop whitepaper Understanding Level of Detail (LOD) Expressions from https://www.tableau.com/learn/whitepapers/understanding-lod-expressions.
Grouping data in Tableau Prep can be done as part of the Aggregate step, as we've seen in the Aggregating values recipe earlier in this chapter. The function we'll review in this recipe is different, in that it can group values from a single field based on certain criteria.
As an example, values in a Name field might include John Smith and Smith, John. These might refer to the same person, and so we can group them together as John Smith. Performing this type of grouping is key to your data preparation efforts and ensures the downstream analysis does not run into issues with seemingly duplicate names.
To follow along with this recipe, download the Sample Files 4.4 folder from this book's GitHub repository.
Start by opening Tableau Prep and connect to the 2016 Sales.csv file from the Sample Files 4.4 folder in Tableau Prep, then follow these steps:
In this recipe, you've manually grouped data. You've also reviewed the different options available to you, including manual grouping, pronunciation, common characters, and spelling grouping options. When you run your flow, Tableau Prep will replace any value in your group with the name of the group itself. Grouping is a great thing to do to improve the quality of your data and make the resulting analysis more robust.
There are three algorithms you can use for the automated grouping of values in Tableau Prep. Here is a summary of each of them. Each of these is available from the Group menu:
With the knowledge gained in this recipe, you're now able to leverage the different types of data grouping options available to you in Tableau Prep.
52.14.1.136