All of Tableau Prep Builder's functionality is designed around the purpose of combining, shaping, and cleaning your data for downstream analysis. In this chapter, we'll look at the various transformations available to you to shape and clean your data.
In this chapter, you'll find the following recipes, which will help you transform your data:
To follow along with the recipes in this chapter, you will require Tableau Prep Builder. We'll use sample Excel files supplied in the book's GitHub repository, so there's no need to connect to a database. In each recipe, however, you can replace the suggested sample input data source with any connection type that suits your scenario.
The recipes in this chapter use sample data files that you can download from the book's GitHub repository: https://github.com/PacktPublishing/Tableau-Prep-Cookbook.
When it comes to cleaning data, one of the simplest yet most powerful actions might be simply renaming your fields to a more user-friendly format.
Tableau Prep steps can be categorized into three items: inputs, transformations, and outputs. During the first two, the input and transformation steps, we'll always have the ability to change any field name as desired.
To follow along with this recipe, download the Sample Files 3.1 folder from this book's GitHub repository.
Open up Tableau Prep Builder and connect to the December 2016 Sales.xlsx file:
When you select the Sales_Data input step, the bottom pane will show a preview of all the fields in the source data, along with their data type and some sample values:
It's easy to remember the changes we've made just now, but as your flow grows in complexity, it can become challenging to recall all the changes we've made. Tableau Prep supports us here by showing an edit icon next to the step in the flow that we've altered. In this case, this is the input Sales_Data step. We can hover our mouse over the edit icon to see a summary of the edits we made:
Of course, you may wish to get more detailed insights into the exact change. There's a great overview of that as well! We can click the step with the changes and, from the bottom pane, select the Changes tab. This tab provides detailed insight into the changes made, and we can edit or remove any of the changes as needed from here:
You've now successfully prepared this dataset for use by providing user-friendly, descriptive names to all fields.
As we've seen in the steps in the How to do it… section, changing a field name in Tableau Prep is very easy. However, I recommend you take caution when renaming fields in your workflow at random steps. Doing so makes it more difficult for someone else to quickly understand your flow.
To avoid this confusion as best as possible, always rename fields as far upstream as possible, preferably in the input step itself. That way, your flow will stay organized and is easier to understand for others.
When preparing your data for analysis, it's good practice to provide the least amount of data required to perform that analysis. Our data inputs frequently contain data that is not required and that you may want to remove immediately during an input step.
Alternatively, you may transform your data in a Tableau Prep flow, and as a result, a field may become redundant at some point after the input step itself.
In Tableau Prep, there are three methods you can use to filter your data. In this recipe, we'll perform filter actions using all three methods: calculation filters, selected values filters, and regular filters.
To follow along with this recipe, download the Sample Files 3.2 folder from this book's GitHub repository.
Open up Tableau Prep Builder and connect to the Superstore Sales.hyper extract file, then follow the steps:
IF [Segment] = "Consumer" THEN TRUE ELSE FALSE END
Important note
Calculations are a powerful feature in Tableau Prep. You can find more information on calculations in Chapter 7, Creating Powerful Calculations.
As seen in the following screenshot, you can type in your code entirely, or use the reference section to look up functions and use the examples to get started:
In this recipe, we learned how to leverage data filters in Tableau Prep. Filters are the primary method for cleaning your dataset to remove redundant data. Besides benefiting the output for analysis purposes, a leaner dataset can also speed up the execution time of your flow. For that reason, exclude unnecessary data as early as possible in your flow.
With Tableau Prep's ability to connect to an incredible number of different data sources comes the challenge of data type management. Every data source technology handles data types slightly differently, or stores values differently. The wrong data type may limit the number of functions you can perform with that field. For example, you cannot aggregate a number if its data type is text, nor can you filter for a date range if the data type is not a date.
Tableau Prep does a phenomenal job of automatically detecting the appropriate data type. Tableau Prep data types are listed as follows:
There are times when Tableau is unable to determine the correct data type, and times when it is unable to set it to your desired type because the values in the data are not compatible. We'll look at both cases in this recipe and how to address them.
To follow along with this recipe, download the Sample Files 3.3 folder from this book's GitHub repository.
Open up Tableau Prep and connect to the December Sales 2016.xlsx file from the Sample Files 3.3 folder and follow the steps:
The change is successful this time and the pane shows proper date formats for the values now:
Tip
Always use the data cleaning tool to change data types. Not only does the result sometimes differ compared to using the input tool but you also have the added benefit of seeing more rows in the data grid, so you can instantly view the result of your changes.
We can see that our new field, Date Fix, has been added and Tableau Prep has automatically set the data type to Date:
Important note
Find out more about calculated fields in Chapter 7, Creating Powerful Calculations.
In this recipe, we learned how to improve the quality of our data by correcting data types. Tableau auto-detects a variety of data types for your inputs. As we've seen in this recipe, it also auto-detects the type for newly added fields that are the result of a calculation. If you ever run into issues with the automatic detections, your first port of call is simply selecting the desired type from the data type dropdown. However, advanced calculations are always available for more complex scenarios.
Data validation can be a time-consuming task where we have to determine whether a value is accurate or not. One of the most typical data validation issues relates to misspelling and labeling the same thing differently. For example, the city of New York might be present in your data more than once, with different labels:
To make the process of validating data easier, Tableau Prep uses data roles. A data role compares your data against a list of known values or specific patterns. This allows us to quickly identify problematic values in our data and take action to resolve them.
To follow along with this recipe, download the Sample Files 3.4 folder from this book's GitHub repository.
Open up Tableau Prep and connect to the User List.csv file from the Sample Files 3.4 folder and follow the steps:
With the role selected, Tableau Prep will validate the data in our sample file and flag any mismatch to the role with an exclamation symbol:
We have several options to resolve these issues, which you can reveal by right-clicking an item. Let's select Replace with Null for gator:
Once you edit a value, you'll notice a little paperclip icon in the list of values and a change in the Changes section listed as Group Values. This is because Tableau has created a new group with the null value and added gator as a member of that group.
Important note
More details about the grouping functionality are provided in Chapter 4, Data Aggregation.
With the group created, we can now only see the New York value, and the warning has disappeared. Note that the Changes pane now displays another grouping action, specific to the City field:
You've now successfully applied auto-validation methods to your data.
In this recipe, we learned how to apply auto-validation to ensure the right information is present in our dataset. Tableau Prep relies on data roles to validate your data. The built-in rules allow quick validation of common fields. For more powerful custom validation data roles, refer to the next recipe in this chapter.
Tableau Prep provides out-of-the-box data validation data roles for email addresses, URLs, and a variety of geographic fields. However, the real power of data roles comes from creating custom data roles specific to your environment and data. In this recipe, we'll create a custom data role and publish it to Tableau Server or Tableau Online. By leveraging your server, the custom data role can be made available to your colleagues, ensuring everyone is using a single reference list.
To follow along with this recipe, download the Sample Files 3.5 folder from this book's GitHub repository. In this recipe, we'll create a custom data role, for which you will need to be signed in to your instance of Tableau Server or Tableau Online.
Open up Tableau Prep and connect to the User List.csv file from the Sample Files 3.5 folder and follow the steps:
In the bottom pane, from the profile section, we can see that our data contains four fields: Email, Profile URL, City, and Type. We can use Tableau Prep's built-in data roles to validate values in the first three fields, as seen in the previous recipe, Auto-validating data. However, the Type field requires a custom role for validation purposes.
When done, your output should look as in the following screenshot:
You've now completed the steps required to apply a custom data role using Tableau Server.
In this recipe, we learned how to combine the power of Tableau Server with Tableau Prep and create a reusable, custom data role. While built-in rules allow quick validation on common fields, the ability to create roles brings the power of reusable custom roles to your organization. You can create any number of custom roles and share access through Tableau Server so that you and your team can all leverage the same data roles.
It's not uncommon for a single field to contain multiple values. For example, a name field may contain both a user's first and last name. Separating these can be done with Tableau Prep's Split Values function. Split Values facilitates the automatic creation of calculated fields with the necessary logic to split up a field based on your requirements.
To follow along with this recipe, download the Sample Files 3.6 folder from this book's GitHub repository.
Open up Tableau Prep and connect to the December 2016 Sales.xlsx file from the Sample Files 3.6 folder and follow the steps:
In the profile pane, we can see that the field named Cashier contains an employee's name. We want to split that value into two separate fields, first name and last name:
When selecting Automatic Split, Tableau Prep will attempt to identify the delimiter in the field automatically and subsequently create new fields as a result. If Tableau is unable to determine the delimiter, Automatic Split will display the Custom Split dialog instead.
The result of the action we just took can be seen in the Changes pane, where we can see that two new calculated fields have been created. We can also see two new fields in the profile pane: Cashier – Split 1 and Cashier – Split 2.
Note that the splitting of a field does not amend or remove the original field. In this instance, we still have the original Cashier field present in our data:
You've now successfully cleaned up this dataset and your Changes pane should look like the following:
In this recipe, we learned how to split values contained in a single field into multiple fields. Whenever you create a split in Tableau Prep, whether you are using the Automatic Split or Custom Split option, Tableau Prep will create a calculated field to determine the values in the new field. You can always edit these calculations just like any other calculated field by selecting the Edit icon in the Changes pane.
The following screenshot shows the split function in the calculated fields dialog:
With this knowledge, you can now manage complex datasets that store multiple values in a single field and transform them into an organized, easy-to-understand dataset.
18.224.32.86