Grouping and cleaning

Now, we'll spend some time cleaning up the data that came from both input sources. With the Clean 3 step selected, use the Profile pane to examine the data and continue our flow. The first two fields indicate some issues that need to be addressed:

The Table Names field was generated by Tableau Prep as part of Union 1 to indicate the source of the records. The Airline field came only from the Excel files (you can confirm this by selecting it in the profile pane and observing the highlighted path of the field in the flow pane). Click the null value for Airline and observe the brushing: this is proof that the NULL values in Airline all come from the Southwest files since those files did not contain a field to indicate the airline. We'll address the NULL values and do some additional cleanup:

  1. Double-click the null value and then type Southwest to replace NULL with the value you know represents the correct airline. Tableau Prep will indicate that a Group and Replace operation has occurred with a paperclip icon.
  1. We'll do an additional grouping to clean up the variations of American. Using the Options button on the Airline field, select Group and Replace | Pronunciation:

Nearly all of the variations are grouped into the American value. Only AA remains.

  1. In the Group and Replace pane that has appeared, Ctrl + click AA and American, and then select Group Values:

  1. Next, select the Table Names field, which is no longer needed. Using either the toolbar option, the menu from a right-click for the field, or the options button, select Remove Field.
  2. Some fields in the Profile pane have a Recommendations icon in the upper right corner. Click this for Passenger Email and then Apply the recommendation to assign a data role of email:

Data Roles allow you to quickly identify valid or invalid values according to what pattern or domain of values is expected. Once you have assigned a data role, you may receive additional recommendations to either filter or replace invalid values.
  1. Click the Recommendations button again and Apply the option to Group and Replace invalid values with NULL.
  2. Most of the remaining fields look fine, with the exception of Fare Type, which contains the values 1st Class and First Class. Select both of these values and then group them together with the First Class value.
  3. At this point, we have a clean dataset that contains all of our primary data. There's already a lot of analysis we could do. In fact, let's take a moment to preview the data. Right-click the Clean 3 step and select Preview in Tableau Desktop:

A new data connection will be made and opened in Tableau Desktop. You can preview the data for any step in the flow. Take a few moments to explore the data in Tableau Desktop and then return to Tableau Prep. Now, we'll turn our attention to extending the dataset with some calculations, supplemental data, and a little restructuring.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.128.199.138