Join risks

The main risk with Joins is data duplication. Rather than a theoretical explanation, consider the following example.

If you want to reproduce the example, you can download the file Data duplication example file from my website, book.ladataviz.com in, the Chapter 4: Connect To Data and Simple Transformation section, or use this direct link: https://ladataviz.com/wp-content/uploads/2018/09/Data-duplication-example.xlsx.

The Data duplication example Excel file contains two sheets: Sales and Product. Sales contains the following data:

The total volume of sales is 300.

Product contains the following data:

Now let's join the two tables and see what happens. Here's the result in Tableau:

As you can see, the product number 3 is duplicated. The reason for that duplication is because there are two product names with the same Product ID. The volume of sales here is 400, which is wrong.

When you Join two or more tables, always be sure that the values you want to analyze won't be duplicated. There are three solutions to deal with data duplication:

  • The easiest: Clean the file to remove the duplication. In the example, it means changing the ID of a product to 4.
  • Use data blending: Explained in Chapter 12, Advanced Data Connection.
  • Use level-of-detail calculation functions: Explained in Chapter 10, An Introduction to Calculations.

To finish this section, let's create a Join together, step by step.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.22.181.47