The main risk with Joins is data duplication. Rather than a theoretical explanation, consider the following example.
The Data duplication example Excel file contains two sheets: Sales and Product. Sales contains the following data:
The total volume of sales is 300.
Product contains the following data:
Now let's join the two tables and see what happens. Here's the result in Tableau:
As you can see, the product number 3 is duplicated. The reason for that duplication is because there are two product names with the same Product ID. The volume of sales here is 400, which is wrong.
When you Join two or more tables, always be sure that the values you want to analyze won't be duplicated. There are three solutions to deal with data duplication:
- The easiest: Clean the file to remove the duplication. In the example, it means changing the ID of a product to 4.
- Use data blending: Explained in Chapter 12, Advanced Data Connection.
- Use level-of-detail calculation functions: Explained in Chapter 10, An Introduction to Calculations.
To finish this section, let's create a Join together, step by step.