Borrowing from the Wikipedia definition:
"cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters)."
In this recipe, we will use the clustering feature introduced in Tableau 10 to group athletes by endorsements and salary/winnings.
To follow this recipe, open B05527_06 – STARTER.twbx. Use the worksheet called Cluster and connect to the Top Athlete Salaries (Global Sport Finances) data source.
Here are the steps to create the scatter plot with clusters:
Tableau 10 makes it easier to do cluster analysis. Under the Analytics pane in the side bar, there is a new option to add clusters to your view. You simply drag Cluster onto your view, and the clusters are computed automatically. A window then asks which variables to consider when computing the clusters and how many clusters to create.
If you need to consider additional measures and/or dimensions when computing the clusters, you can drag them onto the variables pane.
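Tableau's documentation describes this feature as based on k-means. To make concrete what it means to compute clusters from a set of variables and a cluster count, here is a minimal Python sketch of k-means over two measures. The athlete figures, the function names, the min-max scaling step, and the choice of starting centers are all illustrative assumptions; this is not Tableau's implementation and not the recipe's data.

```python
import math


def min_max_scale(rows):
    """Rescale each column to [0, 1] so no single measure dominates the distance."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]
    return [tuple((v - lo) / span for v, lo, span in zip(row, lows, spans))
            for row in rows]


def kmeans(points, k, max_iter=100):
    """Plain Lloyd's algorithm; returns one cluster index per point."""
    # Assumption for reproducibility: the first k points are the starting centers.
    centers = [points[i] for i in range(k)]
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Assign every point to its nearest center.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so the clusters are stable
        labels = new_labels
        # Move each center to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels


# Hypothetical (endorsements, salary/winnings) pairs in millions of dollars.
athletes = [(58.0, 9.0), (8.0, 60.0), (2.0, 20.0),
            (52.0, 12.0), (5.0, 55.0), (3.0, 18.0)]

labels = kmeans(min_max_scale(athletes), k=3)
print(labels)  # [0, 1, 2, 0, 1, 2]
```

Adding another variable in the dialog corresponds to adding another number to each tuple, and changing the number of clusters corresponds to changing `k`.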
By default, the clusters are placed on Color in the Marks card. When you right-click on this pill, you will see new options to edit or describe the clusters.
When you choose Describe clusters..., the Summary tab shows the summary values computed for the clusters.
The Models tab shows the analysis of variance statistics for the variables used in the clustering.
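An analysis of variance compares how much a variable differs between clusters with how much it varies inside each cluster. As a rough illustration of that arithmetic (not a reproduction of Tableau's output), the following Python sketch computes a one-way ANOVA F-statistic for a single variable. The function name, the values, and the cluster labels are made up for the example.

```python
def anova_f(values, labels):
    """One-way ANOVA F-statistic for one variable grouped by cluster label."""
    groups = {}
    for v, lab in zip(values, labels):
        groups.setdefault(lab, []).append(v)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    # Between-group sum of squares: spread of the cluster means around the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups.values())
    # Within-group sum of squares: spread of the points around their own cluster mean.
    ss_within = sum((v - sum(g) / len(g)) ** 2
                    for g in groups.values() for v in g)
    # Each sum of squares is divided by its degrees of freedom before taking the ratio.
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical endorsement values (millions of dollars) and their cluster labels.
endorsements = [1.0, 3.0, 5.0, 7.0, 9.0, 11.0]
clusters = [0, 0, 1, 1, 2, 2]

f_stat = anova_f(endorsements, clusters)
print(f_stat)  # 16.0
```

A larger F-statistic means the variable does more to separate the clusters; a value near 1 means the clusters barely differ on that variable.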
Clusters can also be saved back to your Tableau data source. When you drag the cluster pill from Color in the Marks card back to the side bar, the clusters are saved as a group.
If you would like to take advantage of existing software applications such as R, SPSS, or SAS for your clustering calculations, you can connect Tableau to the data files these tools produce. Tableau also supports R integration, which allows you to write R code from within Tableau calculated fields.