There are two approaches to hierarchical clustering:
- In agglomerative hierarchical clustering, we start by treating each data point as its own cluster, and we subsequently merge the closest pair of clusters until only one cluster remains.
- In divisive hierarchical clustering, it's the other way around; we start by assigning all of the data points to the same cluster, and we subsequently split the cluster into smaller clusters until each cluster only contains one sample.
Of course, we can also specify the desired number of clusters, in which case the algorithm stops early. In the following screenshot, we asked the algorithm to find a total of three clusters:
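To make this concrete, here is a minimal sketch of requesting a fixed number of clusters, using scikit-learn's `AgglomerativeClustering` on a small synthetic dataset (the library choice and the toy points are assumptions for illustration, not part of the example above):

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Toy 2-D dataset: three loose groups of points (illustrative values only)
X = np.array([
    [1.0, 1.0], [1.2, 0.8], [0.9, 1.1],   # group near (1, 1)
    [5.0, 5.0], [5.1, 4.9],               # group near (5, 5)
    [9.0, 1.0], [8.8, 1.2],               # group near (9, 1)
])

# Ask for exactly three clusters; merging stops once three clusters remain
agg = AgglomerativeClustering(n_clusters=3)
labels = agg.fit_predict(X)
print(labels)  # one integer cluster id per point; exact ids may vary
```

Points that sit close together end up with the same cluster id, so the three visual groups above come out as three distinct labels.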
The preceding screenshot shows a step-by-step example of agglomerative hierarchical clustering.
In step 1, the algorithm put the two closest points into their own cluster (middle and top).
In step 2, two points on the lower-right turned out to be the closest pair across all possible pairs in the dataset, so they were merged into their own cluster.
This process continued until all data points were assigned to one of the three clusters (step 7), at which point the algorithm terminated.
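The merge-by-merge process described above can be inspected directly with SciPy's `linkage` function, whose output records one merge per step, much like the steps in the screenshot (the use of SciPy, single linkage, and the toy data are assumptions for illustration):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Toy 2-D dataset with three visually separated pairs of points
X = np.array([
    [1.0, 1.0], [1.2, 0.8],
    [5.0, 5.0], [5.1, 4.9],
    [9.0, 1.0], [8.8, 1.2],
])

# Each row of Z records one merge: (cluster i, cluster j, distance, new size)
Z = linkage(X, method='single')
for step, (i, j, dist, size) in enumerate(Z, start=1):
    print(f"step {step}: merge {int(i)} and {int(j)} at distance {dist:.2f}")

# Cut the merge tree so that exactly three clusters remain
labels = fcluster(Z, t=3, criterion='maxclust')
print(labels)
```

Because the closest pair is always merged first, the early rows of `Z` correspond to the tight pairs in the data, mirroring steps 1 and 2 in the walkthrough above.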
Let's dig deeper into implementing agglomerative hierarchical clustering.