Hierarchical clustering

Hierarchical clustering avoids the need to specify a target number of clusters because it assumes that data can successively be merged into increasingly dissimilar clusters. It does not pursue a global objective but decides incrementally how to produce a sequence of nested clusters that range from a single cluster to clusters consisting of the individual data points.

There are two approaches:

  1. Agglomerative clustering proceeds bottom-up, sequentially merging two of the remaining groups based on similarity
  2. Divisive clustering works top-down and sequentially splits the remaining clusters to produce the most distinct subgroups

Both approaches produce N-1 hierarchical levels and facilitate the selection of a clustering at the level that best partitions the data into homogeneous groups. We will focus on the more common agglomerative clustering approach.

The agglomerative clustering algorithm starts from the individual data points and computes a similarity matrix containing all mutual distances. It then takes N-1 merge steps until only a single cluster remains, each time updating the similarity matrix by replacing the merged elements with the new cluster so that the matrix progressively shrinks.
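
A minimal sketch of these steps using SciPy follows; the toy dataset and the choice of Euclidean distance and average linkage are assumptions for illustration only:

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage

# Toy data: N = 10 observations in two dimensions (illustrative only)
X = np.random.default_rng(seed=0).normal(size=(10, 2))

# Condensed matrix of all mutual (pairwise) distances between the N points
distances = pdist(X, metric='euclidean')

# linkage performs the N-1 merge steps; each row of the result records
# the two clusters merged, their distance, and the size of the new cluster
merges = linkage(distances, method='average')
print(merges.shape)  # (N-1, 4)
```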

While hierarchical clustering does not have hyperparameters like k-Means does, the measure of dissimilarity between clusters (as opposed to individual data points) has an important impact on the clustering result. The options, compared in the sketch after this list, differ as follows:

  • Single link: the distance between the nearest neighbors of the two clusters
  • Complete link: the maximum distance between the respective cluster members
  • Group average: the average distance between all members of the two clusters
  • Ward's method: merges the clusters that yield the smallest increase in within-cluster variance
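
The following sketch compares these linkage criteria; the three-group synthetic data from scikit-learn's make_blobs and the cut into three flat clusters are assumptions chosen for the example:

```python
from sklearn.datasets import make_blobs
from scipy.cluster.hierarchy import linkage, fcluster

# Illustrative data with three latent groups
X, _ = make_blobs(n_samples=30, centers=3, random_state=42)

# Each linkage criterion defines cluster dissimilarity differently and
# therefore produces a different merge sequence
for method in ['single', 'complete', 'average', 'ward']:
    Z = linkage(X, method=method)
    # Cut the hierarchy to obtain three flat clusters for comparison
    labels = fcluster(Z, t=3, criterion='maxclust')
    print(method, labels)
```

The linkage matrix Z produced by each method can also be passed to scipy.cluster.hierarchy.dendrogram to visualize the full hierarchy of merges.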
