Clustering

Clustering is the unsupervised counterpart of classification. Clustering algorithms group data points into categories without any output labels in the input (training) dataset. Instead, they look for patterns and relationships in the data, using its inherent features to group points according to some similarity measure, as shown in the following diagram:

Unsupervised learning: Clustering news articles

News articles are a real-world example that helps illustrate clustering. Hundreds of news articles are written every day, each covering a topic such as politics, sports, or entertainment. Clustering offers an unsupervised way to group these articles by topic, as shown in the preceding figure.
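To make the idea of a similarity measure concrete, here is a minimal sketch that groups a few headlines by word overlap. The headlines, the 0.3 threshold, and the helper names (bag_of_words, cosine_similarity, group_by_similarity) are all assumptions invented for illustration. The greedy grouping is a deliberately simple stand-in, not one of the standard clustering algorithms.

```python
import math
from collections import Counter

def bag_of_words(text):
    """Lowercase the text and count how often each word occurs."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def group_by_similarity(vectors, threshold=0.3):
    """Greedy grouping: an item joins the first group whose first
    member is at least `threshold` similar, else it starts a new group."""
    groups = []
    for i, vec in enumerate(vectors):
        for group in groups:
            if cosine_similarity(vec, vectors[group[0]]) >= threshold:
                group.append(i)
                break
        else:
            groups.append([i])
    return groups

# Hypothetical headlines, made up for this example (no labels are given)
headlines = [
    "parliament votes on new election law",
    "election law debate divides parliament",
    "striker scores twice in cup final",
    "cup final decided by late striker goal",
]
vectors = [bag_of_words(h) for h in headlines]
groups = group_by_similarity(vectors)
print(groups)  # indices of headlines that ended up together
```

No topic labels are supplied anywhere. The two politics headlines and the two sports headlines end up together only because they share vocabulary.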

There are several approaches to clustering. The most popular are:

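Centroid-based methods. Popular ones are K-means and K-medoids.
Agglomerative and divisive hierarchical clustering methods. A popular agglomerative one is Ward's method. Affinity propagation is often mentioned alongside these, although strictly speaking it is an exemplar-based message-passing method rather than a hierarchical one.
Data distribution-based methods, for instance, Gaussian mixture models.
Density-based methods, such as DBSCAN.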
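As an illustration of the centroid-based family, the following is a minimal sketch of K-means (Lloyd's algorithm) in plain Python. The sample points, the choice of initial centroids, and the function name kmeans are assumptions made for this example. Libraries such as scikit-learn provide more robust implementations with better initialization.

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm for K-means on a list of equal-length tuples.

    `centroids` holds the starting centroids; their number fixes k.
    """
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                mean = tuple(sum(xs) / len(members) for xs in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(old)  # empty cluster: leave centroid in place
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return labels, centroids

# Hypothetical 2-D data with two visually obvious groups
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
labels, centroids = kmeans(points, centroids=[points[0], points[3]])
print(labels)
print(centroids)
```

The algorithm alternates between assigning points to the nearest centroid and recomputing each centroid as the mean of its points, stopping once the centroids stop moving. The result depends on the starting centroids, which is why practical implementations usually try several initializations.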
