k-means clustering

k-means clustering, as the name suggests, is a technique for clustering data, that is, partitioning data into a specified number (k) of groups, or clusters. It is an unsupervised learning technique: it works by identifying patterns in unlabeled data. Remember the Sorting Hat of Harry Potter fame? What it does in the books is, in effect, clustering; it divides new (unlabeled) students into four clusters: Gryffindor, Ravenclaw, Hufflepuff, and Slytherin.

Humans are very good at grouping objects together; clustering algorithms try to give computers a similar capability. Many clustering techniques are available, such as hierarchical, Bayesian, or partitional clustering. The k-means algorithm belongs to the partitional family: it partitions the data into k clusters. Each cluster has a center, called its centroid. The number of clusters, k, has to be specified by the user.
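For reference, the quantity k-means tries to make small can be written compactly. The notation here is introduced for illustration: C_j is the set of points in cluster j, μ_j is its centroid (the mean of its points), and J is the total within-cluster sum of squared Euclidean distances.

```latex
\mu_j = \frac{1}{|C_j|} \sum_{x \in C_j} x,
\qquad
J = \sum_{j=1}^{k} \sum_{x \in C_j} \lVert x - \mu_j \rVert^2
```

For a fixed assignment of points to clusters, choosing each μ_j as the mean is exactly what minimizes J, which is why the update step uses the mean.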

The k-means algorithm works in the following manner:

1. Randomly choose k data points as the initial centroids (cluster centers).
2. Assign each data point to the closest centroid. Closeness can be measured in different ways, the most common being the Euclidean distance.
3. Recompute each centroid as the mean of the data points currently assigned to it; this minimizes the sum of squared distances between the points in each cluster and their centroid.
4. Repeat steps 2 and 3 until convergence, that is, until the cluster assignments stop changing (or the centroids move less than a chosen tolerance).
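The steps above can be sketched in plain Python with no third-party dependencies. This is a minimal illustration, assuming points are tuples of floats and using Euclidean distance; the function names (`kmeans`, `euclidean`, `mean`) and parameters (`max_iter`, `seed`) are hypothetical choices, not any library's API. Letting an empty cluster keep its previous centroid is one of several possible policies.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(data, k, max_iter=100, seed=0):
    """Cluster `data` (a list of tuples) into k clusters.

    Returns (centroids, labels), where labels[i] is the cluster index of data[i].
    """
    rng = random.Random(seed)
    # Step 1: pick k distinct data points as the initial centroids.
    centroids = rng.sample(data, k)
    labels = None
    for _ in range(max_iter):
        # Step 2: assign every point to its closest centroid.
        new_labels = [
            min(range(k), key=lambda j: euclidean(p, centroids[j]))
            for p in data
        ]
        # Step 4: converged once no assignment changes.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: move each centroid to the mean of its current members.
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:  # an empty cluster keeps its old centroid
                centroids[j] = mean(members)
    return centroids, labels


if __name__ == "__main__":
    data = [(1.0, 1.0), (1.5, 2.0), (1.2, 0.8),
            (8.0, 8.0), (8.5, 9.0), (9.0, 8.2)]
    centroids, labels = kmeans(data, k=2)
    # The first three points share one label, the last three the other.
    print(labels)
    print(centroids)
```

Because the result depends on the randomly chosen initial centroids, practical implementations usually run the algorithm several times with different seeds and keep the run with the lowest sum of squared distances.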
