The steps of the k-means algorithm

The steps involved in the k-means clustering algorithm are as follows:

Step 1	We choose the number of clusters, k.
Step 2	Among the data points, we randomly choose k points as cluster centers.
Step 3	Using the selected distance measure, we compute the distance from each point in the problem space to each of the k cluster centers. Depending on the size of the dataset, this may be a time-consuming step. For example, if there are 10,000 points in the dataset and k = 3, then 30,000 distances need to be calculated in each iteration.
Step 4	We assign each data point in the problem space to the nearest cluster center.
Step 5	Now each data point in our problem space has an assigned cluster center. But we are not done, because the initial cluster centers were chosen at random and are unlikely to be the center of gravity of their clusters. We recalculate each cluster center as the mean of the data points assigned to that cluster. This step explains why the algorithm is called k-means.
Step 6	If the cluster centers have shifted in step 5, we need to recompute the cluster assignment for each data point. For this, we go back to step 3 and repeat that compute-intensive step. If the cluster centers have not shifted, or if our predetermined stop condition (for example, the maximum number of iterations) has been satisfied, then we are done.
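
The steps above can be sketched in a few lines of Python. This is a minimal illustration rather than a production implementation. It assumes Euclidean distance as the distance measure, and the function name k_means, its parameters (max_iterations, seed), the sample points, and the handling of empty clusters are all choices made for this example.

```python
import math
import random


def k_means(points, k, max_iterations=100, seed=0):
    """Cluster `points` (a list of equal-length tuples) into k groups.

    Returns (centers, labels), where labels[i] is the index of the
    center assigned to points[i].
    """
    rng = random.Random(seed)
    # Step 2: pick k of the data points at random as the initial centers.
    centers = rng.sample(points, k)
    labels = []

    for _ in range(max_iterations):
        # Steps 3 and 4: compute the distance from every point to every
        # center (len(points) * k distances) and keep the nearest center.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]

        # Step 5: move each center to the mean of its assigned points.
        new_centers = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                new_centers.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                # Assumption: an empty cluster keeps its previous center.
                new_centers.append(centers[j])

        # Step 6: stop once no center has moved; otherwise repeat.
        if new_centers == centers:
            break
        centers = new_centers

    return centers, labels


# Hypothetical two-dimensional data: two small, well-separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centers, labels = k_means(points, k=2)
print(centers)
print(labels)
```

On this sample data, the first three points end up in one cluster and the last three in the other. Which cluster receives index 0 depends on the random initial centers, so the sketch fixes a seed to make runs repeatable.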

The following figure shows the result of running the k-means algorithm in a two-dimensional problem space:

(a) Data points before clustering; (b) Resultant clusters after running the k-means clustering algorithm

Note that the two clusters produced by k-means are well separated in this case.
