The k-means algorithm is a partition-based clustering algorithm in which each cluster is represented by its centroid. Given a set of n data points in a D-dimensional space and an integer k, the problem is to choose k center points that minimize the sum of squared errors (SSE), that is, the total squared distance from each data point to its nearest center.
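To make the SSE objective concrete, here is a minimal sketch in plain Python. It assumes the common alternating heuristic (assign each point to its nearest center, then move each center to the mean of its points), which the summary itself does not spell out. The names `squared_distance`, `sse`, and `kmeans`, the fixed random seed, and the toy data are illustrative assumptions.

```python
import random

def squared_distance(p, q):
    # Squared Euclidean distance between two points of equal dimension.
    return sum((a - b) ** 2 for a, b in zip(p, q))

def sse(points, centers):
    # Sum of squared errors: each point contributes its squared
    # distance to the nearest center.
    return sum(min(squared_distance(p, c) for c in centers) for p in points)

def kmeans(points, k, iterations=100, seed=0):
    # Assumption: initial centers are k distinct points drawn at random.
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iterations):
        # Assignment step: group points by nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        new_centers = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # converged: no center moved
        centers = new_centers
    return centers

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.0), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centers = kmeans(points, k=2)
print(centers, sse(points, centers))
```

This heuristic only guarantees a local minimum of the SSE, so in practice it is usually restarted from several initializations.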
The k-medoids algorithm is also a partition-based clustering algorithm. The representative of each resulting cluster, its medoid, is chosen from the dataset itself; that is, it is one of the data objects that belong to the cluster.
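The following sketch illustrates the defining property of k-medoids: every representative is an actual data object. It uses a simple greedy swap search in the spirit of PAM rather than a faithful reproduction of it. The Manhattan distance, the naive initialization from the first k points, and the function names are assumptions made for illustration.

```python
def manhattan(p, q):
    # Manhattan (L1) distance; any dissimilarity measure could be used.
    return sum(abs(a - b) for a, b in zip(p, q))

def total_cost(points, medoids, dist=manhattan):
    # Each point is charged the distance to its nearest medoid.
    return sum(min(dist(p, m) for m in medoids) for p in points)

def k_medoids(points, k, dist=manhattan):
    # Assumption: start from the first k points (a deliberately naive choice).
    medoids = list(points[:k])
    best = total_cost(points, medoids, dist)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for candidate in points:
                if candidate in medoids:
                    continue
                # Try replacing medoid i with a non-medoid data object.
                trial = medoids[:i] + [candidate] + medoids[i + 1:]
                cost = total_cost(points, trial, dist)
                if cost < best:
                    medoids, best, improved = trial, cost, True
    return medoids, best

points = [(1, 1), (2, 1), (1, 2), (8, 8), (9, 8), (8, 9)]
medoids, cost = k_medoids(points, k=2)
print(medoids, cost)
```

Because the search only ever swaps in members of `points`, the returned medoids are guaranteed to be data objects, unlike k-means centroids.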
CLARA relies on sampling. Instead of running on the entire dataset, it draws several samples from the original dataset and applies PAM to each sample. The best clustering found across all the iterations is kept.
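The sampling loop can be sketched as follows. This is a simplified illustration, not a reference implementation: the `pam` helper is a compact greedy swap search standing in for full PAM, each candidate set of medoids is scored against the whole dataset, and the defaults (5 samples of size 40 + 2k) follow a commonly used convention. The synthetic two-blob data and all function names are assumptions.

```python
import random

def euclidean(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def total_cost(points, medoids):
    # Sum of distances from each point to its nearest medoid.
    return sum(min(euclidean(p, m) for m in medoids) for p in points)

def pam(sample, k):
    # Compact stand-in for PAM: greedy swaps until no swap lowers the cost.
    medoids = list(sample[:k])
    best = total_cost(sample, medoids)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for candidate in sample:
                if candidate in medoids:
                    continue
                trial = medoids[:i] + [candidate] + medoids[i + 1:]
                cost = total_cost(sample, trial)
                if cost < best:
                    medoids, best, improved = trial, cost, True
    return medoids

def clara(points, k, n_samples=5, sample_size=None, seed=0):
    rng = random.Random(seed)
    if sample_size is None:
        sample_size = min(len(points), 40 + 2 * k)  # assumed default
    best_medoids, best_cost = None, float("inf")
    for _ in range(n_samples):
        sample = rng.sample(points, sample_size)  # cluster a sample only
        medoids = pam(sample, k)
        cost = total_cost(points, medoids)        # score on the full dataset
        if cost < best_cost:
            best_medoids, best_cost = medoids, cost
    return best_medoids, best_cost

# Synthetic data: two well-separated Gaussian blobs (illustrative only).
data_rng = random.Random(1)
points = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(100)]
points += [(data_rng.gauss(10, 1), data_rng.gauss(10, 1)) for _ in range(100)]
medoids, cost = clara(points, k=2)
print(medoids, cost)
```

The expensive swap search runs only on each small sample, which is what lets this approach scale to datasets where running PAM on everything would be too slow.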
CLARANS is a clustering algorithm based on randomized search.
The affinity propagation clustering algorithm iteratively passes affinity messages between objects or points and converges to a set of exemplars adaptively.
Spectral clustering constructs graph partitions based on the eigenvectors of the adjacency matrix or of a matrix derived from it, such as the graph Laplacian.
Hierarchical clustering decomposes a dataset D into levels of nested clusters. The result is represented by a dendrogram, a tree that iteratively splits D into smaller subsets until each subset consists of a single object.
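A top-down decomposition of this kind can be sketched with a short recursive function. The splitting rule used here (seed the two halves with the two most distant points and assign every other point to the nearer seed) is an assumption chosen for brevity, not a specific published algorithm. The sketch also assumes all points are distinct, since otherwise a split could leave one side empty.

```python
from itertools import combinations

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def split(cluster):
    # Assumed rule: the two most distant points act as seeds, and every
    # point joins the nearer seed (ties go to the first seed).
    a, b = max(combinations(cluster, 2), key=lambda pair: squared_distance(*pair))
    left = [p for p in cluster if squared_distance(p, a) <= squared_distance(p, b)]
    right = [p for p in cluster if squared_distance(p, a) > squared_distance(p, b)]
    return left, right

def divisive(cluster):
    # Returns a nested list standing in for the dendrogram:
    # leaves are points (tuples), internal nodes are two-element lists.
    if len(cluster) == 1:
        return cluster[0]  # stop once a subset holds a single object
    left, right = split(cluster)
    return [divisive(left), divisive(right)]

# Assumes distinct points, so both sides of every split are non-empty.
points = [(0, 0), (1, 0), (5, 0), (6, 0), (20, 0)]
tree = divisive(points)
print(tree)
# [[[(0, 0), (1, 0)], [(5, 0), (6, 0)]], (20, 0)]
```

Cutting the nested structure at a given depth yields a flat clustering at that level; the first split here isolates the outlying point (20, 0) from the rest.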
The next chapter will cover more advanced topics related to clustering algorithms, such as density-based algorithms, grid-based algorithms, the EM algorithm, high-dimensional algorithms, and constraint-based clustering algorithms.