The CLARA algorithm

Instead of taking the whole set of data into consideration, the CLARA (Clustering LARge Application) algorithm randomly chooses a small portion of the actual data as a representative of the data. Medoids are then chosen from this sample using PAM. If the sample is selected in a fairly random manner, it should closely represent the original dataset.

CLARA draws multiple samples of the dataset, applies PAM to each sample, finds the medoids, and then returns its best clustering as the output. At first, a sample dataset D' is drawn from the original dataset D, and the PAM algorithm is applied to D' to find the k medoids. Use these k medoids and the dataset D to calculate the current dissimilarity. If it is smaller than the one you get in the previous iteration, then these k medoids are kept as the best k medoids. The whole process is performed a specified number of times.

The CLARA algorithm

The summarized pseudocode for the CLARA algorithm is as follows:

The CLARA algorithm

The R implementation

Take a look at the ch_05_clara.R R code file from the bundle of R code for the previously mentioned algorithms. The codes can be tested with the following command:

> source("ch_05_clara.R")
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.117.146.173