Evaluating the clusters

The objective of good-quality clustering is that data points belonging to separate clusters should be clearly differentiable. This implies the following:

  • The data points that belong to the same cluster should be as similar as possible.
  • Data points that belong to separate clusters should be as different as possible.

Clustering results can be evaluated visually, relying on human intuition, but there are also mathematical methods that quantify the quality of the clusters. Silhouette analysis is one such technique: it compares the tightness of, and the separation between, the clusters created by the k-means algorithm. A silhouette plot displays how close each point in a particular cluster is to the points in the neighboring clusters. The analysis associates a number in the range [-1, 1] with each cluster. The following table shows what the figures in this range signify:

Range | Meaning | Description
0.71–1.0 | Excellent | The k-means clustering resulted in groups that are quite differentiable from each other.
0.51–0.70 | Reasonable | The k-means clustering resulted in groups that are somewhat differentiable from each other.
0.26–0.50 | Weak | The k-means clustering resulted in grouping, but the quality of the grouping should not be relied upon.
≤0.25 | No clustering has been found | With the parameters selected and the data used, it was not possible to create a grouping using k-means clustering.

 

Note that each cluster in the problem space will get a separate score.
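As a minimal sketch of how such scores might be computed, the following example uses scikit-learn's silhouette_score and silhouette_samples functions. The synthetic dataset, the choice of three clusters, and the variable names are assumptions made for illustration; they do not come from the text above. The per-cluster score is taken here as the mean silhouette value of the points assigned to that cluster.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_samples, silhouette_score

# Assumed toy data: three well-separated 2-D blobs (illustration only).
X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=1.0,
    random_state=42,
)

# Cluster the points with k-means; k=3 is an assumption matching the toy data.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)

# Mean silhouette value over all points.
overall = silhouette_score(X, labels)

# One silhouette value per point, averaged within each cluster.
per_point = silhouette_samples(X, labels)
per_cluster = {
    int(c): float(per_point[labels == c].mean()) for c in np.unique(labels)
}

print(f"Overall silhouette score: {overall:.2f}")
for cluster_id, score in per_cluster.items():
    print(f"Cluster {cluster_id}: {score:.2f}")
```

Each printed value can then be read against the ranges in the table. With real data, it may be worth repeating the calculation for several values of k and comparing the scores.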
