Challenges in the centroid-based clustering (CC) algorithm

As discussed previously, a centroid-based clustering algorithm such as K-means needs the number of clusters K to be fixed in advance, and choosing a good value for K is itself an optimization problem. Finding the globally optimal K-means clustering is NP-hard (that is, non-deterministic polynomial-time hard), so the common approach is to settle for an approximate solution, typically a local optimum that depends on the initial centroids. Searching over candidate values of K and over initializations therefore adds nontrivial computational cost. Furthermore, the K-means algorithm implicitly expects clusters of roughly similar size and spread. In other words, clustering quality degrades when the clusters differ widely in size or density.
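To illustrate why only an approximate solution is usually obtained, the following minimal sketch runs the standard iterative K-means procedure (Lloyd's algorithm) from two different starting points on the same four points. The function names `sq_dist` and `lloyd`, the toy data, and the initial centers are all assumptions made for this illustration. It uses plain Python rather than any particular library.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points]


def lloyd(points, centers, max_iter=100):
    """Alternate assignment and center updates until the centers stop moving."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = []
        for j in range(len(centers)):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                # New center is the coordinate-wise mean of the members.
                new_centers.append(
                    tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                # Keep an empty cluster's center where it was.
                new_centers.append(centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    labels = assign(points, centers)
    # Within-cluster sum of squares: the quantity K-means tries to minimize.
    wcss = sum(sq_dist(p, centers[l]) for p, l in zip(points, labels))
    return centers, labels, wcss


# Four corners of a 4 x 1 rectangle (hypothetical toy data), K = 2.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]

# Start A splits left/right; start B splits bottom/top.
_, labels_a, wcss_a = lloyd(points, [(0, 0), (4, 0)])
_, labels_b, wcss_b = lloyd(points, [(2, 0), (2, 1)])

print(labels_a, wcss_a)
print(labels_b, wcss_b)
```

Both runs converge, but to different partitions with different costs: the procedure only guarantees a local optimum, which is why implementations commonly restart from several initializations and keep the best result.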

Another major drawback is that the algorithm optimizes the cluster centers rather than the cluster borders, so it often cuts the borders between clusters in the wrong place. With two- or three-dimensional data, visual inspection can sometimes reveal such mistakes, but that advantage is rarely available for high-dimensional data. How to find the optimal value of K is covered in a dedicated section later in this chapter.
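The border problem can be seen in a small one-dimensional sketch. The two groups below are hypothetical, and the centers are simply set to the group means. The code performs a single nearest-center assignment step, not a full K-means run. Because each point goes to its nearest center, the border always falls halfway between the two centers, regardless of how wide each cluster is.

```python
# A wide group and a tight group (hypothetical 1-D data).
wide = [-6, -4, -2, 0, 2, 4, 6]
tight = [9, 10, 11]

# Use each group's mean as its center.
center_wide = sum(wide) / len(wide)
center_tight = sum(tight) / len(tight)

# Nearest-center assignment puts the border at the midpoint of the centers.
border = (center_wide + center_tight) / 2

# Points of the wide group that are closer to the tight group's center.
misassigned = [x for x in wide
               if abs(x - center_tight) < abs(x - center_wide)]

print(center_wide, center_tight, border)
print(misassigned)
```

Here the point 6 belongs to the wide group but lies past the midpoint border, so a nearest-center rule hands it to the tight group. This is the kind of inappropriate cut that arises when clusters differ in spread.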
