The fourth caveat – k-means is slow for a large number of samples

The final limitation of k-means is that it is relatively slow for large datasets. Many algorithms suffer from this problem, but k-means is affected especially badly: each iteration must access every single data point in the dataset and compare it to all of the cluster centers, so the cost of one iteration grows with the number of samples multiplied by the number of clusters.
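To make this cost concrete, here is a minimal NumPy sketch of the assignment step of a single iteration. The function name assign_to_centers and the random toy data are assumptions made for illustration; this is not how OpenCV implements the step internally.

```python
import numpy as np


def assign_to_centers(X, centers):
    """Return the index of the nearest center for every row of X,
    together with the full matrix of squared distances."""
    # Broadcasting builds an array of shape (n_samples, n_clusters, n_features),
    # so every data point is compared with every cluster center.
    diffs = X[:, np.newaxis, :] - centers[np.newaxis, :, :]
    sq_dists = np.sum(diffs ** 2, axis=2)
    return np.argmin(sq_dists, axis=1), sq_dists


# Hypothetical toy data: 1,000 points in 2D and 5 cluster centers
rng = np.random.RandomState(42)
X = rng.rand(1000, 2)
centers = rng.rand(5, 2)

labels, sq_dists = assign_to_centers(X, centers)
print(sq_dists.shape)  # one distance per (sample, center) pair
```

The distance matrix has one entry per pair of sample and center, and it has to be recomputed in every iteration because the centers move.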

You might wonder whether the requirement to access all data points during each iteration is really necessary. For example, you might use only a subset of the data to update the cluster centers at each step. Indeed, this is the exact idea that underlies a variation of the algorithm called mini-batch k-means. Unfortunately, this algorithm is not implemented in OpenCV.

This variant is, however, provided by scikit-learn as part of its clustering module: sklearn.cluster.MiniBatchKMeans.
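As a rough sketch of how sklearn.cluster.MiniBatchKMeans might be used, the following example clusters a synthetic dataset. The dataset (generated with make_blobs), the number of clusters, and the batch size are all assumptions chosen for illustration, not recommended settings.

```python
from sklearn.cluster import MiniBatchKMeans
from sklearn.datasets import make_blobs

# Hypothetical dataset: 10,000 points in 2D, drawn around 4 blob centers
X, _ = make_blobs(n_samples=10000, centers=4, cluster_std=1.0,
                  random_state=10)

# Each update step uses only batch_size samples rather than the full dataset
mbk = MiniBatchKMeans(n_clusters=4, batch_size=256, n_init=3,
                      random_state=10)
labels = mbk.fit_predict(X)
centers = mbk.cluster_centers_

print(centers.shape)  # one row per cluster center
```

A smaller batch_size makes each update cheaper, usually at the price of slightly less accurate cluster centers, so the value is worth tuning for the dataset at hand.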

Despite the limitations discussed earlier, k-means has some interesting applications, especially in computer vision.
