The fourth caveat – k-means is slow for a large number of samples

The final limitation of k-means is that it is relatively slow for large datasets. Many algorithms suffer from this problem, but k-means is affected especially badly: each iteration must access every single data point in the dataset and compare it to all of the cluster centers, so the cost of one iteration grows with the number of samples times the number of clusters.
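To make that cost concrete, here is a minimal NumPy sketch of a single assignment step. It is an illustration, not OpenCV's implementation; the function name `assign_to_centers` and the synthetic data are assumptions made for this example.

```python
import numpy as np

def assign_to_centers(X, centers):
    """One k-means assignment step: label each sample with its nearest center.

    Hypothetical helper for illustration only.
    """
    # Differences between every sample and every center,
    # shape (n_samples, n_clusters, n_features).
    diffs = X[:, np.newaxis, :] - centers[np.newaxis, :, :]
    # Squared Euclidean distances, shape (n_samples, n_clusters).
    sq_dists = np.sum(diffs ** 2, axis=2)
    # Return the nearest-center index per sample and the number of
    # distance evaluations that were needed.
    return np.argmin(sq_dists, axis=1), sq_dists.size

rng = np.random.default_rng(42)
X = rng.normal(size=(10_000, 2))                         # synthetic samples
centers = X[rng.choice(len(X), size=5, replace=False)]   # 5 initial centers
labels, n_comparisons = assign_to_centers(X, centers)
print(n_comparisons)  # 10000 * 5 = 50000 distance evaluations
```

Every iteration repeats this full pass, so doubling the number of samples doubles the work per iteration.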

You might wonder whether it is really necessary to access all data points during each iteration. For example, you might use only a subset of the data to update the cluster centers at each step. Indeed, this is exactly the idea behind a variation of the algorithm called mini-batch k-means. Unfortunately, this algorithm is not implemented in OpenCV.
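The subset idea can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions, not a reference implementation: the function name `mini_batch_kmeans`, its parameters, and the per-center running-mean update (each center moves toward a new sample by a step of one over the number of samples it has seen) are choices made for this example.

```python
import numpy as np

def mini_batch_kmeans(X, n_clusters, batch_size=100, n_iter=100, seed=0):
    """Simplified mini-batch k-means (hypothetical helper for illustration)."""
    rng = np.random.default_rng(seed)
    # Initialize the centers with randomly chosen samples.
    centers = X[rng.choice(len(X), size=n_clusters, replace=False)].astype(float)
    counts = np.zeros(n_clusters)  # samples seen so far by each center

    for _ in range(n_iter):
        # Draw a small random subset instead of touching the whole dataset.
        batch = X[rng.choice(len(X), size=batch_size, replace=False)]
        # Assign each batch sample to its nearest center.
        sq_dists = ((batch[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        nearest = sq_dists.argmin(axis=1)
        # Move each center toward its assigned samples; the step size
        # shrinks as the center accumulates more samples.
        for x, j in zip(batch, nearest):
            counts[j] += 1
            centers[j] += (x - centers[j]) / counts[j]
    return centers

# Two well-separated synthetic blobs around (0, 0) and (10, 10).
rng = np.random.default_rng(1)
X = np.vstack([
    rng.normal(0.0, 0.5, size=(5000, 2)),
    rng.normal(10.0, 0.5, size=(5000, 2)),
])
centers = mini_batch_kmeans(X, n_clusters=2)
print(np.round(centers, 1))
```

Each iteration here compares only `batch_size` samples to the centers, rather than all 10,000.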

Mini-batch k-means is provided by scikit-learn as part of its clustering module: sklearn.cluster.MiniBatchKMeans.
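A short usage sketch of that class follows. The synthetic dataset from `make_blobs` and the parameter values (`n_clusters=4`, `batch_size=256`, `n_init=3`) are assumptions picked for illustration, not recommendations.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.datasets import make_blobs

# Synthetic 2D data with four blobs (illustrative only).
X, _ = make_blobs(n_samples=10_000, centers=4, cluster_std=1.0,
                  random_state=42)

# Each update step uses a random batch of 256 samples rather than all of X.
mbk = MiniBatchKMeans(n_clusters=4, batch_size=256, n_init=3,
                      random_state=42)
labels = mbk.fit_predict(X)

print(mbk.cluster_centers_.shape)  # (4, 2)
```

The fitted object exposes the same `cluster_centers_` attribute and `predict` method as `sklearn.cluster.KMeans`, so the two can usually be swapped to compare speed and clustering quality on your own data.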

Despite the limitations discussed earlier, k-means has some interesting applications, especially in computer vision.
