K-means clustering with Dlib

The Dlib library uses kernel functions as the distance functions for the k-means algorithm. An example of such a function is the radial basis function. As an initial step, we define the required types, as follows:

#include <dlib/clustering.h>
using namespace dlib;
typedef matrix<double, 2, 1> sample_type;          // a 2D sample (column vector)
typedef radial_basis_kernel<sample_type> kernel_type;
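
To make the idea of a kernel acting as a distance function more concrete, the following is a minimal standalone sketch that does not use Dlib. It assumes the usual RBF form k(a, b) = exp(-gamma * ||a - b||^2) and derives a squared distance from it. The names Point, rbf_kernel, and kernel_distance_sq are hypothetical and exist only for this illustration:

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Hypothetical stand-in for the 2x1 sample_type used in the text.
using Point = std::array<double, 2>;

// RBF kernel: k(a, b) = exp(-gamma * ||a - b||^2).
double rbf_kernel(const Point& a, const Point& b, double gamma) {
    assert(gamma > 0.0);  // the kernel width parameter must be positive
    const double dx = a[0] - b[0];
    const double dy = a[1] - b[1];
    return std::exp(-gamma * (dx * dx + dy * dy));
}

// Squared distance between the images of a and b in the kernel's
// feature space: ||phi(a) - phi(b)||^2 = k(a,a) - 2*k(a,b) + k(b,b).
double kernel_distance_sq(const Point& a, const Point& b, double gamma) {
    return rbf_kernel(a, a, gamma) - 2.0 * rbf_kernel(a, b, gamma) +
           rbf_kernel(b, b, gamma);
}
```

Because k(a, a) is always 1 for this kernel, the squared distance lies between 0 (identical points) and 2 (points very far apart), so a larger gamma makes the distance saturate faster.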

Then, we initialize an object of the kkmeans type. Its constructor takes an object that defines the cluster centroids as its input parameter. We can use an object of the kcentroid type for this purpose. The kcentroid constructor takes three parameters: the first is the kernel object (the distance function), the second is the numerical accuracy of the centroid estimation, and the third is an upper limit on runtime complexity (specifically, the maximum number of dictionary vectors the kcentroid object is allowed to use), as illustrated in the following code snippet:

kcentroid<kernel_type> kc(kernel_type(0.1), 0.01, 8);
kkmeans<kernel_type> kmeans(kc);
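
A centroid in a kernel-induced feature space cannot be stored as plain coordinates, but its distance to a point can still be computed from kernel values alone. The following standalone sketch (not Dlib code) shows the exact computation over all cluster members. It is only an illustration of the underlying idea: the kcentroid object described above limits itself to a bounded set of dictionary vectors, which this sketch does not attempt to reproduce. The names Point, rbf_kernel, and distance_to_centroid_sq are hypothetical:

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical stand-in for the 2x1 sample_type used in the text.
using Point = std::array<double, 2>;

// RBF kernel: k(a, b) = exp(-gamma * ||a - b||^2).
double rbf_kernel(const Point& a, const Point& b, double gamma) {
    const double dx = a[0] - b[0];
    const double dy = a[1] - b[1];
    return std::exp(-gamma * (dx * dx + dy * dy));
}

// Squared feature-space distance from x to the mean of the members:
//   k(x,x) - (2/n) * sum_i k(x, m_i) + (1/n^2) * sum_i sum_j k(m_i, m_j)
double distance_to_centroid_sq(const Point& x,
                               const std::vector<Point>& members,
                               double gamma) {
    assert(!members.empty());  // a centroid needs at least one member
    const double n = static_cast<double>(members.size());
    double cross = 0.0;  // sum of k(x, m_i)
    double self = 0.0;   // sum of k(m_i, m_j) over all pairs
    for (const Point& mi : members) {
        cross += rbf_kernel(x, mi, gamma);
        for (const Point& mj : members) {
            self += rbf_kernel(mi, mj, gamma);
        }
    }
    return rbf_kernel(x, x, gamma) - 2.0 * cross / n + self / (n * n);
}
```

The double loop makes the exact computation quadratic in the number of members, which suggests why an upper limit on the number of stored vectors is useful in practice.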

As a next step, we initialize the cluster centers with the pick_initial_centers() function. This function takes the number of clusters, the output container for the center objects, the training data, and the kernel (distance function) object as parameters, as follows:

std::vector<sample_type> samples; // training dataset
...
size_t num_clusters = 2;
std::vector<sample_type> initial_centers;
pick_initial_centers(num_clusters, initial_centers, samples,
                     kmeans.get_kernel());
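
To give some intuition for what choosing initial centers with a distance function can look like, here is a standalone sketch of a farthest-first selection: start from one sample, then repeatedly pick the sample that is farthest from all centers chosen so far. This is a generic illustration under that assumption and is not presented as the internal behavior of pick_initial_centers(). All names in it (Point, rbf_kernel, kernel_distance_sq, farthest_first_centers) are hypothetical:

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical stand-in for the 2x1 sample_type used in the text.
using Point = std::array<double, 2>;

// RBF kernel: k(a, b) = exp(-gamma * ||a - b||^2).
double rbf_kernel(const Point& a, const Point& b, double gamma) {
    const double dx = a[0] - b[0];
    const double dy = a[1] - b[1];
    return std::exp(-gamma * (dx * dx + dy * dy));
}

// Kernel-induced squared distance; k(a,a) = k(b,b) = 1 for the RBF kernel.
double kernel_distance_sq(const Point& a, const Point& b, double gamma) {
    return 2.0 - 2.0 * rbf_kernel(a, b, gamma);
}

// Farthest-first selection of num_centers samples as initial centers.
std::vector<Point> farthest_first_centers(const std::vector<Point>& samples,
                                          std::size_t num_centers,
                                          double gamma) {
    assert(num_centers >= 1 && num_centers <= samples.size());
    std::vector<Point> centers;
    centers.push_back(samples.front());  // assumption: seed with the first sample
    // nearest[i]: squared distance from samples[i] to its closest chosen center.
    std::vector<double> nearest(samples.size(),
                                std::numeric_limits<double>::max());
    while (centers.size() < num_centers) {
        std::size_t best = 0;
        for (std::size_t i = 0; i < samples.size(); ++i) {
            nearest[i] = std::min(
                nearest[i], kernel_distance_sq(samples[i], centers.back(), gamma));
            if (nearest[i] > nearest[best]) best = i;
        }
        centers.push_back(samples[best]);  // the sample farthest from all centers
    }
    return centers;
}
```

Spreading the initial centers apart in this way tends to give the subsequent training step a better starting point than centers that happen to fall in the same cluster.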

Once the initial centers are selected, we pass them to the kkmeans::train() method to determine the final clusters, as follows:

kmeans.set_number_of_centers(num_clusters);
kmeans.train(samples, initial_centers);

for (size_t i = 0; i != samples.size(); i++) {
    auto cluster_idx = kmeans(samples[i]);
    ...
}
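
The train-then-assign flow above can be illustrated with a small standalone kernel k-means loop that alternates between assigning each sample to its nearest centroid and rebuilding the clusters from those assignments. This is a simplified sketch under stated assumptions, not Dlib's implementation: it keeps every cluster member instead of a bounded dictionary, and the names Point, rbf_kernel, distance_to_centroid_sq, and kernel_kmeans are hypothetical:

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical stand-in for the 2x1 sample_type used in the text.
using Point = std::array<double, 2>;

// RBF kernel: k(a, b) = exp(-gamma * ||a - b||^2).
double rbf_kernel(const Point& a, const Point& b, double gamma) {
    const double dx = a[0] - b[0];
    const double dy = a[1] - b[1];
    return std::exp(-gamma * (dx * dx + dy * dy));
}

// Squared feature-space distance from x to the mean of the members.
double distance_to_centroid_sq(const Point& x, const std::vector<Point>& members,
                               double gamma) {
    const double n = static_cast<double>(members.size());
    double cross = 0.0;
    double self = 0.0;
    for (const Point& mi : members) {
        cross += rbf_kernel(x, mi, gamma);
        for (const Point& mj : members) self += rbf_kernel(mi, mj, gamma);
    }
    return rbf_kernel(x, x, gamma) - 2.0 * cross / n + self / (n * n);
}

// Returns one cluster index per sample.
std::vector<std::size_t> kernel_kmeans(const std::vector<Point>& samples,
                                       const std::vector<Point>& initial_centers,
                                       double gamma, int max_iterations = 100) {
    assert(!initial_centers.empty());
    const std::size_t k = initial_centers.size();
    // Each cluster starts with its initial center as the only member.
    std::vector<std::vector<Point>> clusters(k);
    for (std::size_t c = 0; c < k; ++c) clusters[c].push_back(initial_centers[c]);

    std::vector<std::size_t> labels(samples.size(), k);  // k marks "unassigned"
    for (int iter = 0; iter < max_iterations; ++iter) {
        bool changed = false;
        // Assignment step: nearest non-empty cluster in feature space.
        for (std::size_t i = 0; i < samples.size(); ++i) {
            std::size_t best = 0;
            double best_dist = std::numeric_limits<double>::infinity();
            for (std::size_t c = 0; c < k; ++c) {
                if (clusters[c].empty()) continue;
                const double d = distance_to_centroid_sq(samples[i], clusters[c], gamma);
                if (d < best_dist) { best_dist = d; best = c; }
            }
            if (labels[i] != best) { labels[i] = best; changed = true; }
        }
        if (!changed) break;  // assignments are stable, so stop
        // Update step: rebuild each cluster from the current labels.
        for (auto& members : clusters) members.clear();
        for (std::size_t i = 0; i < samples.size(); ++i) {
            clusters[labels[i]].push_back(samples[i]);
        }
    }
    return labels;
}
```

The returned labels play the same role as the cluster_idx values in the snippet above: one cluster index per sample, suitable for coloring points in a plot.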

We used the kmeans object as a functor to perform clustering on a single data item. The clustering result will be the cluster's index for the item. Then, we used cluster indices to visualize the final clustering result, as illustrated in the following screenshot:

In the preceding screenshot, we can see how the k-means clustering algorithm implemented in the Dlib library works on different artificial datasets.
