K-means clustering with Shogun

The Shogun library implements the k-means algorithm in the CKMeans class. Its constructor takes two parameters: the number of clusters and a distance object used to measure the distance between samples. In the following example, we use a distance object of the CEuclideanDistance type. After constructing the CKMeans object, we call the CKMeans::train() method to fit the model to our training set, as follows:

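// Training samples stored as a dense feature matrix.
Some<CDenseFeatures<DataType>> features;
int num_clusters = 2;
...
// Euclidean distance measured between samples of the same feature set.
CEuclideanDistance* distance = new CEuclideanDistance(features, features);
// k-means model with the requested number of clusters.
CKMeans* clustering = new CKMeans(num_clusters, distance);
clustering->train(features);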
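
To make clearer what training a k-means model involves, here is a minimal, library-independent sketch of the standard Lloyd iteration: assign each point to its nearest centroid, then move each centroid to the mean of its points. It is not Shogun's implementation. The Point alias and the SquaredDistance(), NearestCentroid(), and KMeansFit() functions are hypothetical names introduced only for this illustration, and the sketch assumes 2D points and initial centroids supplied by the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical 2D point type used only in this illustration.
using Point = std::pair<double, double>;

// Squared Euclidean distance between two points.
double SquaredDistance(const Point& a, const Point& b) {
  const double dx = a.first - b.first;
  const double dy = a.second - b.second;
  return dx * dx + dy * dy;
}

// Returns the index of the centroid closest to p.
std::size_t NearestCentroid(const Point& p,
                            const std::vector<Point>& centroids) {
  assert(!centroids.empty());
  std::size_t best = 0;
  double best_dist = std::numeric_limits<double>::max();
  for (std::size_t c = 0; c < centroids.size(); ++c) {
    const double d = SquaredDistance(p, centroids[c]);
    if (d < best_dist) {
      best_dist = d;
      best = c;
    }
  }
  return best;
}

// Runs Lloyd's iterations from the given initial centroids and returns
// the final centroids. Stops early once no centroid moves.
std::vector<Point> KMeansFit(const std::vector<Point>& points,
                             std::vector<Point> centroids,
                             int max_iterations = 100) {
  const std::size_t k = centroids.size();
  for (int iter = 0; iter < max_iterations; ++iter) {
    std::vector<Point> sums(k, Point{0.0, 0.0});
    std::vector<std::size_t> counts(k, 0);
    // Assignment step: accumulate each point into its nearest cluster.
    for (const Point& p : points) {
      const std::size_t c = NearestCentroid(p, centroids);
      sums[c].first += p.first;
      sums[c].second += p.second;
      ++counts[c];
    }
    // Update step: move each centroid to the mean of its points.
    bool changed = false;
    for (std::size_t c = 0; c < k; ++c) {
      if (counts[c] == 0) continue;  // an empty cluster keeps its centroid
      const Point updated{sums[c].first / counts[c],
                          sums[c].second / counts[c]};
      if (updated != centroids[c]) {
        centroids[c] = updated;
        changed = true;
      }
    }
    if (!changed) break;
  }
  return centroids;
}
```

For example, with the points (0, 0), (0, 1), (10, 10), and (10, 11) and the initial centroids (0, 0) and (10, 10), this sketch converges to the centroids (0, 0.5) and (10, 10.5). Calling NearestCentroid() with the fitted centroids then plays the role of the cluster assignment step.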

Once the k-means object is trained, we can use the CKMeans::apply() method to assign each sample of an input dataset to a cluster. When called without arguments, this method uses the training dataset. The result is a container object with labels, which we can cast to the CMulticlassLabels type for more convenient access. The following code sample assigns the input data to clusters and plots the clustering results:

Clusters clusters;
auto feature_matrix = features->get_feature_matrix();
// Without arguments, apply() assigns clusters to the training dataset.
CMulticlassLabels* result = clustering->apply()->as<CMulticlassLabels>();
for (index_t i = 0; i < result->get_num_labels(); ++i) {
  // Cluster index assigned to sample i.
  auto label_idx = result->get_label(i);
  // Each column of the feature matrix holds one sample.
  auto vector = feature_matrix.get_column(i);
  clusters[label_idx].first.push_back(vector[0]);
  clusters[label_idx].second.push_back(vector[1]);
}
PlotClusters(clusters, "K-Means", name + "-kmeans.png");

We used the CMulticlassLabels::get_label() method, which takes a sample's index as its argument, to get the index of the cluster assigned to that sample in our dataset.
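
The grouping loop relies on a Clusters container whose definition is not shown in this excerpt. The following standalone sketch shows one plausible shape for it, assuming a map from a cluster index to a pair of x and y coordinate lists. The Coords and Clusters aliases and the GroupByLabel() function are hypothetical and written with standard containers only, so they may differ from the definitions used alongside PlotClusters().

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical definitions: cluster index -> (x coordinates, y coordinates).
using Coords = std::vector<double>;
using Clusters = std::unordered_map<int, std::pair<Coords, Coords>>;

// Groups 2D points by their cluster label; labels[i] belongs to points[i].
Clusters GroupByLabel(const std::vector<std::pair<double, double>>& points,
                      const std::vector<int>& labels) {
  assert(points.size() == labels.size());
  Clusters clusters;
  for (std::size_t i = 0; i < points.size(); ++i) {
    // operator[] creates an empty entry the first time a label is seen.
    auto& cluster = clusters[labels[i]];
    cluster.first.push_back(points[i].first);
    cluster.second.push_back(points[i].second);
  }
  return clusters;
}
```

Keeping the x and y coordinates in separate lists per cluster is convenient for plotting, because each cluster can then be drawn as its own series.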

We used the resulting cluster indices to visualize the clustering result with the PlotClusters() function, as illustrated in the following screenshot:

In the preceding screenshot, we can see how the k-means algorithm works on different artificial datasets.
