Chinese Whispers – graph clustering algorithm with Dlib

The Chinese Whispers algorithm is an algorithm to partition the nodes of weighted, undirected graphs. It was described in the paper Chinese Whispers—an Efficient Graph Clustering Algorithm and its Application to Natural Language Processing Problems by Chris Biemann. This algorithm also does not use any unique graph theory methods but it uses the idea of using local contexts for clustering, so it can be classified as a density-based method.

In the Dlib library, this algorithm is implemented in the chinese_whispers() function, which takes the vector of weighted graph edges and outputs the container with cluster indices for each of the vertices. For the performance consideration, we limit the number of edges between dataset objects with a threshold on distance. Moreover, as with the Newman algorithm, this one also determines the number of resulting clusters by itself. The code can be seen in the following snippet:

 std::vector<sample_pair> edges;
for (long i = 0; i < inputs.nr(); ++i) {
for (long j = 0; j < inputs.nr(); ++j) {
auto dist = length(subm(inputs, i, 0, 1, 2) - subm(inputs, j,
0, 1, 2));
if (dist < 1)
edges.push_back(sample_pair(i, j, dist));
}
}
std::vector<unsigned long> clusters;
const auto num_clusters = chinese_whispers(edges, clusters);

The chinese_whispers() function call filled the clusters object with cluster index values, which we can use to visualize the clustering result. Notice that we used 1 as the threshold for edge weights, and another threshold value can lead to another clustering result. Also, edge weight values should be initialized according to a certain task. The edge length was chosen only for demonstration purposes.

The result can be seen in the following screenshot:

In the preceding screenshot, we can see how the Chinese Whispers clustering algorithm implemented in the Dlib library works on different artificial datasets.

In the current and previous sections, we saw a lot of examples of images that show clustering results. The following section will show details of using the plotcpp library, which we used to plot these images.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.218.31.165