An example of clustering using GMM with Spark MLlib

In the previous sections, we saw how to cluster similar houses together to determine neighborhoods. A GMM can be used to cluster the houses toward the same goal; the only difference is the model training step, which takes different training parameters, as follows:

import org.apache.spark.mllib.clustering.GaussianMixture

val K = 5
val maxIteration = 20
val model = new GaussianMixture()
  .setK(K) // number of desired clusters
  .setMaxIterations(maxIteration) // maximum number of EM iterations
  .setConvergenceTol(0.05) // convergence tolerance
  .setSeed(12345) // fix the seed for reproducible results
  .run(landRDD) // fit the model on the training set
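The training call above assumes landRDD, the training data prepared in the previous example. If you are working through this section on its own, the following is a minimal sketch of that preparation; the file path and the assumption that every column is numeric are placeholders, so adapt them to the data-preparation steps used earlier:

import org.apache.spark.mllib.linalg.Vectors

// Placeholder path and layout: a headerless CSV of numeric housing features
val landRDD = sc.textFile("data/housing_data.csv")
  .map(line => Vectors.dense(line.split(",").map(_.toDouble)))
  .cache() // EM training is iterative, so caching avoids re-reading the data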

Now, to evaluate the model's performance, note that GMM does not provide a cost function such as WCSS. It does, however, expose the fitted parameters of each Gaussian component: mu (the mean vector), sigma (the covariance matrix), and weight (the mixing weight). These are the maximum-likelihood estimates for the different clusters (five clusters in our case), and they can be printed as follows:

// Print the parameters of the maximum-likelihood model
for (i <- 0 until model.k) {
  println("Cluster " + i)
  println("Weight=%f MU=%s Sigma=%s".format(
    model.weights(i), model.gaussians(i).mu, model.gaussians(i).sigma))
}

You should observe the following output:

Figures 9 to 13: The weight, mu, and sigma values printed for clusters 1 to 5

The weights of clusters 1 to 4 suggest that these clusters are relatively homogeneous, while cluster 5 differs significantly from them.
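Since GMM produces a probabilistic (soft) clustering, another way to inspect the fitted model beyond the printed parameters is to look at the per-point membership probabilities and the average log-likelihood of the data under the mixture. The following is a minimal sketch using the same model and landRDD as above; predictSoft and the per-component pdf are part of the MLlib RDD-based API:

// Soft assignments: for each point, the membership probability of each
// of the K components (each row sums to 1.0)
val memberships = model.predictSoft(landRDD)
memberships.take(3).foreach(p => println(p.mkString(", ")))

// Average log-likelihood of the data under the fitted mixture; higher
// (less negative) values indicate a better fit
val avgLogLikelihood = landRDD.map { point =>
  val density = model.weights.zip(model.gaussians).map {
    case (w, g) => w * g.pdf(point)
  }.sum
  math.log(density)
}.mean()
println("Average log-likelihood = " + avgLogLikelihood)

The average log-likelihood can be used, for example, to compare models trained with different values of K, much as WCSS is used for K-means.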
