Clustering algorithms

Building a cluster model is quite similar to building a classification model: load the data and build a model. Clustering algorithms are implemented in the weka.clusterers package. The following example uses the EM (expectation-maximization) clusterer:

import java.io.BufferedReader;
import java.io.FileReader;

import weka.clusterers.EM;
import weka.core.Instances;

public class Clustering {

  public static void main(String[] args) throws Exception {

    // load the data; the reader is closed automatically after loading
    Instances data;
    try (BufferedReader reader =
        new BufferedReader(new FileReader("data/bank-data.arff"))) {
      data = new Instances(reader);
    }

    // create a new instance of the EM clusterer
    EM model = new EM();
    // build the clusterer on the loaded data
    model.buildClusterer(data);
    // print a summary of the clusters that were found
    System.out.println(model);
  }
}

The model identified the following six clusters:

    EM
    ==
    
    Number of clusters selected by cross validation: 6
    
                     Cluster
    Attribute              0        1        2        3        4        5
                       (0.1)   (0.13)   (0.26)   (0.25)   (0.12)   (0.14)
    ======================================================================
    age
      0_34            10.0535  51.8472 122.2815  12.6207   3.1023   1.0948
      35_51           38.6282  24.4056  29.6252  89.4447  34.5208   3.3755
      52_max          13.4293    6.693   6.3459  50.8984   37.861  81.7724
      [total]         62.1111  82.9457 158.2526 152.9638  75.4841  86.2428
    sex
      FEMALE          27.1812  32.2338  77.9304  83.5129  40.3199  44.8218
      MALE            33.9299  49.7119  79.3222  68.4509  34.1642   40.421
      [total]         61.1111  81.9457 157.2526 151.9638  74.4841  85.2428
    region
      INNER_CITY      26.1651  46.7431   73.874  60.1973  33.3759  34.6445
      TOWN            24.6991  13.0716  48.4446  53.1731   21.617  17.9946
    ...

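The table can be read as follows: the first line indicates six clusters, while the first column shows the attributes and their ranges. The values in parentheses below the cluster numbers give each cluster's estimated share of the data. For example, the attribute age is split into three ranges: 0-34, 35-51, and 52-max. The columns to the right indicate how many instances fall into the specific range in each cluster. The counts are fractional because EM assigns instances to clusters with probabilities rather than outright. For example, clients in the 0-34 years age group are mostly in cluster 2 (about 122 instances).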
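To make the reading of a single row concrete, here is a minimal sketch in plain Java that does not use Weka at all. It copies the 0_34 row from the output above into an array and finds the cluster with the largest count, along with that cluster's share of the row. The class name ClusterTableReader and the helper methods dominantCluster and share are hypothetical names introduced only for this illustration.

```java
public class ClusterTableReader {

    // Counts for the age range 0_34, copied from the EM output above
    // (one value per cluster, clusters 0 through 5).
    public static final double[] AGE_0_34 =
        {10.0535, 51.8472, 122.2815, 12.6207, 3.1023, 1.0948};

    // Hypothetical helper: returns the index of the largest count,
    // that is, the cluster that holds most of the row.
    public static int dominantCluster(double[] counts) {
        int best = 0;
        for (int i = 1; i < counts.length; i++) {
            if (counts[i] > counts[best]) {
                best = i;
            }
        }
        return best;
    }

    // Hypothetical helper: returns the fraction of the row total
    // that falls into the given cluster.
    public static double share(double[] counts, int cluster) {
        double total = 0.0;
        for (double c : counts) {
            total += c;
        }
        return counts[cluster] / total;
    }

    public static void main(String[] args) {
        int cluster = dominantCluster(AGE_0_34);
        System.out.println("Dominant cluster for 0_34: " + cluster);
        System.out.println("Share of the row: " + share(AGE_0_34, cluster));
    }
}
```

For the 0_34 row, the dominant cluster is cluster 2, which matches the reading given in the text. The same two helpers could be applied to any other row of the table by swapping in its values.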
