Clustering algorithms

The process of building a cluster model is quite similar to the process of building a classification model, that is, loading the data and building a model. Clustering algorithms are implemented in the weka.clusterers package, as follows:

import java.io.BufferedReader; 
import java.io.FileReader; 
 
import weka.core.Instances; 
import weka.clusterers.EM; 
 
public class Clustering { 
 
  public static void main(String args[]) throws Exception{ 
     
    //load data 
    Instances data = new Instances(new BufferedReader(
        new FileReader("data/bank-data.arff")));

    // new instance of clusterer
    EM model = new EM();
    // build the clusterer
    model.buildClusterer(data);

    System.out.println(model);
  }
}

The model identified the following six clusters:

    EM
    ==
    
    Number of clusters selected by cross validation: 6
    
                     Cluster
    Attribute              0        1        2        3        4        5
                       (0.1)   (0.13)   (0.26)   (0.25)   (0.12)   (0.14)
    ======================================================================
    age
      0_34            10.0535  51.8472 122.2815  12.6207   3.1023   1.0948
      35_51           38.6282  24.4056  29.6252  89.4447  34.5208   3.3755
      52_max          13.4293    6.693   6.3459  50.8984   37.861  81.7724
      [total]         62.1111  82.9457 158.2526 152.9638  75.4841  86.2428
    sex
      FEMALE          27.1812  32.2338  77.9304  83.5129  40.3199  44.8218
      MALE            33.9299  49.7119  79.3222  68.4509  34.1642   40.421
      [total]         61.1111  81.9457 157.2526 151.9638  74.4841  85.2428
    region
      INNER_CITY      26.1651  46.7431   73.874  60.1973  33.3759  34.6445
      TOWN            24.6991  13.0716  48.4446  53.1731   21.617  17.9946
    ...
  

The table can be read as follows: the first line indicates six clusters, while the first column shows the attributes and their ranges. For example, the attribute age is split into three ranges: 0-34, 35-51, and 52-max. The columns to the right indicate how many instances fall into each range in each cluster; for example, clients in the 0-34 age group are mostly in cluster 2 (about 122 instances). The counts are fractional because EM assigns each instance to clusters with a probability rather than outright.
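
Reading a row of this table amounts to finding the largest value in it. As a minimal sketch in plain Java (the class ClusterTableReader and its methods are hypothetical helpers, not part of Weka), the following takes the 0_34 row and the age [total] row from the output above, picks the cluster with the highest count, and divides by that cluster's total to estimate what share of the cluster falls in the age range:

```java
import java.util.Locale;

// Hypothetical helper, not part of Weka: interprets one row of the EM output table.
class ClusterTableReader {

  // Returns the index of the largest value, that is, the cluster with the highest count.
  static int dominantCluster(double[] row) {
    int best = 0;
    for (int i = 1; i < row.length; i++) {
      if (row[i] > row[best]) {
        best = i;
      }
    }
    return best;
  }

  // Divides each count by its cluster's [total] to get the share of that cluster in the range.
  static double[] shareWithinCluster(double[] row, double[] totals) {
    double[] share = new double[row.length];
    for (int i = 0; i < row.length; i++) {
      share[i] = row[i] / totals[i];
    }
    return share;
  }

  public static void main(String[] args) {
    // Values copied by hand from the age rows of the EM output
    double[] age0to34 = {10.0535, 51.8472, 122.2815, 12.6207, 3.1023, 1.0948};
    double[] ageTotals = {62.1111, 82.9457, 158.2526, 152.9638, 75.4841, 86.2428};

    int cluster = dominantCluster(age0to34);
    double[] share = shareWithinCluster(age0to34, ageTotals);

    System.out.printf(Locale.ROOT,
        "0_34 is most common in cluster %d (%.0f%% of that cluster)%n",
        cluster, share[cluster] * 100);
  }
}
```

Because the counts in the output are fractional, the computed share is best treated as a rough guide rather than an exact head count.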
