Building a clustering model follows much the same process as building a classification model: load the data, then build the model. Clustering algorithms are implemented in the weka.clusterers package. The following example uses the EM (expectation maximization) clusterer:
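```java
import java.io.BufferedReader;
import java.io.FileReader;

import weka.clusterers.EM;
import weka.core.Instances;

public class Clustering {
    public static void main(String[] args) throws Exception {
        // Load the data set; try-with-resources closes the reader afterwards
        Instances data;
        try (BufferedReader reader = new BufferedReader(
                new FileReader("data/bank-data.arff"))) {
            data = new Instances(reader);
        }

        // Create a new EM clusterer; with the default settings, the number
        // of clusters is selected by cross-validation
        EM model = new EM();

        // Build the clusterer and print the resulting model
        model.buildClusterer(data);
        System.out.println(model);
    }
}
```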
The model identified the following six clusters:
```
EM
==

Number of clusters selected by cross validation: 6

                 Cluster
Attribute               0        1        2        3        4        5
                    (0.1)   (0.13)   (0.26)   (0.25)   (0.12)   (0.14)
======================================================================
age
  0_34            10.0535  51.8472 122.2815  12.6207   3.1023   1.0948
  35_51           38.6282  24.4056  29.6252  89.4447  34.5208   3.3755
  52_max          13.4293    6.693   6.3459  50.8984   37.861  81.7724
  [total]         62.1111  82.9457 158.2526 152.9638  75.4841  86.2428
sex
  FEMALE          27.1812  32.2338  77.9304  83.5129  40.3199  44.8218
  MALE            33.9299  49.7119  79.3222  68.4509  34.1642   40.421
  [total]         61.1111  81.9457 157.2526 151.9638  74.4841  85.2428
region
  INNER_CITY      26.1651  46.7431   73.874  60.1973  33.3759  34.6445
  TOWN            24.6991  13.0716  48.4446  53.1731   21.617  17.9946
...
```
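The output can be read as follows. The header states that six clusters were selected by cross-validation. The first column lists the attributes and their values; for example, the attribute age is split into three ranges: 0-34, 35-51, and 52-max. The numbers in parentheses under each cluster number are the cluster prior probabilities, that is, the estimated proportion of the data belonging to each cluster (cluster 2, for instance, covers about 26%). The remaining columns, one per cluster, indicate how many instances fall into a specific range in each cluster, and the [total] rows sum these counts per cluster. The counts are fractional because EM assigns each instance to the clusters probabilistically rather than to exactly one cluster. For example, clients in the 0-34 age group are mostly in cluster 2 (about 122 instances).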
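To make this reading of the table concrete, here is a minimal sketch that hard-codes the age rows copied from the output above and, for each age range, finds the cluster with the largest count together with that cluster's share of the row. It is plain Java with no Weka dependency; the class and method names (AgeClusterTable, argmaxCluster, rowShare) are hypothetical helpers for illustration, not part of the Weka API.

```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical helper (not part of Weka): interprets the "age" rows of the
// EM cluster table by finding, for each age range, the dominant cluster.
public class AgeClusterTable {

    // Age ranges as printed in the model output
    static final String[] RANGES = {"0_34", "35_51", "52_max"};

    // Weighted instance counts per cluster (columns 0..5), copied from the output
    static final double[][] COUNTS = {
        {10.0535, 51.8472, 122.2815, 12.6207, 3.1023, 1.0948},  // 0_34
        {38.6282, 24.4056, 29.6252, 89.4447, 34.5208, 3.3755},  // 35_51
        {13.4293, 6.693, 6.3459, 50.8984, 37.861, 81.7724}      // 52_max
    };

    // Returns the index of the cluster with the largest count in a row
    static int argmaxCluster(double[] row) {
        int best = 0;
        for (int k = 1; k < row.length; k++) {
            if (row[k] > row[best]) {
                best = k;
            }
        }
        return best;
    }

    // Returns the fraction of a row's total weight that falls into cluster k
    static double rowShare(double[] row, int k) {
        double total = Arrays.stream(row).sum();
        return row[k] / total;
    }

    public static void main(String[] args) {
        for (int i = 0; i < RANGES.length; i++) {
            int k = argmaxCluster(COUNTS[i]);
            System.out.printf(Locale.ROOT, "age %-7s -> cluster %d (%.1f%% of the row)%n",
                    RANGES[i], k, 100 * rowShare(COUNTS[i], k));
        }
    }
}
```

With these values, the sketch reports cluster 2 for 0_34, cluster 3 for 35_51, and cluster 5 for 52_max. The same approach applies to any other attribute row; note that the hard-coded numbers only match this particular model, since cluster numbering and counts can change if the clusterer settings or data change.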