Training a classifier (Simple)

Training a classifier is one of the most exciting tasks in data mining. This recipe will show you how to train a classifier and set its options.

Getting ready

Load a dataset as described in the Loading the data (Simple) recipe.
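
If you need a quick reminder, the following is a minimal sketch of the loading step, assuming an ARFF file whose class attribute is the last one (the file name is a placeholder):

import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

Instances data = DataSource.read("dataset.arff");  // placeholder path
data.setClassIndex(data.numAttributes() - 1);      // class is the last attribute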

How to do it...

To train a classifier, use the following snippet:

import weka.classifiers.trees.J48;

String[] options = new String[1];
options[0] = "-U";
J48 tree = new J48();
tree.setOptions(options);
tree.buildClassifier(data);

The classifier is now trained and ready for classification.

How it works...

The classifier implements the OptionHandler interface, which allows you to set the options via a String array. First, import a classifier from one of the weka.classifiers subpackages, for example, the J48 decision tree:

import weka.classifiers.trees.J48;

Then, prepare the options in a String array, for example, set the tree to be unpruned:

String[] options = new String[1];
options[0] = "-U";            // un-pruned tree

Now, initialize the classifier:

J48 tree = new J48();         // new instance of tree

Set the options with the OptionHandler interface:

tree.setOptions(options);     // set the options

And build the classifier:

tree.buildClassifier(data);   // build classifier

Now, you are ready to validate it and use it (see recipes Test and Evaluate).
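
As a quick preview, a trained classifier can label instances with classifyInstance(), or return class probabilities with distributionForInstance(). The following sketch reuses the first training instance purely for illustration; in practice you would classify unseen data:

double label = tree.classifyInstance(data.instance(0));          // predicted class index
String cls = data.classAttribute().value((int) label);           // its textual value
double[] dist = tree.distributionForInstance(data.instance(0));  // class probabilities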

There's more...

There is a wide variety of classifiers implemented in Weka. This section first demonstrates how to build a support vector machine classifier, then lists some other popular classifiers, and finally shows how to create a classifier that accepts data incrementally.

Support vector machine

Another popular classifier is the support vector machine. To train one, follow the previous recipe, but instead of J48, import the SMO class from weka.classifiers.functions:

import weka.classifiers.functions.SMO;

Then, initialize a new object and build the classifier:

SMO svm = new SMO();
svm.buildClassifier(data);
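
Since SMO also implements the OptionHandler interface, its options can be set in the same way as for J48. A minimal sketch that sets the complexity constant via the -C option (the value 2.0 is illustrative only):

SMO svm = new SMO();
svm.setOptions(weka.core.Utils.splitOptions("-C 2.0"));  // complexity constant
svm.buildClassifier(data);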

Other classification models

In addition to decision trees (weka.classifiers.trees.J48) and support vector machines (weka.classifiers.functions.SMO), the following are some of the many other classification algorithms available in Weka:

  • weka.classifiers.rules.ZeroR: Predicts the majority class (the mode) and serves as a baseline; if your classifier performs worse than this simple predictor, it is not worth considering.
  • weka.classifiers.trees.RandomTree: Constructs a tree that considers K randomly chosen attributes at each node.
  • weka.classifiers.trees.RandomForest: Constructs a set (that is, forest) of random trees and uses majority voting to classify a new instance.
  • weka.classifiers.lazy.IBk: A k-nearest-neighbors classifier that can select an appropriate number of neighbors based on cross-validation.
  • weka.classifiers.functions.MultilayerPerceptron: A classifier based on neural networks that uses back-propagation to classify instances. The network can be built by hand, or created by an algorithm, or both.
  • weka.classifiers.bayes.NaiveBayes: A naive Bayes classifier that uses estimator classes, where numeric estimator precision values are chosen based on analysis of the training data.
  • weka.classifiers.meta.AdaBoostM1: The class for boosting a nominal class classifier using the AdaBoost M1 method. Only nominal class problems can be tackled. Boosting often dramatically improves performance, but sometimes overfits (see the sketch after this list).
  • weka.classifiers.meta.Bagging: The class for bagging a classifier to reduce variance. Can do classification and regression depending on the base learner.
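
Meta classifiers such as AdaBoostM1 and Bagging wrap a base learner, which you supply yourself. A minimal sketch that boosts a J48 decision tree (the choice of base learner is just an example):

import weka.classifiers.meta.AdaBoostM1;
import weka.classifiers.trees.J48;

AdaBoostM1 boost = new AdaBoostM1();
boost.setClassifier(new J48());   // base learner to be boosted
boost.buildClassifier(data);      // train the boosted ensemble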

Incremental classifiers

When a dataset is very large, or you have to deal with real-time stream data, it may not fit into memory all at once. Some classifiers implement the weka.classifiers.UpdateableClassifier interface, which means they can be trained incrementally. These include AODE, IB1, IBk, KStar, LWL, NaiveBayesUpdateable, NNge, RacedIncrementalLogitBoost, and Winnow.

The process of training an incremental classifier with the UpdateableClassifier interface is fairly simple.

Open a dataset with the ArffLoader class:

import java.io.File;
import weka.core.converters.ArffLoader;

ArffLoader loader = new ArffLoader();
loader.setFile(new File("/some/where/data.arff"));

Load the structure of the dataset (does not contain any actual data rows):

Instances structure = loader.getStructure();
structure.setClassIndex(structure.numAttributes() - 1);

Initialize a classifier and call buildClassifier with the structure of the dataset:

NaiveBayesUpdateable nb = new NaiveBayesUpdateable();
nb.buildClassifier(structure);

Subsequently, call the updateClassifier method to feed the classifier new weka.core.Instance objects, one by one:

Instance current;
while ((current = loader.getNextInstance(structure)) != null)
   nb.updateClassifier(current);

After each update, the classifier takes into account the newly added instance to update its model.
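
Putting these steps together, the following is a minimal, self-contained sketch of incremental training (the file path is a placeholder, as above):

import java.io.File;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.converters.ArffLoader;
import weka.classifiers.bayes.NaiveBayesUpdateable;

public class IncrementalTraining {
    public static void main(String[] args) throws Exception {
        ArffLoader loader = new ArffLoader();
        loader.setFile(new File("/some/where/data.arff"));

        Instances structure = loader.getStructure();        // header only, no data rows
        structure.setClassIndex(structure.numAttributes() - 1);

        NaiveBayesUpdateable nb = new NaiveBayesUpdateable();
        nb.buildClassifier(structure);                      // initialize the model

        Instance current;
        while ((current = loader.getNextInstance(structure)) != null) {
            nb.updateClassifier(current);                   // one instance at a time
        }
    }
}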
