Building your own classifier (Advanced)

This recipe will show you how to build your own classifier. It covers the most essential steps required to design a functional classifier. We will create a classifier that takes instances with numeric or nominal attributes and a nominal class attribute. The prediction rule is simple: the class value of a new instance is predicted as the class value of its second nearest neighbor (if several neighbors are at the same distance, the first one found is taken). To make this work, the classifier needs at least two training examples.

How to do it...

To build a classifier, we have to extend the weka.classifiers.Classifier class, as shown in the following snippet:

import java.util.Enumeration;

import weka.classifiers.Classifier;
import weka.core.Capabilities;
import weka.core.Capabilities.Capability;
import weka.core.Instance;
import weka.core.Instances;

public class MyClassifier extends Classifier {

  private Instances trainData;

  public Capabilities getCapabilities() {

    Capabilities result = super.getCapabilities();
    result.disableAll();

    result.enable(Capability.NUMERIC_ATTRIBUTES);
    result.enable(Capability.NOMINAL_ATTRIBUTES);
    result.enable(Capability.NOMINAL_CLASS);
    result.setMinimumNumberInstances(2);

    return result;
  }

  @Override
  public void buildClassifier(Instances data) throws Exception {

    getCapabilities().testWithFail(data);
    data = new Instances(data);
    data.deleteWithMissingClass();
    trainData = new Instances(data, 0, data.numInstances());

  }

  public double classifyInstance(Instance instance) {

    double minDistance = Double.MAX_VALUE;
    double secondMinDistance = Double.MAX_VALUE;
    double distance;
    double classVal = 0, minClassVal = 0;

    Enumeration enu = trainData.enumerateInstances();
    while (enu.hasMoreElements()) {

      Instance trainInstance = (Instance) enu.nextElement();
      if (!trainInstance.classIsMissing()) {

        distance = distance(instance, trainInstance);

        if (distance < minDistance) {

          secondMinDistance = minDistance;
          minDistance = distance;

          classVal = minClassVal;
          minClassVal = trainInstance.classValue();

        } else if (distance < secondMinDistance) {
          secondMinDistance = distance;
          classVal = trainInstance.classValue();
        }
      }
    }
    return classVal;
  }
}

The MyClassifier class can now be used as described in the Training a classifier (Simple) recipe.

How it works...

Each classifier extends the weka.classifiers.Classifier abstract class, so we first import it, along with java.util.Enumeration, which we will use later to iterate over the training instances:

import java.util.Enumeration;

import weka.classifiers.Classifier;

Next, to define what kind of magic powers our classifier can possess, we need the weka.core.Capabilities class and some constants from its nested Capability enum:

import weka.core.Capabilities;
import weka.core.Capabilities.Capability;

Import the Instance and Instances classes:

import weka.core.Instance;
import weka.core.Instances;

Create a new class with your classifier name that extends the weka.classifiers.Classifier abstract class. If you also want your classifier to be incremental (as shown in the previous example), make sure you implement the weka.classifiers.UpdateableClassifier interface and the updateClassifier(Instance instance) method.

public class MyClassifier extends Classifier {

Initialize a private field to store the classifier's training dataset:

  private Instances trainData;

Now, specify what kind of data your classifier is able to handle in the getCapabilities() method:

  public Capabilities getCapabilities() {

First, inherit all possible capabilities a classifier can handle and, by default, disable all of them to avoid surprises later.

    Capabilities result = super.getCapabilities();
    result.disableAll();

We want our classifier to handle numeric and nominal attributes only. Enable them by passing the following constants from the weka.core.Capabilities.Capability enum to the enable(Capability) method:

    result.enable(Capability.NUMERIC_ATTRIBUTES);
    result.enable(Capability.NOMINAL_ATTRIBUTES);

Next, enable your classifier to handle nominal class values:

    result.enable(Capability.NOMINAL_CLASS);

Specify that it needs at least two training instances with the setMinimumNumberInstances(int) method:

    result.setMinimumNumberInstances(2);

Finally, return the capabilities object:

    return result;
  }

The next underpinning component of your classifier is the buildClassifier(Instances) method. This is usually where most of the hard work is done to create a model, for example, a decision tree or SVM support vectors. Our classifier, however, belongs to the family of lazy classifiers, which defer all the hard work to the classifyInstance(Instance) method, as we will see later.

  @Override
  public void buildClassifier(Instances data) throws Exception {

First, check whether the passed training dataset complies with the capabilities defined previously:

    getCapabilities().testWithFail(data);

Next, copy the complete dataset and remove the instances with missing class values using the deleteWithMissingClass() method, since they are not useful for classification:

    // remove instances with missing class
    data = new Instances(data);
    data.deleteWithMissingClass();

Now, we are ready to do the hard work of building an actual model. Well, our lazy classifier simply remembers the training data, and that is it. If your classifier is not as lazy as this one, then this is a good place to implement the model building.

    trainData = new Instances(data, 0, data.numInstances());
  }

OK, we have built the model; the last missing part is the double classifyInstance(Instance instance) method, which predicts the class value of a given instance. Note that it actually returns the index (as a double) of the predicted nominal value in the class attribute. The following method simply finds the training instance with the second smallest distance and returns its class value:

  public double classifyInstance(Instance instance) {

    double minDistance = Double.MAX_VALUE;
    double secondMinDistance = Double.MAX_VALUE;
    double distance;
    double classVal = 0, minClassVal = 0;

    Enumeration enu = trainData.enumerateInstances();
    while (enu.hasMoreElements()) {

      Instance trainInstance = (Instance) enu.nextElement();
      if (!trainInstance.classIsMissing()) {

        distance = distance(instance, trainInstance);

        if (distance < minDistance) {

          secondMinDistance = minDistance;
          minDistance = distance;

          classVal = minClassVal;
          minClassVal = trainInstance.classValue();

        } else if (distance < secondMinDistance) {
          secondMinDistance = distance;
          classVal = trainInstance.classValue();
        }
      }
    }
    return classVal;

  }
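To see the two running minima in action outside of Weka, here is a minimal, self-contained sketch of the same update rule over plain arrays. The distances and class indices are made-up illustration data, not part of the recipe; only the update logic mirrors classifyInstance():

```java
public class SecondNearestDemo {

    // Returns the class index of the second nearest neighbor,
    // using the same two-minima update rule as classifyInstance().
    static double secondNearestClass(double[] distances, double[] classes) {
        double minDistance = Double.MAX_VALUE;
        double secondMinDistance = Double.MAX_VALUE;
        double classVal = 0, minClassVal = 0;

        for (int i = 0; i < distances.length; i++) {
            double distance = distances[i];
            if (distance < minDistance) {
                // the old nearest neighbor becomes the second nearest
                secondMinDistance = minDistance;
                minDistance = distance;
                classVal = minClassVal;
                minClassVal = classes[i];
            } else if (distance < secondMinDistance) {
                secondMinDistance = distance;
                classVal = classes[i];
            }
        }
        return classVal;
    }

    public static void main(String[] args) {
        // nearest neighbor (distance 0.5) has class 1.0,
        // second nearest (distance 1.2) has class 0.0
        double[] distances = {3.0, 0.5, 1.2, 4.7};
        double[] classes   = {2.0, 1.0, 0.0, 2.0};
        System.out.println(secondNearestClass(distances, classes)); // prints 0.0
    }
}
```

Tracing the loop by hand confirms why the prediction is the second nearest: whenever a new minimum is found, the previous nearest class is demoted into classVal before being overwritten.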

The distance(Instance, Instance) method demonstrates a basic approach to measuring the difference between two instances. It does not perform any attribute normalization. The main idea is to sum the differences over corresponding attributes, excluding the class attribute. Since our classifier also supports nominal attribute values, we simply use the Hamming distance for them. The source code can be found in the code bundle.
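While the Weka-based implementation lives in the code bundle, the underlying idea can be sketched in a stand-alone way. The following illustration works on parallel arrays rather than Instance objects, uses the absolute difference for numeric attributes (an assumption on our part; the bundle's version may differ), and the Hamming distance (0 if equal, 1 otherwise) for nominal ones. The class attribute is assumed to be excluded from the arrays already:

```java
public class DistanceSketch {

    // Illustrative distance between two instances given as parallel arrays.
    // isNominal[i] marks attribute i as nominal; nominal values are compared
    // with the Hamming distance (0 if equal, 1 otherwise), numeric values
    // with their absolute difference.
    static double distance(double[] a, double[] b, boolean[] isNominal) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            if (isNominal[i]) {
                sum += (a[i] == b[i]) ? 0 : 1;   // Hamming distance
            } else {
                sum += Math.abs(a[i] - b[i]);    // numeric difference
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] a = {1.0, 4.0, 2.0};  // attribute at index 2 is nominal
        double[] b = {2.5, 4.0, 1.0};
        boolean[] nominal = {false, false, true};
        System.out.println(distance(a, b, nominal)); // prints 2.5
    }
}
```

Because the numeric differences are not normalized, attributes with large ranges dominate the sum, which is exactly the limitation the text alludes to.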
