Filtering attributes (Simple)

A dataset often contains attributes or instances that are not helpful for analysis. One way to get rid of them is to pre-process the dataset before importing it into Weka. The other way is to remove them with a filter after the dataset is loaded in Weka. Weka distinguishes supervised filters, which take the class attribute into account, from unsupervised filters, which disregard it. In addition, filters operate either on attributes or on instances that meet the filter conditions; these are attribute-based and instance-based filters, respectively. Most filters implement the OptionHandler interface, allowing you to set the filter options via a String array.

This task demonstrates how to create a filter and apply it to a dataset. Additional sections cover further cases such as discretization and classifier-specific filtering.

How to do it...

Before starting, load a dataset, as shown in the previous recipe. Then, to remove, for example, the second attribute from the dataset, use the following code snippet:

import weka.core.Instances;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Remove;
 ...
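// -R selects the attribute range (indices start at 1), here the second attribute
String[] options = new String[]{"-R", "2"};
Remove remove = new Remove();
remove.setOptions(options);
// Set the input format after the options have been set
remove.setInputFormat(dataset);
// Apply the filter to obtain a new dataset
Instances newData = Filter.useFilter(dataset, remove);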

The new dataset no longer contains the second attribute of the original dataset.

How it works...

First, we import the Instances class, which holds our dataset:

import weka.core.Instances;

Next, we import the Filter class, which is used to run the selected filter:

import weka.filters.Filter;

To remove a subset of attributes from the dataset, we need the following unsupervised attribute filter:

import weka.filters.unsupervised.attribute.Remove;

Now, let's construct the filter options as a String array, which is the form accepted by the OptionHandler interface:

String[] options = new String[]{...};

The filter documentation describes the range option as follows: specify the range of attributes to act on. This is a comma-separated list of attribute indices, with first and last as valid values. Specify an inclusive range with -. For example, first-3,5,6-10,last.
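To make the range syntax concrete, the following minimal sketch expands a range string into zero-based attribute positions. It is written in plain Java for illustration only and is not Weka's own parser; the class name AttributeRangeSketch and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: expands a Weka-style range string such as
// "first-3,5,6-10,last" into zero-based positions. Not Weka's implementation.
public class AttributeRangeSketch {

    // Converts one token ("first", "last", or a 1-based number) to a 0-based position.
    public static int toIndex(String token, int numAttributes) {
        String t = token.trim();
        if (t.equals("first")) {
            return 0;
        }
        if (t.equals("last")) {
            return numAttributes - 1;
        }
        int oneBased = Integer.parseInt(t);
        if (oneBased < 1 || oneBased > numAttributes) {
            throw new IllegalArgumentException("Index out of range: " + t);
        }
        return oneBased - 1;
    }

    // Expands a comma-separated list of single indices and inclusive ranges.
    public static List<Integer> parse(String range, int numAttributes) {
        List<Integer> indices = new ArrayList<>();
        for (String part : range.split(",")) {
            int dash = part.indexOf('-');
            if (dash < 0) {
                indices.add(toIndex(part, numAttributes));
            } else {
                int from = toIndex(part.substring(0, dash), numAttributes);
                int to = toIndex(part.substring(dash + 1), numAttributes);
                for (int i = from; i <= to; i++) {
                    indices.add(i);
                }
            }
        }
        return indices;
    }

    public static void main(String[] args) {
        // Assumes a hypothetical dataset with 12 attributes.
        System.out.println(parse("first-3,5,6-10,last", 12));
        // prints [0, 1, 2, 4, 5, 6, 7, 8, 9, 11]
    }
}
```

Note how the one-based index 5 in the range string maps to position 4 in the output.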

Suppose we want to remove the second attribute. We pass the range option -R with the value 2. Note that attribute indices in a range start at 1, whereas zero-based indexing (starting at 0) is used when a new attribute is created, as shown in the previous recipe.

{"-R", "2"}

Initialize a new filter instance as follows:

Remove remove = new Remove();

Pass the options to the newly created filter as follows:

remove.setOptions(options);

Then pass the original dataset to the filter so that it can determine the input format. This must be done after setting the options:

remove.setInputFormat(dataset); 

Finally, apply the filter, which returns a new dataset:

Instances newData = Filter.useFilter(dataset, remove);

The new dataset can now be used in other tasks.

There's more...

In addition to the Remove filter, we will take a closer look at another important filter: attribute discretization, which transforms a real-valued attribute into a nominal-valued attribute. We will also demonstrate how to prepare a classifier-specific filter that applies filtering on the fly.

Attribute discretization

We will first see how an attribute filter discretizes a range of numeric attributes in the dataset into nominal attributes.

Use the following code snippet to discretize the values of all attributes into two intervals each:

import weka.core.Instances;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Discretize;
 ...
String[] options = new String[4];

Specify the number of discrete intervals, for example 2:

options[0] = "-B";
options[1] = "2";

Specify the range of attributes to which you want to apply the filter, for example, all the attributes:

options[2] = "-R";
options[3] = "first-last";

Apply the filter:

Discretize discretize = new Discretize();  
discretize.setOptions(options);
discretize.setInputFormat(dataset); 
Instances newData = Filter.useFilter(dataset, discretize);
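
To illustrate what splitting a numeric attribute into two intervals means, here is a minimal plain-Java sketch of equal-width binning. It only illustrates the idea and is not Weka's implementation; the class name EqualWidthBinningSketch is hypothetical, and details such as how values exactly on an interval boundary are assigned may differ from what the Discretize filter does.

```java
import java.util.Arrays;

// Illustrative sketch only: equal-width binning of one numeric attribute.
// Not Weka's implementation; boundary handling is an assumption.
public class EqualWidthBinningSketch {

    // Returns, for each value, the index (0 .. bins-1) of the interval it falls into.
    public static int[] discretize(double[] values, int bins) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        double width = (max - min) / bins;
        int[] result = new int[values.length];
        for (int i = 0; i < values.length; i++) {
            if (width == 0) {
                result[i] = 0; // all values identical: a single interval suffices
                continue;
            }
            int bin = (int) ((values[i] - min) / width);
            result[i] = Math.min(bin, bins - 1); // the maximum value goes into the last interval
        }
        return result;
    }

    public static void main(String[] args) {
        double[] values = {1.0, 2.0, 3.0, 4.0, 10.0};
        // Two intervals of width 4.5: [1.0, 5.5) and [5.5, 10.0]
        System.out.println(Arrays.toString(discretize(values, 2)));
        // prints [0, 0, 0, 0, 1]
    }
}
```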

Classifier-specific filter

An easy way to filter data on the fly is to use the FilteredClassifier class. This is a meta-classifier that removes the need to filter the data separately before training the classifier and before prediction. This example demonstrates a meta-classifier that combines the Remove filter with a J48 decision tree to remove the first attribute (it could be, for example, a numeric ID attribute) from the dataset. For additional details on classifiers, see the Training a classifier (Simple) and Building your own classifier (Advanced) recipes; for evaluation, see the Testing and evaluating your models (Simple) recipe.

Import the FilteredClassifier meta-classifier, the J48 decision tree classifier, and the Remove filter:

import weka.classifiers.meta.FilteredClassifier;
import weka.classifiers.trees.J48;
import weka.filters.unsupervised.attribute.Remove;

Initialize the filter and base classifier:

Remove rm = new Remove();
rm.setAttributeIndices("1");
J48 j48 = new J48();

Create the FilteredClassifier object and specify the filter and the base classifier:

FilteredClassifier fc = new FilteredClassifier();
fc.setFilter(rm);
fc.setClassifier(j48);

Build the meta-classifier:

Instances dataset = ...
fc.buildClassifier(dataset);

To classify an instance, you can simply use the following:

Instance instance = ...
double prediction = fc.classifyInstance(instance);

The instance is automatically filtered before classification; in our case, the first attribute is removed.
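
The filter-then-classify idea behind a meta-classifier can be sketched in plain Java as a simple wrapper. This is an illustration of the pattern only, not how FilteredClassifier is implemented; the class FilteredModelSketch, its methods, and the toy threshold classifier are all hypothetical.

```java
import java.util.function.Function;
import java.util.function.ToDoubleFunction;

// Illustrative sketch only: wraps a base classifier so that every instance
// passes through a filter first. Not Weka's FilteredClassifier.
public class FilteredModelSketch {

    // Returns a copy of the row without the attribute at the given 0-based position.
    public static double[] removeAttribute(double[] row, int index) {
        double[] out = new double[row.length - 1];
        for (int i = 0, j = 0; i < row.length; i++) {
            if (i != index) {
                out[j++] = row[i];
            }
        }
        return out;
    }

    // Composes a filter and a base classifier into a single classifier.
    public static ToDoubleFunction<double[]> filtered(
            Function<double[], double[]> filter,
            ToDoubleFunction<double[]> base) {
        return row -> base.applyAsDouble(filter.apply(row));
    }

    public static void main(String[] args) {
        // Hypothetical base classifier: thresholds the first attribute it sees.
        ToDoubleFunction<double[]> base = row -> row[0] > 5 ? 1.0 : 0.0;
        // The wrapper drops the first attribute (for example, an ID) before classifying.
        ToDoubleFunction<double[]> fc = filtered(r -> removeAttribute(r, 0), base);

        double[] instance = {999, 3}; // {id, measurement}
        System.out.println(fc.applyAsDouble(instance)); // prints 0.0
    }
}
```

The caller passes unfiltered instances in both training and prediction, so there is no risk of forgetting to apply the same filter at prediction time.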
