Selecting attributes (Intermediate)

Not all attributes are relevant to a classification task; in fact, irrelevant attributes can even decrease the performance of some algorithms. This recipe will show you how to find relevant attributes by selecting an evaluator and a search method that applies the selected evaluator to the attributes. In Weka, you have three options for performing attribute selection:

  • The filter approach, using the AttributeSelection filter (located in weka.filters.supervised.attribute)
  • The low-level approach, using the attribute selection classes directly (located in weka.attributeSelection), which outputs some additional useful information
  • The meta-classifier approach, which combines a search algorithm and an evaluator with a base classifier

Getting ready

Load a dataset as described in the Loading the data (Simple) recipe.

How to do it...

To rank a set of attributes, use the following code snippet:

import weka.attributeSelection.CfsSubsetEval;
import weka.attributeSelection.GreedyStepwise;
import weka.core.Instances;
import weka.filters.Filter;
import weka.filters.supervised.attribute.AttributeSelection;
...
AttributeSelection filter = new AttributeSelection();  // package weka.filters.supervised.attribute!
CfsSubsetEval eval = new CfsSubsetEval();
GreedyStepwise search = new GreedyStepwise();
search.setSearchBackwards(true);
filter.setEvaluator(eval);
filter.setSearch(search);
filter.setInputFormat(data);
Instances newData = Filter.useFilter(data, filter);

The newData variable now contains a dataset containing only relevant attributes according to the selected evaluator.
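To see the effect end to end, you can build a tiny artificial dataset in memory and compare the attribute counts before and after filtering. The following is a minimal sketch, assuming weka.jar (version 3.7 or later, where DenseInstance is available) is on the classpath; the dataset itself is invented purely for illustration:

```java
import java.util.ArrayList;

import weka.attributeSelection.CfsSubsetEval;
import weka.attributeSelection.GreedyStepwise;
import weka.core.Attribute;
import weka.core.DenseInstance;
import weka.core.Instances;
import weka.filters.Filter;
import weka.filters.supervised.attribute.AttributeSelection;

// Two numeric attributes plus a nominal class; "useful" tracks the class, "noise" does not.
ArrayList<Attribute> attrs = new ArrayList<>();
attrs.add(new Attribute("useful"));
attrs.add(new Attribute("noise"));
ArrayList<String> labels = new ArrayList<>();
labels.add("yes");
labels.add("no");
attrs.add(new Attribute("class", labels));

Instances data = new Instances("demo", attrs, 0);
data.setClassIndex(data.numAttributes() - 1);
double[][] rows = {
    {1.0, 0.3, 0}, {1.1, 0.9, 0}, {0.9, 0.1, 0},
    {0.0, 0.8, 1}, {0.1, 0.2, 1}, {0.2, 0.7, 1}
};
for (double[] row : rows) {
    data.add(new DenseInstance(1.0, row));
}

// Apply the filter exactly as in the snippet above.
AttributeSelection filter = new AttributeSelection();
filter.setEvaluator(new CfsSubsetEval());
GreedyStepwise search = new GreedyStepwise();
search.setSearchBackwards(true);
filter.setSearch(search);
filter.setInputFormat(data);
Instances newData = Filter.useFilter(data, filter);

System.out.println("Attributes before: " + data.numAttributes());
System.out.println("Attributes after:  " + newData.numAttributes());
```

The filtered dataset keeps the class attribute and drops whatever the evaluator deems irrelevant, so the "after" count is never larger than the "before" count.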

How it works...

In this example, we will use the filter approach. First, we import the AttributeSelection class:

import weka.filters.supervised.attribute.AttributeSelection;

Next, import an evaluator, for example, correlation-based feature subset selection. This evaluator requires a GreedyStepwise search procedure to perform a greedy forward or backward search through the space of attribute subsets. It is implemented in the GreedyStepwise class:

import weka.attributeSelection.CfsSubsetEval;
import weka.attributeSelection.GreedyStepwise;

Import the Instances and Filter class:

import weka.core.Instances;
import weka.filters.Filter;

Create a new AttributeSelection object and initialize the evaluator and search algorithm objects:

AttributeSelection filter = new AttributeSelection();
CfsSubsetEval eval = new CfsSubsetEval();
GreedyStepwise search = new GreedyStepwise();

Set the algorithm to search backward:

search.setSearchBackwards(true);

Pass the evaluator and search algorithm to the filter:

filter.setEvaluator(eval);
filter.setSearch(search);

Specify the dataset:

filter.setInputFormat(data);

Apply the filter:

Instances newData = Filter.useFilter(data, filter);

In the last step, the greedy algorithm runs with the correlation-based feature subset evaluator to discover a subset of attributes with the highest predictive power. The procedure stops when the addition/deletion of any remaining attribute results in a decrease in evaluation. Attributes whose merit does not reach a default threshold may be discarded from the ranking (you can set this manually with the search.setThreshold(…) method).
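The threshold lives on the search object, not on the filter. A small sketch (the 0.01 cutoff and the cap of two attributes are arbitrary illustrative values):

```java
import weka.attributeSelection.GreedyStepwise;

GreedyStepwise search = new GreedyStepwise();
search.setSearchBackwards(true);
// Discard attributes whose merit falls below this (illustrative) threshold.
search.setThreshold(0.01);
// Alternatively, cap the number of attributes to keep.
search.setNumToSelect(2);
```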

There's more...

This section will first demonstrate an alternative, yet popular, measure for attribute selection: information gain. Next, we will take a look at how to reduce the attribute space dimensionality using principal component analysis. Finally, we will show how to select attributes on the fly.

Select attributes using information gain

As seen in the recipe, a filter is applied directly to the dataset, so there is no need to remove attributes manually afterwards. This example shows a low-level approach you can apply if the previous method is not suitable for your purposes.

In this case, we import the actual attribute selection classes (in the previous example, we imported a filter based on these classes), together with the evaluator, search method, and utility classes used below:

import weka.attributeSelection.AttributeSelection;
import weka.attributeSelection.InfoGainAttributeEval;
import weka.attributeSelection.Ranker;
import weka.core.Utils;

Create a new AttributeSelection instance:

AttributeSelection attSelect = new AttributeSelection();

Initialize a measure for ranking attributes, for example, information gain:

InfoGainAttributeEval eval = new InfoGainAttributeEval();

Specify the search method as Ranker; as the name suggests, it simply ranks the attributes by the selected measure:

Ranker search = new Ranker();

Load the measure and search method to the attribute selection object:

attSelect.setEvaluator(eval);
attSelect.setSearch(search);

Perform the attribute selection with the particular search method and measure using the specified dataset:

attSelect.SelectAttributes(data);

Print the results as follows:

System.out.println(attSelect.toResultsString());

Get and print the indices of the selected attributes:

int[] indices = attSelect.selectedAttributes();
System.out.println(Utils.arrayToString(indices));

The output, for example, for the titanic dataset is:

=== Attribute Selection on all input data ===

Search Method:
  Attribute ranking.

Attribute Evaluator (supervised, Class (nominal): 4 survived):
  Information Gain Ranking Filter

Ranked attributes:
 0.14239  3 sex
 0.05929  1 class
 0.00641  2 age

Selected attributes: 3,1,2 : 3

The information gain ranker ranked all three attributes; sex is by far the most informative with respect to survival.
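If you only need the top-ranked attributes rather than the full ranking, the Ranker itself can trim the list. A small sketch (both values are arbitrary illustrations):

```java
import weka.attributeSelection.Ranker;

Ranker search = new Ranker();
// Keep only the two highest-ranked attributes.
search.setNumToSelect(2);
// Or discard attributes whose measure falls below a cutoff.
search.setThreshold(0.005);
```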

Principal component analysis

Principal component analysis (PCA) is a dimensionality reduction technique often used to represent the dataset with fewer attributes while preserving as much information as possible. Dimensionality reduction is accomplished by choosing enough eigenvectors to account for a percentage of the variance in the original data (the default is 95 percent). This example shows how to transform the dataset to a new coordinate system defined by the principal eigenvectors.

import weka.attributeSelection.PrincipalComponents;
import weka.attributeSelection.Ranker;
import weka.core.Instances;
import weka.filters.Filter;
import weka.filters.supervised.attribute.AttributeSelection;
...
AttributeSelection filter = new AttributeSelection();

Initialize the principal components evaluator:

PrincipalComponents eval = new PrincipalComponents();

Initialize ranker as a search method:

Ranker search = new Ranker();

Set the evaluator and search method, filter the data, and print the newly created dataset in a new coordinate system:

filter.setEvaluator(eval);
filter.setSearch(search);
filter.setInputFormat(data);
Instances newData = Filter.useFilter(data, filter);
System.out.println(newData);

The new dataset is now represented by values in a new, smaller coordinate system, which means there are fewer attributes.
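The fraction of variance to retain is configurable on the evaluator; the 95 percent shown below is Weka's default, repeated here only to make the setting explicit:

```java
import weka.attributeSelection.PrincipalComponents;

PrincipalComponents eval = new PrincipalComponents();
// Retain enough eigenvectors to cover 95 percent of the variance in the data.
eval.setVarianceCovered(0.95);
```

Lowering this value yields fewer principal components (more aggressive reduction) at the cost of discarding more information.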

Classifier-specific selection

This example shows how to use a meta-classifier that runs an arbitrary classifier on data that has been reduced through attribute selection.

Import the AttributeSelectedClassifier class:

import weka.classifiers.meta.AttributeSelectedClassifier;

Create a new object:

AttributeSelectedClassifier classifier = new AttributeSelectedClassifier();

Initialize an evaluator, for example, ReliefF, which requires the Ranker search algorithm:

ReliefFAttributeEval eval = new ReliefFAttributeEval();
Ranker search = new Ranker();

Initialize a base classifier, for example, decision trees:

J48 baseClassifier = new J48();

Pass the base classifier, evaluator, and search algorithm to the meta-classifier:

classifier.setClassifier(baseClassifier);
classifier.setEvaluator(eval);
classifier.setSearch(search);

Finally, run and validate the classifier, for example, with 10-fold cross-validation (see the Classification recipe for details):

Evaluation evaluation = new Evaluation(data);
evaluation.crossValidateModel(classifier, data, 10, new Random(1));
System.out.println(evaluation.toSummaryString());

The evaluation outputs the classifier performance on a set of attributes reduced by the ReliefF ranker.
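Pulling the steps together, here is a minimal self-contained sketch, assuming weka.jar (3.7 or later) is on the classpath; the 20-instance dataset is invented purely so the snippet runs end to end:

```java
import java.util.ArrayList;
import java.util.Random;

import weka.attributeSelection.Ranker;
import weka.attributeSelection.ReliefFAttributeEval;
import weka.classifiers.Evaluation;
import weka.classifiers.meta.AttributeSelectedClassifier;
import weka.classifiers.trees.J48;
import weka.core.Attribute;
import weka.core.DenseInstance;
import weka.core.Instances;

// Invented dataset with one predictive attribute (x1) and one random attribute (x2).
ArrayList<Attribute> attrs = new ArrayList<>();
attrs.add(new Attribute("x1"));
attrs.add(new Attribute("x2"));
ArrayList<String> labels = new ArrayList<>();
labels.add("yes");
labels.add("no");
attrs.add(new Attribute("class", labels));
Instances data = new Instances("demo", attrs, 0);
data.setClassIndex(data.numAttributes() - 1);
Random rnd = new Random(1);
for (int i = 0; i < 20; i++) {
    double x1 = rnd.nextDouble();
    data.add(new DenseInstance(1.0, new double[]{x1, rnd.nextDouble(), x1 > 0.5 ? 0 : 1}));
}

// Meta-classifier: ReliefF ranking feeds a J48 base classifier.
AttributeSelectedClassifier classifier = new AttributeSelectedClassifier();
classifier.setEvaluator(new ReliefFAttributeEval());
classifier.setSearch(new Ranker());
classifier.setClassifier(new J48());

Evaluation evaluation = new Evaluation(data);
evaluation.crossValidateModel(classifier, data, 10, new Random(1));
System.out.println(evaluation.toSummaryString());
```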
