Not all attributes are relevant to a classification task; in fact, irrelevant attributes can even decrease the performance of some algorithms. This recipe will show you how to find relevant attributes. It will demonstrate how to select an evaluator and a search method that applies the selected evaluator to the attributes. In Weka, you have three options for performing attribute selection:
Using the filtering approach with the weka.filters.supervised.attribute.AttributeSelection
filter, using the attribute selection classes (package weka.attributeSelection
) directly, which outputs some additional useful information, or using a meta-classifier that selects attributes on the fly. To rank a set of attributes, use the following code snippet:
import weka.attributeSelection.CfsSubsetEval;
import weka.attributeSelection.GreedyStepwise;
import weka.core.Instances;
import weka.filters.Filter;
import weka.filters.supervised.attribute.AttributeSelection;
...
AttributeSelection filter = new AttributeSelection(); // package weka.filters.supervised.attribute!
CfsSubsetEval eval = new CfsSubsetEval();
GreedyStepwise search = new GreedyStepwise();
search.setSearchBackwards(true);
filter.setEvaluator(eval);
filter.setSearch(search);
filter.setInputFormat(data);
Instances newData = Filter.useFilter(data, filter);
The newData
variable now contains a dataset with only those attributes found relevant by the selected evaluator.
In this example, we will use the filter approach. First, we import the AttributeSelection
class:
import weka.filters.supervised.attribute.AttributeSelection;
Next, import an evaluator, for example, correlation-based feature subset selection. This evaluator requires a search procedure that performs a greedy forward or backward search through the space of attribute subsets, implemented in the GreedyStepwise
class:
import weka.attributeSelection.CfsSubsetEval;
import weka.attributeSelection.GreedyStepwise;
Import the Instances
and Filter
class:
import weka.core.Instances;
import weka.filters.Filter;
Create a new AttributeSelection
object and initialize the evaluator and search algorithm objects:
AttributeSelection filter = new AttributeSelection();
CfsSubsetEval eval = new CfsSubsetEval();
GreedyStepwise search = new GreedyStepwise();
Set the algorithm to search backward:
search.setSearchBackwards(true);
Pass the evaluator and search algorithm to the filter:
filter.setEvaluator(eval);
filter.setSearch(search);
Specify the dataset:
filter.setInputFormat(data);
Apply the filter:
Instances newData = Filter.useFilter(data, filter);
In the last step, the greedy algorithm runs with the correlation-based feature selection evaluator to discover a subset of attributes with the highest predictive power. The procedure stops when the addition/deletion of any remaining attribute results in a decrease in evaluation. Attributes not reaching a default threshold may be discarded from the ranking (you can set this threshold manually with the search object's setThreshold(…)
method).
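The greedy stepwise procedure described above can be sketched in plain Java. The following is a minimal illustration of forward selection; the merit function is a made-up stand-in for an evaluator such as CfsSubsetEval, and the attribute count is arbitrary:

```java
import java.util.ArrayList;
import java.util.List;

public class GreedyForwardSearch {

    // Toy merit function standing in for an evaluator such as CfsSubsetEval:
    // it rewards attributes 0 and 2 and charges a small cost per attribute.
    static double merit(List<Integer> subset) {
        double score = 0.0;
        if (subset.contains(0)) score += 0.5;
        if (subset.contains(2)) score += 0.3;
        return score - 0.05 * subset.size();
    }

    // Greedy forward search: repeatedly add the single attribute that raises
    // the merit the most; stop when no addition improves the evaluation.
    static List<Integer> search(int numAttributes) {
        List<Integer> selected = new ArrayList<>();
        double best = merit(selected);
        boolean improved = true;
        while (improved) {
            improved = false;
            int bestAttr = -1;
            for (int a = 0; a < numAttributes; a++) {
                if (selected.contains(a)) continue;
                selected.add(a);                      // tentatively add attribute a
                double m = merit(selected);
                selected.remove(Integer.valueOf(a));  // undo the tentative add
                if (m > best) {
                    best = m;
                    bestAttr = a;
                    improved = true;
                }
            }
            if (improved) selected.add(bestAttr);
        }
        return selected;
    }

    public static void main(String[] args) {
        System.out.println(search(4)); // the subset chosen by the greedy search
    }
}
```

The backward variant used in the recipe works the same way in reverse: it starts from the full attribute set and removes attributes while removal does not decrease the evaluation.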
This section will first demonstrate an alternative, yet popular, measure for attribute selection, that is, information gain. Next, we will take a look at how to reduce the dimensionality of the attribute space using principal component analysis. Finally, we will show how to select attributes on the fly.
As seen in the recipe, a filter is directly applied to the dataset, so there is no need to manually remove attributes later. The following example shows a low-level approach that you can use if the filter-based method is not suitable for your purposes.
In this case, we import the actual attribute selection classes (in the previous example, we imported filters based on this class):
import weka.attributeSelection.AttributeSelection;
Create a new AttributeSelection
instance:
AttributeSelection attSelect = new AttributeSelection();
Initialize a measure for ranking attributes, for example, information gain:
import weka.attributeSelection.InfoGainAttributeEval;
...
InfoGainAttributeEval eval = new InfoGainAttributeEval();
Specify the search method as Ranker
; as the name suggests, it simply ranks the attributes by the selected measure:
import weka.attributeSelection.Ranker;
...
Ranker search = new Ranker();
Pass the measure and search method to the attribute selection object:
attSelect.setEvaluator(eval); attSelect.setSearch(search);
Perform attribute selection on the specified dataset with the chosen search method and measure:
attSelect.SelectAttributes(data); // note the capital S in Weka's method name
Print the results as follows:
System.out.println(attSelect.toResultsString());
Get and print the indices of the selected attributes:
import weka.core.Utils;
...
int[] indices = attSelect.selectedAttributes();
System.out.println(Utils.arrayToString(indices));
The output, for example, for the Titanic dataset is:
=== Attribute Selection on all input data ===

Search Method:
	Attribute ranking.

Attribute Evaluator (supervised, Class (nominal): 4 survived):
	Information Gain Ranking Filter

Ranked attributes:
 0.14239  3 sex
 0.05929  1 class
 0.00641  2 age

Selected attributes: 3,1,2 : 3
The information gain ranker ordered all three attributes by importance: sex carries the most information about survival, followed by class and age.
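Under the hood, the information gain measure is the reduction in class entropy obtained by splitting on an attribute. The following plain-Java sketch computes it from a contingency table of attribute values versus class labels; the counts are hypothetical, chosen only to illustrate the arithmetic:

```java
public class InfoGain {

    static double log2(double x) {
        return Math.log(x) / Math.log(2);
    }

    // Entropy of a class distribution given as counts.
    static double entropy(double... counts) {
        double total = 0;
        for (double c : counts) total += c;
        double h = 0;
        for (double c : counts) {
            if (c > 0) {
                double p = c / total;
                h -= p * log2(p);
            }
        }
        return h;
    }

    // Information gain of an attribute: rows[i] holds the class counts
    // among instances taking the i-th attribute value.
    static double infoGain(double[][] rows) {
        double total = 0;
        double[] classTotals = new double[rows[0].length];
        for (double[] row : rows) {
            for (int j = 0; j < row.length; j++) {
                classTotals[j] += row[j];
                total += row[j];
            }
        }
        // Conditional entropy: weighted average of per-value entropies.
        double cond = 0;
        for (double[] row : rows) {
            double rowTotal = 0;
            for (double c : row) rowTotal += c;
            cond += rowTotal / total * entropy(row);
        }
        return entropy(classTotals) - cond;
    }

    public static void main(String[] args) {
        // Hypothetical contingency table: attribute value x class (yes/no).
        double[][] table = { {20, 80}, {60, 40} };
        System.out.printf("%.4f%n", infoGain(table));
    }
}
```

A gain of zero means the attribute tells you nothing about the class; the ranked scores in the recipe's output are exactly this quantity, computed per attribute.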
Principal component analysis (PCA) is a dimensionality reduction technique often used to represent a dataset with fewer attributes while preserving as much information as possible. Dimensionality reduction is accomplished by choosing enough eigenvectors to account for a percentage of the variance in the original data (the default is 95 percent). This example shows how to transform the dataset to a new coordinate system defined by the principal eigenvectors.
import weka.filters.supervised.attribute.AttributeSelection;
...
AttributeSelection filter = new AttributeSelection();
Initialize the principal components evaluator:
import weka.attributeSelection.PrincipalComponents;
...
PrincipalComponents eval = new PrincipalComponents();
Initialize Ranker
as the search method:
Ranker search = new Ranker();
Set the evaluator and search method, filter the data, and print the newly created dataset in a new coordinate system:
filter.setEvaluator(eval);
filter.setSearch(search);
filter.setInputFormat(data);
Instances newData = Filter.useFilter(data, filter);
System.out.println(newData);
The new dataset is now represented by values in a new, smaller coordinate system, which means there are fewer attributes.
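The component-selection rule described above (keep enough eigenvectors to cover a given share of the variance, 95 percent by default) can be sketched in plain Java. The eigenvalues below are hypothetical, chosen only to illustrate the rule:

```java
public class VarianceCovered {

    // Number of principal components needed to cover the target fraction of
    // total variance, given eigenvalues sorted in descending order.
    static int componentsFor(double[] eigenvalues, double target) {
        double total = 0;
        for (double e : eigenvalues) total += e;
        double covered = 0;
        for (int k = 0; k < eigenvalues.length; k++) {
            covered += eigenvalues[k];
            if (covered / total >= target) return k + 1;
        }
        return eigenvalues.length; // target never reached: keep everything
    }

    public static void main(String[] args) {
        double[] eig = { 4.0, 2.5, 1.0, 0.3, 0.2 }; // hypothetical eigenvalues
        System.out.println(componentsFor(eig, 0.95));
    }
}
```

Each eigenvalue is proportional to the variance captured by its component, so the cumulative ratio tells you how many components the transformed dataset will keep.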
This example shows how to work with a class for running an arbitrary classifier on data that has been reduced through attribute selection.
Import the AttributeSelectedClassifier
class:
import weka.classifiers.meta.AttributeSelectedClassifier;
Create a new object:
AttributeSelectedClassifier classifier = new AttributeSelectedClassifier();
Initialize an evaluator, for example, ReliefF
, which requires the Ranker
search algorithm:
import weka.attributeSelection.Ranker;
import weka.attributeSelection.ReliefFAttributeEval;
...
ReliefFAttributeEval eval = new ReliefFAttributeEval();
Ranker search = new Ranker();
Initialize a base classifier, for example, decision trees:
import weka.classifiers.trees.J48;
...
J48 baseClassifier = new J48();
Pass the base classifier, evaluator, and search algorithm to the meta-classifier:
classifier.setClassifier(baseClassifier);
classifier.setEvaluator(eval);
classifier.setSearch(search);
Finally, run and validate the classifier, for example, with 10-fold cross-validation (see the Classification recipe for details):
import java.util.Random;
import weka.classifiers.Evaluation;
...
Evaluation evaluation = new Evaluation(data);
evaluation.crossValidateModel(classifier, data, 10, new Random(1));
System.out.println(evaluation.toSummaryString());
The evaluation outputs the classifier performance on a set of attributes reduced by the ReliefF
ranker.