Attribute selection

In the next step, we select only the informative attributes, that is, those that are more likely to help with prediction. A standard approach is to measure the information gain carried by each attribute. We will use the weka.attributeSelection.AttributeSelection class, which requires two additional components: an evaluator (how attribute usefulness is calculated) and a search algorithm (how a subset of attributes is selected).

In our case, we first initialize weka.attributeSelection.InfoGainAttributeEval, which implements the calculation of information gain, along with the weka.attributeSelection.Ranker search method:

InfoGainAttributeEval eval = new InfoGainAttributeEval(); 
Ranker search = new Ranker();
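
To make the quantity being ranked concrete, the following is a minimal sketch of information gain for a single nominal attribute, computed as H(class) - H(class | attribute). It is a plain-Java illustration and not Weka's implementation; the class name InfoGainSketch, its methods, and the toy data are assumptions made for this example.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: not part of Weka.
final class InfoGainSketch {

    // Shannon entropy (in bits) of a distribution given as per-label counts.
    static double entropy(Map<String, Integer> counts, int total) {
        double h = 0.0;
        for (int c : counts.values()) {
            if (c == 0) continue;
            double p = (double) c / total;
            h -= p * Math.log(p) / Math.log(2);
        }
        return h;
    }

    // Information gain of a nominal attribute: H(class) - H(class | attribute).
    static double informationGain(String[] attribute, String[] labels) {
        int n = labels.length;
        Map<String, Integer> classCounts = new HashMap<>();
        Map<String, Map<String, Integer>> countsPerValue = new HashMap<>();
        Map<String, Integer> valueTotals = new HashMap<>();
        for (int i = 0; i < n; i++) {
            classCounts.merge(labels[i], 1, Integer::sum);
            countsPerValue.computeIfAbsent(attribute[i], k -> new HashMap<>())
                          .merge(labels[i], 1, Integer::sum);
            valueTotals.merge(attribute[i], 1, Integer::sum);
        }
        // Weighted average of the class entropy within each attribute value.
        double conditional = 0.0;
        for (Map.Entry<String, Map<String, Integer>> e : countsPerValue.entrySet()) {
            int size = valueTotals.get(e.getKey());
            conditional += ((double) size / n) * entropy(e.getValue(), size);
        }
        return entropy(classCounts, n) - conditional;
    }

    public static void main(String[] args) {
        String[] labels = {"yes", "yes", "no", "no"};
        String[] informative = {"a", "a", "b", "b"}; // separates the classes perfectly
        String[] useless = {"a", "b", "a", "b"};     // tells us nothing about the class
        System.out.println("informative: " + informationGain(informative, labels));
        System.out.println("useless:     " + informationGain(useless, labels));
    }
}
```

An attribute that splits the classes cleanly receives a high score, while one that is independent of the class scores close to zero, which is why a small positive threshold is enough to filter out uninformative attributes.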

To select only the attributes above a threshold, we configure the weka.attributeSelection.Ranker search, which ranks the attributes by information gain and discards those that score below the threshold. We specify the threshold with the -T parameter and keep its value low, so that attributes carrying at least some information are retained:

search.setOptions(new String[] { "-T", "0.001" });

A general rule for setting this threshold is to sort the attributes by information gain and pick the value at which the information gain drops to a negligible level.
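
The rule above can be sketched in plain Java: sort the attributes by score, inspect where the scores fall off, and keep those above the cutoff. This is only an illustration of the idea, not how Weka's Ranker is implemented; the class ThresholdSketch, the attribute names, and the scores are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch only: not part of Weka.
final class ThresholdSketch {

    // Pairs a (hypothetical) attribute name with its information gain.
    static final class Scored {
        final String name;
        final double gain;

        Scored(String name, double gain) {
            this.name = name;
            this.gain = gain;
        }
    }

    // Returns the attributes sorted by information gain, highest first.
    static List<Scored> rank(String[] names, double[] gains) {
        List<Scored> ranked = new ArrayList<>();
        for (int i = 0; i < names.length; i++) {
            ranked.add(new Scored(names[i], gains[i]));
        }
        ranked.sort(Comparator.comparingDouble((Scored s) -> s.gain).reversed());
        return ranked;
    }

    // Keeps the names of attributes whose gain is strictly above the threshold.
    static List<String> selectAbove(List<Scored> ranked, double threshold) {
        List<String> kept = new ArrayList<>();
        for (Scored s : ranked) {
            if (s.gain > threshold) {
                kept.add(s.name);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up scores, used only to show the shape of the procedure.
        String[] names = {"attrA", "attrB", "attrC", "attrD"};
        double[] gains = {0.0004, 0.31, 0.0, 0.12};

        List<Scored> ranked = rank(names, gains);
        for (Scored s : ranked) {
            System.out.println(s.name + "\t" + s.gain);
        }
        System.out.println("kept: " + selectAbove(ranked, 0.001));
    }
}
```

Printing the sorted scores before choosing the cutoff makes it easier to see where the drop to negligible values occurs in a particular dataset.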

Next, we can initialize the AttributeSelection class, set the evaluator and ranker, and apply the attribute selection to our dataset, as follows:

AttributeSelection attSelect = new AttributeSelection(); 
attSelect.setEvaluator(eval); 
attSelect.setSearch(search); 
 
// apply attribute selection 
attSelect.SelectAttributes(data);

Finally, we remove the attributes that were not selected in the last run by calling the reduceDimensionality(Instances) method:

// remove the attributes not selected in the last run 
data = attSelect.reduceDimensionality(data);

In the end, we are left with 214 out of 230 attributes.
