Attribute selection

In the next step, we will select only informative attributes, that is, attributes that are more likely to help with prediction. A standard approach to this problem is to check the information gain carried by each attribute. We will use the weka.attributeSelection.AttributeSelection filter, which requires two additional objects: an evaluator, which determines how attribute usefulness is calculated, and a search algorithm, which determines how a subset of attributes is selected.

In our case, we first initialize weka.attributeSelection.InfoGainAttributeEval, which implements the calculation of information gain, along with the Ranker search object discussed next:

InfoGainAttributeEval eval = new InfoGainAttributeEval(); 
Ranker search = new Ranker(); 

To select only the top attributes, weka.attributeSelection.Ranker ranks the attributes by information gain and keeps those whose gain is above a specific threshold. The threshold is specified with the -T parameter; we keep its value low in order to retain any attribute that carries at least some information:

search.setOptions(new String[] { "-T", "0.001" }); 

The general rule for setting this threshold is to sort the attributes by information gain and pick the threshold where the information gain drops to a negligible value.
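
To see where that drop happens, you can run the selection once with a default Ranker (no threshold) and print the ranked scores. The following is a minimal sketch, assuming data holds the loaded Instances; rankedAttributes() returns pairs of attribute index and information gain, sorted by gain:

// rank all attributes without a threshold to inspect the gain values
AttributeSelection ranking = new AttributeSelection();
ranking.setEvaluator(new InfoGainAttributeEval());
ranking.setSearch(new Ranker());
ranking.SelectAttributes(data);

// each row is {attribute index, information gain}, sorted by gain
double[][] ranked = ranking.rankedAttributes();
for (double[] entry : ranked) {
    System.out.printf("%-30s %.4f%n",
        data.attribute((int) entry[0]).name(), entry[1]);
}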

Next, we can initialize the AttributeSelection class, set the evaluator and ranker, and apply the attribute selection to our dataset, as follows:

AttributeSelection attSelect = new AttributeSelection(); 
attSelect.setEvaluator(eval); 
attSelect.setSearch(search); 
 
// apply attribute selection 
attSelect.SelectAttributes(data); 

Finally, we remove the attributes that were not selected in the last run by calling the reduceDimensionality(Instances) method:

// remove the attributes not selected in the last run 
data = attSelect.reduceDimensionality(data); 

In the end, we are left with 214 out of 230 attributes.
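
For reference, the snippets above can be combined into a self-contained program. The following is a minimal sketch, assuming the dataset is loaded from a placeholder dataset.arff file and that the class attribute is the last one; adjust both for your own setup:

import weka.attributeSelection.AttributeSelection;
import weka.attributeSelection.InfoGainAttributeEval;
import weka.attributeSelection.Ranker;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class InfoGainSelection {
    public static void main(String[] args) throws Exception {
        // load the dataset; the path is a placeholder
        Instances data = DataSource.read("dataset.arff");
        data.setClassIndex(data.numAttributes() - 1); // assume class is last

        // evaluator and search algorithm
        InfoGainAttributeEval eval = new InfoGainAttributeEval();
        Ranker search = new Ranker();
        search.setOptions(new String[] { "-T", "0.001" });

        // apply attribute selection
        AttributeSelection attSelect = new AttributeSelection();
        attSelect.setEvaluator(eval);
        attSelect.setSearch(search);
        attSelect.SelectAttributes(data);

        // remove the attributes not selected in the last run
        int before = data.numAttributes();
        data = attSelect.reduceDimensionality(data);
        System.out.println("Attributes: " + before + " -> " + data.numAttributes());
    }
}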
