Loading data

Before we start the analysis, we will load the data in Weka's Attribute-Relation File Format (ARFF) and print the total number of loaded instances. Each data sample is held within an Instance object, while the complete dataset, accompanied by meta-information, is handled by the Instances object.
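To make the file layout concrete, the following minimal sketch walks over a tiny ARFF-style text using only the Java standard library. It is not Weka's parser: the ArffSketch class, its two helper methods, and the three-row toy dataset are all invented for illustration, and the parsing ignores details such as quoted attribute names and sparse rows. It only shows that an ARFF file has a header of @relation and @attribute declarations followed by a @data section with one instance per line.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified illustration of the ARFF layout; NOT Weka's parser.
class ArffSketch {
    // Hypothetical toy file, invented for this example.
    static final String SAMPLE = String.join("\n",
        "@relation toy-animals",
        "@attribute name {cat, hen, carp}",
        "@attribute legs numeric",
        "@attribute kind {mammal, bird, fish}",
        "@data",
        "cat,4,mammal",
        "hen,2,bird",
        "carp,0,fish");

    // Collects the attribute names declared in the header.
    static List<String> attributeNames(String arff) {
        List<String> names = new ArrayList<>();
        for (String line : arff.split("\n")) {
            String t = line.trim();
            if (t.toLowerCase().startsWith("@attribute")) {
                names.add(t.split("\\s+")[1]);
            }
        }
        return names;
    }

    // Counts the non-empty, non-comment rows that follow @data.
    static int countInstances(String arff) {
        boolean inData = false;
        int count = 0;
        for (String line : arff.split("\n")) {
            String t = line.trim();
            if (t.isEmpty() || t.startsWith("%")) {
                continue; // skip blank lines and % comments
            }
            if (inData) {
                count++;
            } else if (t.equalsIgnoreCase("@data")) {
                inData = true;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(attributeNames(SAMPLE));
        System.out.println(countInstances(SAMPLE) + " instances loaded.");
    }
}
```

In Weka terms, the header corresponds to the meta-information kept by Instances, and each row after @data corresponds to one Instance.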

To load the input data, we will use the DataSource object that accepts a variety of file formats and converts them into Instances:

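import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
DataSource source = new DataSource("data/zoo.arff");
Instances data = source.getDataSet();
System.out.println(data.numInstances() + " instances loaded.");
System.out.println(data.toString()); // prints the whole dataset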

The first line of the output reports the number of loaded instances, as follows:

101 instances loaded.

The second println call prints the complete dataset through the data.toString() method.

Our task is to learn a model that can predict the animal attribute for future examples in which we know the other attributes but not the animal label. Hence, we will remove the animal attribute from the training set by filtering it out with the Remove filter.

First, we create the filter and define a string array of options, specifying that the first attribute must be removed (attribute indices in the -R option start at 1). The remaining attributes are used as our dataset for training a classifier:

import weka.filters.unsupervised.attribute.Remove;
Remove remove = new Remove();
String[] opts = new String[]{"-R", "1"}; // remove the first attribute

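Finally, we pass the options to the filter, set the input format, and call the Filter.useFilter(Instances, Filter) static method to apply the filter on the selected dataset. The options are set before setInputFormat is called, and useFilter returns a new Instances object, which we assign back to data: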

import weka.filters.Filter;
remove.setOptions(opts);
remove.setInputFormat(data);
data = Filter.useFilter(data, remove);
System.out.println(data.toString());
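As a conceptual illustration of what removing attribute 1 means, the sketch below drops a column from a small table using only the Java standard library. It is a stand-in, not Weka's Remove filter: the RemoveSketch class, the removeColumn method, and the toy rows are hypothetical. It mirrors two points from the steps above: the index is 1-based, and the result is a new dataset rather than an in-place change to the original.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Conceptual stand-in for removing one attribute; NOT Weka's Remove filter.
class RemoveSketch {
    // Returns copies of the rows without the column at the given 1-based index.
    static List<List<String>> removeColumn(List<List<String>> rows, int oneBasedIndex) {
        List<List<String>> result = new ArrayList<>();
        for (List<String> row : rows) {
            List<String> copy = new ArrayList<>(row);
            copy.remove(oneBasedIndex - 1); // convert to Java's 0-based index
            result.add(copy);
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical toy rows: name, legs, kind.
        List<List<String>> rows = Arrays.asList(
            Arrays.asList("cat", "4", "mammal"),
            Arrays.asList("hen", "2", "bird"));
        System.out.println(removeColumn(rows, 1));
    }
}
```

Because the input rows are left untouched, the caller has to keep the returned value, just as the Weka code reassigns data to the result of Filter.useFilter.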