Classification using Encog

In the previous section, you saw how to use the Weka library for classification. In this section, we will quickly look at how the same can be achieved by using the Encog library. Encog requires us to build a model to do the classification. Download the Encog library from https://github.com/encog/encog-java-core/releases. Once downloaded, add the .jar file to the Eclipse project, as explained at the beginning of the chapter.

For this example, we will use the iris dataset, which is available in .csv format; it can be downloaded from https://archive.ics.uci.edu/ml/datasets/Iris. From the download path, copy the iris.data.csv file into your data directory. This file contains the data of 150 flowers, with four measurements per flower (sepal length, sepal width, petal length, and petal width); the last column is the species label.

We will now perform the classification, using the following steps:

  1. We will use the VersatileMLDataSet class to load the file and define the four measurement columns, along with the species column. The next step is to call the analyze method, which reads the entire file and computes statistical parameters, such as the mean, the standard deviation, and many more:
File irisFile = new File("data/iris.data.csv");
VersatileDataSource source = new CSVDataSource(irisFile, false, CSVFormat.DECIMAL_POINT);

VersatileMLDataSet data = new VersatileMLDataSet(source);
data.defineSourceColumn("sepal-length", 0, ColumnType.continuous);
data.defineSourceColumn("sepal-width", 1, ColumnType.continuous);
data.defineSourceColumn("petal-length", 2, ColumnType.continuous);
data.defineSourceColumn("petal-width", 3, ColumnType.continuous);

ColumnDefinition outputColumn = data.defineSourceColumn("species", 4, ColumnType.nominal);
data.analyze();
  2. The next step is to mark the species column as the single output and all of the other columns as inputs. Then, it's time to normalize the data; but before that, we need to decide on the model type, because the data will be normalized according to it, as follows:
data.defineSingleOutputOthersInput(outputColumn); 

EncogModel model = new EncogModel(data);
model.selectMethod(data, MLMethodFactory.TYPE_FEEDFORWARD);

model.setReport(new ConsoleStatusReportable());
data.normalize();
  3. The next step is to fit the model on a training set, holding a validation set aside. We will hold back 30% of the data, as specified by the first argument, 0.3; the second argument, true, specifies that the data should be shuffled randomly; and 1001 is the seed for the random shuffle. Hence, we use the holdBackValidation method:
model.holdBackValidation(0.3, true, 1001);

  4. Now, it's time to train the model and classify the data, according to the measurements and labels. The cross-validation splits the training dataset into five folds:
model.selectTrainingType(data); 
MLRegression bestMethod = (MLRegression)model.crossvalidate(5, true);
  5. The next step is to display the training error and the validation error:
System.out.println( "Training error: " + EncogUtility.calculateRegressionError(bestMethod, model.getTrainingDataset())); 
System.out.println( "Validation error: " + EncogUtility.calculateRegressionError(bestMethod, model.getValidationDataset()));
  6. Now, we will use the model to predict the species for every row in the file, using the following code block. Note that we re-read the CSV file and reuse the dataset's normalization helper to normalize each input row and to denormalize the predicted output:
// Reuse the normalization helper that was built when the dataset was analyzed
NormalizationHelper helper = data.getNormHelper();

// Re-read the original file so that each row can be fed through the model
ReadCSV csv = new ReadCSV(irisFile, false, CSVFormat.DECIMAL_POINT);
String[] line = new String[4];
MLData input = helper.allocateInputVector();

while (csv.next()) {
    StringBuilder result = new StringBuilder();
    line[0] = csv.get(0);
    line[1] = csv.get(1);
    line[2] = csv.get(2);
    line[3] = csv.get(3);
    String correct = csv.get(4);

    // Normalize the raw values, run the model, and denormalize the prediction
    helper.normalizeInputVector(line, input.getData(), false);
    MLData output = bestMethod.compute(input);
    String irisChosen = helper.denormalizeOutputVectorToString(output)[0];

    result.append(Arrays.toString(line));
    result.append(" -> predicted: ");
    result.append(irisChosen);
    result.append("(correct: ");
    result.append(correct);
    result.append(")");

    System.out.println(result.toString());
}

This will print the training progress, the training and validation errors, and one prediction line per flower.
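For illustration, each prediction line follows the format assembled by the StringBuilder in the last step; with the first rows of the dataset, the output would look roughly like this (illustrative, not captured program output):

[5.1, 3.5, 1.4, 0.2] -> predicted: Iris-setosa(correct: Iris-setosa)
[4.9, 3.0, 1.4, 0.2] -> predicted: Iris-setosa(correct: Iris-setosa)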

Encog supports many other options in MLMethodFactory, such as SVM, PNN, and so on.
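For example, switching the whole pipeline to a support vector machine is a one-line change when selecting the method. This is a minimal sketch; TYPE_SVM and TYPE_PNN are constants defined in MLMethodFactory, and the rest of the training code stays the same:

// Select an SVM (or a probabilistic neural network) instead of a
// feedforward network
model.selectMethod(data, MLMethodFactory.TYPE_SVM);
// model.selectMethod(data, MLMethodFactory.TYPE_PNN);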
