Now, let's take a closer look at the evaluation function. It accepts an initialized model, cross-validates the model on all three problems (churn, appetency, and upselling), and reports each result as the area under the ROC curve (AUC), as follows:
public static double[] evaluate(Classifier model) throws Exception {

    double[] results = new double[4];

    String[] labelFiles = new String[]{
        "churn", "appetency", "upselling"};

    double overallScore = 0.0;
    for (int i = 0; i < labelFiles.length; i++) {
First, we call the Instances loadData(String, String) function that we implemented earlier to load the training data and merge it with the selected labels:
        // Load data
        Instances train_data = loadData(
            path + "orange_small_train.data",
            path + "orange_small_train_" + labelFiles[i] + ".labels.txt");
Next, we initialize the weka.classifiers.Evaluation class and pass our dataset. (The dataset is only used to extract data properties; the actual data is not considered.) We call the void crossValidateModel(Classifier, Instances, int, Random) method to begin cross-validation with five folds. As validation is done on random subsets of the data, we also pass a random number generator initialized with a fixed seed, so that the splits are reproducible:
        // Cross-validate the data
        Evaluation eval = new Evaluation(train_data);
        eval.crossValidateModel(model, train_data, 5, new Random(1));
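To see why the seed matters, here is a minimal, self-contained sketch of how a seeded generator makes fold assignment reproducible. It is an illustration only and not Weka's implementation (Weka's own procedure differs in detail; for instance, it can stratify the folds by class). The FoldSketch class and the makeFolds method are hypothetical names introduced for this example:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper for illustration; not part of Weka.
final class FoldSketch {

    // Shuffle the indices 0..n-1 with the given generator and
    // deal them round-robin into k folds.
    static List<List<Integer>> makeFolds(int n, int k, Random rnd) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            indices.add(i);
        }
        Collections.shuffle(indices, rnd);

        List<List<Integer>> folds = new ArrayList<>();
        for (int f = 0; f < k; f++) {
            folds.add(new ArrayList<>());
        }
        for (int i = 0; i < n; i++) {
            folds.get(i % k).add(indices.get(i));
        }
        return folds;
    }

    public static void main(String[] args) {
        // Two runs with the same seed produce identical folds.
        List<List<Integer>> first = makeFolds(10, 5, new Random(1));
        List<List<Integer>> second = makeFolds(10, 5, new Random(1));
        System.out.println(first);
        System.out.println(first.equals(second));
    }
}
```

Each fold serves once as the test set while the remaining folds are used for training. With a fixed seed, two models evaluated one after another are compared on exactly the same splits.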
After the evaluation completes, we read the results by calling the double areaUnderROC(int) method. As the metric depends on the target value that we are interested in, the method expects a class value index, which we obtain by looking up the index of the "1" value in the class attribute, as follows:
        // Save results
        results[i] = eval.areaUnderROC(
            train_data.classAttribute().indexOfValue("1"));
        overallScore += results[i];
    }
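As a reminder of what the AUC value measures, the following self-contained sketch computes it directly from a handful of scores. It relies on a standard equivalent definition: the AUC is the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as half. This is an illustration and not how Weka computes the value internally; the AucSketch class, its method, and the sample scores are made up for this example, and "positive" stands for the class value "1":

```java
// Hypothetical helper for illustration; not part of Weka.
final class AucSketch {

    // AUC as the share of (positive, negative) pairs that are ranked
    // correctly by the scores; a tie contributes half a pair.
    static double areaUnderRoc(double[] scores, boolean[] isPositive) {
        double wins = 0.0;
        long pairs = 0;
        for (int i = 0; i < scores.length; i++) {
            if (!isPositive[i]) {
                continue;
            }
            for (int j = 0; j < scores.length; j++) {
                if (isPositive[j]) {
                    continue;
                }
                pairs++;
                if (scores[i] > scores[j]) {
                    wins += 1.0;
                } else if (scores[i] == scores[j]) {
                    wins += 0.5;
                }
            }
        }
        if (pairs == 0) {
            throw new IllegalArgumentException(
                "need at least one positive and one negative example");
        }
        return wins / pairs;
    }

    public static void main(String[] args) {
        // Made-up predicted probabilities of the positive class.
        double[] scores = {0.9, 0.8, 0.7, 0.6, 0.55, 0.4};
        boolean[] labels = {true, true, false, true, false, false};
        // 8 of the 9 positive/negative pairs are ranked correctly,
        // so the result is 8/9.
        System.out.println(areaUnderRoc(scores, labels));
    }
}
```

A value of 1.0 means that every positive example is ranked above every negative one, while 0.5 corresponds to a random ranking.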
Finally, the results are averaged and returned:
    // Get average results over all three problems
    results[3] = overallScore / 3;
    return results;
}