To evaluate the classifier on a separate dataset, we will use the following steps:
- Let's start by importing the emails located in our test folder:
InstanceList testInstances = new InstanceList(classifier.getInstancePipe());
FileIterator folderIterator = new FileIterator(
    new File[] {new File(testFolderPath)},
    new TxtFilter(),
    FileIterator.LAST_DIRECTORY);
- We will pass the data through the same pipeline that we initialized during training:
testInstances.addThruPipe(folderIterator);
- To evaluate classifier performance, we'll use the cc.mallet.classify.Trial class, which is initialized with a classifier and set of test instances:
Trial trial = new Trial(classifier, testInstances);
- The evaluation is performed immediately at initialization; we can then simply read off the measures we care about. In our example, we'd like to check the precision and recall for the spam class, as well as the F-measure, which is the harmonic mean of the two values, as follows:
System.out.println("F1 for class 'spam': " + trial.getF1("spam"));
System.out.println("Precision: " + trial.getPrecision("spam"));
System.out.println("Recall: " + trial.getRecall("spam"));
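Under the hood, Trial derives these measures from per-class counts of correct and incorrect predictions. The following self-contained sketch shows how precision, recall, and F1 are defined; the counts are hypothetical (not taken from MALLET's internals), but they are chosen so that they reproduce the precision and recall values printed below:

```java
public class SpamMetricsSketch {
    public static void main(String[] args) {
        // Hypothetical counts for the 'spam' class:
        int truePositives = 127;   // spam messages correctly flagged as spam
        int falsePositives = 4;    // legitimate messages incorrectly flagged as spam
        int falseNegatives = 3;    // spam messages the classifier missed

        // Precision: of all messages flagged as spam, how many really are spam
        double precision = (double) truePositives / (truePositives + falsePositives);
        // Recall: of all actual spam messages, how many were flagged
        double recall = (double) truePositives / (truePositives + falseNegatives);
        // F1: harmonic mean of precision and recall
        double f1 = 2 * precision * recall / (precision + recall);

        System.out.printf("Precision: %.4f%n", precision);
        System.out.printf("Recall:    %.4f%n", recall);
        System.out.printf("F1:        %.4f%n", f1);
    }
}
```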
The evaluation object outputs the following results:
F1 for class 'spam': 0.9731800766283524
Precision: 0.9694656488549618
Recall: 0.9769230769230769
The results show that the model correctly discovers 97.69% of spam messages (recall), and when it marks an email as spam, it is correct in 96.94% of cases (precision). In other words, it misses roughly 2 of every 100 spam messages, and roughly 3 of every 100 messages it flags as spam are actually legitimate. So it's not perfect, but it's more than a good start!
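The per-100 figures above follow directly from the reported precision and recall: the miss rate is 1 - recall, and the fraction of wrongly flagged messages among those marked as spam is 1 - precision. A quick check:

```java
public class ErrorRates {
    public static void main(String[] args) {
        // Values reported by the Trial evaluation above
        double precision = 0.9694656488549618;
        double recall = 0.9769230769230769;

        // Miss rate: fraction of actual spam the classifier fails to catch
        double missRate = 1 - recall;              // about 0.023
        // False discovery rate: fraction of spam-flagged messages that are legitimate
        double falseDiscoveryRate = 1 - precision; // about 0.031

        System.out.printf("Missed spam per 100 spam messages: %.1f%n", missRate * 100);
        System.out.printf("Legitimate messages per 100 flagged: %.1f%n", falseDiscoveryRate * 100);
    }
}
```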