The vanilla approach

The vanilla approach is to apply the lesson directly, just as it was demonstrated in Chapter 3, Basic Algorithms - Classification, Regression, Clustering, without any preprocessing and without taking dataset specifics into account. To demonstrate the drawbacks of the vanilla approach, we will simply build a model with the default parameters and apply k-fold cross-validation.
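As a rough illustration of what k-fold cross-validation does with the data, the following plain-Java sketch splits instance indices into k disjoint folds; each fold is held out once for testing while the remaining folds are used for training. The class name KFoldSketch and the method folds are hypothetical names chosen for this example. This is not Weka's implementation, and it ignores stratification by class.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper for illustration only; not part of Weka.
class KFoldSketch {

    // Shuffles the indices 0..n-1 and deals them into k disjoint folds.
    static List<List<Integer>> folds(int n, int k, long seed) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            indices.add(i);
        }
        Collections.shuffle(indices, new Random(seed));

        List<List<Integer>> folds = new ArrayList<>();
        for (int f = 0; f < k; f++) {
            folds.add(new ArrayList<>());
        }
        // Round-robin assignment keeps fold sizes within one of each other.
        for (int i = 0; i < n; i++) {
            folds.get(i % k).add(indices.get(i));
        }
        return folds;
    }

    public static void main(String[] args) {
        List<List<Integer>> folds = folds(10, 3, 1L);
        for (int f = 0; f < folds.size(); f++) {
            // Fold f is the test set; all other folds form the training set.
            System.out.println("Fold " + f + " held out for testing: " + folds.get(f));
        }
    }
}
```

With 10 instances and 3 folds, every instance is used for testing exactly once and for training twice.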

First, let's define some classifiers that we want to test, as follows:

ArrayList<Classifier> models = new ArrayList<Classifier>();
models.add(new J48()); 
models.add(new RandomForest()); 
models.add(new NaiveBayes()); 
models.add(new AdaBoostM1()); 
models.add(new Logistic()); 

Next, we need to create an Evaluation object and perform k-fold cross-validation by calling the crossValidateModel(Classifier, Instances, int, Random, String[]) method, printing the precision, recall, and fMeasure as output:

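int FOLDS = 3;

for (Classifier model : models) {
  // Create a fresh Evaluation for each model; reusing a single object
  // would accumulate statistics across classifiers
  Evaluation eval = new Evaluation(data);
  eval.crossValidateModel(model, data, FOLDS,
    new Random(1), new String[] {});
  // FRAUD is the index of the fraud class value
  System.out.println(model.getClass().getName() + "\n" +
    "\tRecall:    " + eval.recall(FRAUD) + "\n" +
    "\tPrecision: " + eval.precision(FRAUD) + "\n" +
    "\tF-measure: " + eval.fMeasure(FRAUD));
}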

The evaluation provides the following scores as output:

    weka.classifiers.trees.J48
      Recall:    0.03358613217768147
      Precision: 0.9117647058823529
      F-measure: 0.06478578892371996
    ...
    weka.classifiers.functions.Logistic
      Recall:    0.037486457204767065
      Precision: 0.2521865889212828
      F-measure: 0.06527070364082249
  

We can see that the results are not very promising. The recall, that is, the share of discovered frauds among all frauds, is only 1-4% (about 3.4% for J48 and 3.7% for Logistic), meaning that at most about 4 out of every 100 frauds are detected. On the other hand, the precision, that is, the accuracy of alarms, is 91% for J48, meaning that in roughly 9 out of 10 cases where a claim is marked as fraud, the model is correct. For Logistic, the precision is only about 25%.
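To make the relationship between these scores concrete, the following sketch computes precision, recall, and F-measure from confusion-matrix counts using the standard definitions. The class FraudMetrics is a hypothetical helper written for this example, and the counts (31 true positives, 3 false positives, 892 false negatives) are not taken from the dataset; they are assumed values chosen only because they reproduce the ratios reported for J48.

```java
// Hypothetical helper for illustration only; not part of Weka.
class FraudMetrics {

    // Share of raised alarms that are actual frauds: TP / (TP + FP).
    static double precision(int tp, int fp) {
        return (tp + fp) == 0 ? 0.0 : (double) tp / (tp + fp);
    }

    // Share of actual frauds that are detected: TP / (TP + FN).
    static double recall(int tp, int fn) {
        return (tp + fn) == 0 ? 0.0 : (double) tp / (tp + fn);
    }

    // Harmonic mean of precision and recall.
    static double fMeasure(double precision, double recall) {
        double sum = precision + recall;
        return sum == 0.0 ? 0.0 : 2 * precision * recall / sum;
    }

    public static void main(String[] args) {
        // Assumed counts, chosen to match the ratios printed for J48.
        int tp = 31, fp = 3, fn = 892;
        double p = precision(tp, fp);
        double r = recall(tp, fn);
        System.out.println("Precision: " + p);
        System.out.println("Recall:    " + r);
        System.out.println("F-measure: " + fMeasure(p, r));
    }
}
```

Because the F-measure is a harmonic mean, it stays close to the smaller of the two values, which is why a precision of 91% combined with a recall of about 3% still yields an F-measure of only about 0.06.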
