Fraud detection of insurance claims

First, we'll take a look at suspicious behavior detection, where the goal is to learn known patterns of fraud, which corresponds to modeling known knowns.

Dataset

We'll work with a dataset describing insurance transactions that is publicly available in the Oracle Database Online Documentation (2015), as follows:

http://docs.oracle.com/cd/B28359_01/datamine.111/b28129/anomalies.htm

The dataset describes insurance vehicle incident claims for an undisclosed insurance company. It contains 15,430 claims; each claim comprises 33 attributes describing the following components:

  • Customer demographic details (Age, Sex, MaritalStatus, and so on)
  • Purchased policy (PolicyType, VehicleCategory, number of supplements, agent type, and so on)
  • Claim circumstances (day/month/week claimed, policy report filed, witness present, past days between incident-policy report, incident-claim, and so on)
  • Other customer data (number of cars, previous claims, DriverRating, and so on)
  • Fraud found (yes or no)

A sample of the dataset, as loaded into Weka, is shown in the following screenshot:

[Screenshot: the insurance claims dataset loaded into Weka]

Now the task is to create a model that will be able to identify suspicious claims in the future. The challenging thing about this task is the fact that only 6% of claims are suspicious. If we create a dummy classifier saying that no claim is suspicious, it will be correct in 94% of cases. Therefore, in this task, we will use different measures of accuracy: precision and recall.
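To make the dummy baseline concrete, here is a minimal sketch (not part of the original recipe) using Weka's ZeroR classifier, which always predicts the majority class. It assumes that data already holds the loaded dataset with its class attribute set, as shown later in this recipe:

import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.rules.ZeroR;

// ZeroR always predicts the majority class ("no fraud" here), so its
// accuracy equals the majority-class share (~94%) while detecting no frauds.
ZeroR baseline = new ZeroR();
Evaluation eval = new Evaluation(data);
eval.crossValidateModel(baseline, data, 10, new Random(1));
System.out.println("Accuracy: " + eval.pctCorrect() + "%");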

Recall the outcome table from Chapter 1, Applied Machine Learning Quick Start, where there are four possible outcomes denoted as true positive, false positive, false negative, and true negative:

                        Classified as fraud     Classified as no fraud
  Actual: fraud         TP (true positive)      FN (false negative)
  Actual: no fraud      FP (false positive)     TN (true negative)

Precision and recall are defined as follows:

  • Precision is the proportion of correctly raised alarms, that is, the share of claims marked as fraud that are actually fraudulent:

    Pr = TP / (TP + FP)

  • Recall is the proportion of actual frauds that are correctly identified as such:

    Re = TP / (TP + FN)

  • With these measures, our dummy classifier scores Pr = 0 and Re = 0, as it never marks any instance as fraud (TP = 0). In practice, we want to compare classifiers by both numbers, hence we use the F-measure. This is a de facto standard measure that computes the harmonic mean of precision and recall, as follows:

    F-measure = 2 * Pr * Re / (Pr + Re)
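For example, with hypothetical counts TP = 50, FP = 10, and FN = 900, we get Pr = 50/60 ≈ 0.83 and Re = 50/950 ≈ 0.05, which yields an F-measure of 2 * 0.83 * 0.05 / (0.83 + 0.05) ≈ 0.10; the high precision cannot compensate for the very poor recall.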

Now let's move on to designing a real classifier.

Modeling suspicious patterns

To design a classifier, we can follow the standard supervised learning steps as described in Chapter 1, Applied Machine Learning Quick Start. In this recipe, we will include some additional steps to handle the unbalanced dataset and to evaluate the classifiers based on precision and recall. The plan is as follows:

  • Load the data in the .csv format
  • Assign the class attribute
  • Convert all the attributes from numeric to nominal in order to make sure there are no incorrectly loaded numerical values
  • Experiment 1: Evaluate models with k-fold cross validation
  • Experiment 2: Rebalance dataset to a more balanced class distribution and manually perform cross validation
  • Compare the classifiers by recall, precision, and F-measure

First, let's load the data using the CSVLoader class, as follows:

String filePath = "/Users/bostjan/Dropbox/ML Java Book/book/datasets/chap07/claims.csv";

CSVLoader loader = new CSVLoader();
loader.setFieldSeparator(",");
loader.setSource(new File(filePath));
Instances data = loader.getDataSet();
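For reference, the snippets in this recipe assume the following imports (package paths as in Weka 3.7/3.8; note that the supervised variants of the Resample and StratifiedRemoveFolds filters are the ones used later):

import java.io.File;
import java.util.ArrayList;
import java.util.ListIterator;
import java.util.Random;

import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayes;
import weka.classifiers.functions.Logistic;
import weka.classifiers.meta.AdaBoostM1;
import weka.classifiers.trees.J48;
import weka.classifiers.trees.RandomForest;
import weka.core.Instances;
import weka.core.converters.CSVLoader;
import weka.filters.Filter;
import weka.filters.supervised.instance.Resample;
import weka.filters.supervised.instance.StratifiedRemoveFolds;
import weka.filters.unsupervised.attribute.NumericToNominal;
import weka.filters.unsupervised.attribute.Remove;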

Next, we need to make sure all the attributes are nominal. During the data import, Weka applies some heuristics to guess the most likely attribute type, that is, numeric, nominal, string, or date. As the heuristics cannot always guess the correct type, we can set the types manually, as follows:

NumericToNominal toNominal = new NumericToNominal();
toNominal.setInputFormat(data);
data = Filter.useFilter(data, toNominal);

Before we continue, we need to specify the attribute that we will try to predict. We can achieve this by calling the setClassIndex(int) function:

int CLASS_INDEX = 15;
data.setClassIndex(CLASS_INDEX);
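If hard-coding the index feels brittle, an alternative is to look the index up by name. This one-liner assumes the class attribute is named FraudFound in the CSV header:

// avoids a hard-coded index; assumes the header names the class attribute "FraudFound"
data.setClassIndex(data.attribute("FraudFound").index());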

Next, we need to remove an attribute describing the policy number as it has no predictive value. We simply apply the Remove filter, as follows:

int POLICY_INDEX = 17; // 1-based index of the PolicyNumber attribute (the -R option counts from 1; adjust if your column order differs)
Remove remove = new Remove();
remove.setOptions(new String[]{"-R", "" + POLICY_INDEX});
remove.setInputFormat(data); // set options before the input format so they take effect
data = Filter.useFilter(data, remove);

Now we are ready to start modeling.

Vanilla approach

The vanilla approach is to directly apply the lessons demonstrated in Chapter 3, Basic Algorithms – Classification, Regression, Clustering, without any pre-processing and without taking dataset specifics into account. To demonstrate the drawbacks of the vanilla approach, we will simply build models with default parameters and apply k-fold cross validation.

First, let's define some classifiers that we want to test:

ArrayList<Classifier> models = new ArrayList<Classifier>();
models.add(new J48());
models.add(new RandomForest());
models.add(new NaiveBayes());
models.add(new AdaBoostM1());
models.add(new Logistic());

Next, we create an Evaluation object and perform k-fold cross validation by calling the crossValidateModel(Classifier, Instances, int, Random, Object...) method; we then output the precision, recall, and F-measure for the fraud class:

int FOLDS = 3;
int FRAUD = 1; // index of the class value denoting fraud (assumption: "yes" is the second nominal value; verify against your data)

for(Classifier model : models){
  // fresh Evaluation per model, so that statistics do not accumulate across models
  Evaluation eval = new Evaluation(data);
  eval.crossValidateModel(model, data, FOLDS, new Random(1));
  System.out.println(model.getClass().getName() + "\n" +
    "\tRecall:    " + eval.recall(FRAUD) + "\n" +
    "\tPrecision: " + eval.precision(FRAUD) + "\n" +
    "\tF-measure: " + eval.fMeasure(FRAUD));
}

The evaluation outputs the following scores:

weka.classifiers.trees.J48
  Recall:    0.03358613217768147
  Precision: 0.9117647058823529
  F-measure: 0.06478578892371996
...
weka.classifiers.functions.Logistic
  Recall:    0.037486457204767065
  Precision: 0.2521865889212828
  F-measure: 0.06527070364082249

We can see that the results are not very promising. Recall, that is, the share of discovered frauds among all frauds, is very low: only around 3 out of 100 frauds are detected by the models shown above. On the other hand, the precision of J48, that is, the accuracy of its alarms, is 91%, meaning that when it marks a claim as fraud, it is correct in 9 out of 10 cases.

Dataset rebalancing

As the number of positive examples, that is, frauds, is very small compared to the number of negative examples, the learning algorithms struggle with induction. We can help them by giving them a dataset where the shares of positive and negative examples are comparable. This can be achieved with dataset rebalancing.

Weka has a built-in filter, Resample, which produces a random subsample of a dataset, sampling either with or without replacement. The filter can also bias the distribution towards a uniform class distribution.
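Before wiring it into cross validation, here is a minimal sketch of the filter in isolation, using its property setters instead of the option strings used later in this recipe (it assumes data is loaded with its class index set):

Resample resample = new Resample(); // the supervised variant supports class-distribution biasing
resample.setSampleSizePercent(100);  // output sample as large as the input
resample.setBiasToUniformClass(1.0); // 1.0 = fully uniform class distribution
resample.setInputFormat(data);
Instances balanced = Filter.useFilter(data, resample);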

We will proceed by manually implementing k-fold cross validation. First, we will split the dataset into k equal folds; fold k will be used for testing, while the remaining folds will be used for learning. To split the dataset into folds, we'll use the StratifiedRemoveFolds filter, which maintains the class distribution within the folds, as follows:

StratifiedRemoveFolds kFold = new StratifiedRemoveFolds();

double[][] measures = new double[models.size()][3]; // rows: models; columns: recall, precision, F-measure

for(int k = 1; k <= FOLDS; k++){

  // select fold k as the test set
  kFold.setOptions(new String[]{
    "-N", "" + FOLDS, "-F", "" + k, "-S", "1"});
  kFold.setInputFormat(data);
  Instances test = Filter.useFilter(data, kFold);

  // select the inverse (-V), that is, the remaining folds, as the train set
  kFold.setOptions(new String[]{
    "-N", "" + FOLDS, "-F", "" + k, "-S", "1", "-V"});
  kFold.setInputFormat(data);
  Instances train = Filter.useFilter(data, kFold);

Next, we can rebalance the train dataset. The -Z parameter specifies the percentage of the dataset to be resampled, and -B biases the class distribution towards the uniform distribution:

Resample resample = new Resample(); // weka.filters.supervised.instance.Resample
resample.setOptions(new String[]{"-Z", "100", "-B", "1"}); // resample 100% of the data with replacement, fully biased towards a uniform class distribution
resample.setInputFormat(train);
Instances balancedTrain = Filter.useFilter(train, resample);

Next, we can build classifiers and perform evaluation:

for(ListIterator<Classifier> it = models.listIterator(); it.hasNext();){
  Classifier model = it.next();
  model.buildClassifier(balancedTrain);
  Evaluation eval = new Evaluation(balancedTrain);
  eval.evaluateModel(model, test);

  // accumulate the per-fold results; they are averaged after the loop
  measures[it.previousIndex()][0] += eval.recall(FRAUD);
  measures[it.previousIndex()][1] += eval.precision(FRAUD);
  measures[it.previousIndex()][2] += eval.fMeasure(FRAUD);
}
} // closes the outer loop over the k folds

Finally, we calculate the average and output the best model:

// calculate average
for(int i = 0; i < models.size(); i++){
  measures[i][0] /= 1.0 * FOLDS;
  measures[i][1] /= 1.0 * FOLDS;
  measures[i][2] /= 1.0 * FOLDS;
}

// output results and select best model
Classifier bestModel = null; double bestScore = -1;
for(ListIterator<Classifier> it = models.listIterator(); it.hasNext();){
  Classifier model = it.next();
  double fMeasure = measures[it.previousIndex()][2];
  System.out.println(model.getClass().getName() + "\n" +
    "\tRecall:    " + measures[it.previousIndex()][0] + "\n" +
    "\tPrecision: " + measures[it.previousIndex()][1] + "\n" +
    "\tF-measure: " + fMeasure);
  if(fMeasure > bestScore){
    bestScore = fMeasure;
    bestModel = model;
  }
}
System.out.println("Best model: " + bestModel.getClass().getName());

Now the performance of the models has significantly improved, as follows:

weka.classifiers.trees.J48
  Recall:    0.44204845100610574
  Precision: 0.14570766048577555
  F-measure: 0.21912423640160392
...
weka.classifiers.functions.Logistic
  Recall:    0.7670657247204478
  Precision: 0.13507459756495374
  F-measure: 0.22969038530557626
Best model: weka.classifiers.functions.Logistic

What we can see is that all the models have scored significantly better; for instance, the best model, logistic regression, correctly discovers 76% of frauds, at the price of a considerable number of false alarms: only 13% of the claims marked as fraud are indeed fraudulent. If an undetected fraud is significantly more expensive than the investigation of a false alarm, then tolerating the increased number of false alarms makes sense.
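One way to encode such asymmetric costs directly, going beyond this recipe, is Weka's CostSensitiveClassifier. The following is a hedged sketch that assumes a missed fraud costs ten times as much as investigating a false alarm, and that the class values are ordered as {no fraud, fraud}; adjust the matrix to your class value order:

import weka.classifiers.CostMatrix;
import weka.classifiers.meta.CostSensitiveClassifier;

// Hypothetical cost matrix: rows = actual class, columns = predicted class.
// Actual fraud predicted as no fraud (a missed fraud) costs 10; a false alarm costs 1.
CostMatrix costs = CostMatrix.parseMatlab("[0.0 1.0; 10.0 0.0]");
CostSensitiveClassifier csc = new CostSensitiveClassifier();
csc.setClassifier(new Logistic());
csc.setCostMatrix(costs);
csc.setMinimizeExpectedCost(true); // predict the class with the lowest expected cost
csc.buildClassifier(balancedTrain);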

The overall performance most likely still has some room for improvement; we could perform attribute selection and feature generation, and apply the more complex models that we discussed in Chapter 3, Basic Algorithms – Classification, Regression, Clustering.
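As an illustration of the first suggestion, here is a minimal sketch of attribute selection with Weka's AttributeSelection filter, ranking attributes by information gain; the number of attributes to keep is an arbitrary choice here:

import weka.attributeSelection.InfoGainAttributeEval;
import weka.attributeSelection.Ranker;
import weka.filters.supervised.attribute.AttributeSelection;

// rank attributes by information gain with respect to the class
// and keep only the top-ranked ones before training
AttributeSelection selection = new AttributeSelection();
Ranker ranker = new Ranker();
ranker.setNumToSelect(15); // arbitrary: keep the 15 most informative attributes
selection.setEvaluator(new InfoGainAttributeEval());
selection.setSearch(ranker);
selection.setInputFormat(data);
Instances reduced = Filter.useFilter(data, selection);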
