Modeling suspicious patterns

To design a classifier, we can follow the standard supervised learning steps, as described in Chapter 1, Applied Machine Learning Quick Start. In this recipe, we will include some additional steps to handle unbalanced datasets and evaluate classifiers based on precision and recall. The plan is as follows:

  1. Load the data from a .csv file.
  2. Assign the class attribute.
  3. Convert all of the attributes from numeric to nominal to make sure that no values were incorrectly loaded as numbers.
  4. Experiment 1: Evaluate the models with k-fold cross-validation.
  5. Experiment 2: Rebalance the dataset to a more even class distribution, and perform cross-validation manually.
  6. Compare the classifiers by recall, precision, and f-measure.
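
To make the last step of the plan concrete, here is a minimal sketch of how recall, precision, and f-measure follow from the counts of a confusion matrix. The class name FraudMetrics and the counts in main are hypothetical and chosen only for illustration; they are not results from the claims dataset.

```java
public class FraudMetrics {

    /** Precision: the share of claims flagged as fraud that really are fraud. */
    public static double precision(int tp, int fp) {
        return (tp + fp) == 0 ? 0.0 : (double) tp / (tp + fp);
    }

    /** Recall: the share of truly fraudulent claims that were flagged. */
    public static double recall(int tp, int fn) {
        return (tp + fn) == 0 ? 0.0 : (double) tp / (tp + fn);
    }

    /** F-measure: the harmonic mean of precision and recall. */
    public static double fMeasure(double precision, double recall) {
        double sum = precision + recall;
        return sum == 0.0 ? 0.0 : 2 * precision * recall / sum;
    }

    public static void main(String[] args) {
        // Hypothetical counts for an unbalanced dataset of 1,000 claims
        int tp = 30;   // fraud, predicted fraud
        int fp = 20;   // legitimate, predicted fraud
        int fn = 70;   // fraud, predicted legitimate
        int tn = 880;  // legitimate, predicted legitimate

        double p = precision(tp, fp);   // 30 / 50
        double r = recall(tp, fn);      // 30 / 100
        double f = fMeasure(p, r);
        double accuracy = (double) (tp + tn) / (tp + fp + fn + tn);  // 910 / 1000

        System.out.println("precision = " + p);
        System.out.println("recall    = " + r);
        System.out.println("f-measure = " + f);
        System.out.println("accuracy  = " + accuracy);
    }
}
```

With these made-up counts, accuracy is 0.91 even though only 30 of the 100 fraudulent claims are caught. This is why, on an unbalanced dataset, recall and precision are more informative than accuracy alone.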

First, let's load the data using the CSVLoader class, as follows:

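import java.io.File;
import weka.core.Instances;
import weka.core.converters.CSVLoader;

String filePath = "/Users/bostjan/Dropbox/ML Java Book/book/datasets/chap07/claims.csv";

// Read the comma-separated file into a Weka Instances object
CSVLoader loader = new CSVLoader();
loader.setFieldSeparator(",");
loader.setSource(new File(filePath));
Instances data = loader.getDataSet();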

Next, we need to make sure that all of the attributes are nominal. During the data import, Weka applies some heuristics to guess the most probable attribute type, that is, numeric, nominal, string, or date. As the heuristics cannot always guess the correct type, we can convert the attributes explicitly with the NumericToNominal filter, which by default converts every numeric attribute, as follows:

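import weka.filters.Filter;
import weka.filters.unsupervised.attribute.NumericToNominal;

// Convert every numeric attribute to nominal
NumericToNominal toNominal = new NumericToNominal();
toNominal.setInputFormat(data);
data = Filter.useFilter(data, toNominal);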
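
To see why a type-guessing heuristic can misfire, consider the following simplified sketch. It is not Weka's actual implementation. It assumes a column is numeric whenever every non-missing value parses as a number, and it treats "?" or an empty string as missing. The class name AttributeTypeGuess and the sample values are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class AttributeTypeGuess {

    /**
     * Returns "numeric" if every non-missing value parses as a number,
     * and "nominal" otherwise. A simplified stand-in for an import heuristic.
     */
    public static String guessType(List<String> values) {
        for (String value : values) {
            String s = value.trim();
            if (s.isEmpty() || s.equals("?")) {
                continue;  // assumption: these markers denote a missing value
            }
            try {
                Double.parseDouble(s);
            } catch (NumberFormatException e) {
                return "nominal";
            }
        }
        return "numeric";
    }

    public static void main(String[] args) {
        // A category coded as integers looks numeric to the heuristic
        List<String> codedCategory = Arrays.asList("1", "2", "4", "?");
        // A category spelled out as text is recognized as nominal
        List<String> textCategory = Arrays.asList("Sedan", "Sport", "Utility");

        System.out.println(guessType(codedCategory));
        System.out.println(guessType(textCategory));
    }
}
```

A category that happens to be encoded as integers is indistinguishable from a true measurement under such a rule, which is the situation the explicit conversion to nominal guards against.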

Before we continue, we need to specify the attribute that we will try to predict. We can do this by calling the setClassIndex(int) method, which takes a zero-based attribute index:

int CLASS_INDEX = 15; 
data.setClassIndex(CLASS_INDEX); 

Next, we need to remove an attribute describing the policy number, as it has no predictive value. We simply apply the Remove filter, as follows:

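import weka.filters.unsupervised.attribute.Remove;

// POLICY_INDEX is assumed to be defined elsewhere; note that -R takes 1-based indices
Remove remove = new Remove();
remove.setOptions(new String[]{"-R", "" + POLICY_INDEX});
// Set the options before setInputFormat() so that the output format reflects them
remove.setInputFormat(data);
data = Filter.useFilter(data, remove);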
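
The two calls above use different index conventions: setClassIndex(int) counts attributes from 0, whereas the -R option of the Remove filter counts them from 1. The following sketch shows one way to derive both values from a header line instead of hard-coding them. The class AttributeIndexLookup, its methods, and the header and column names are all hypothetical; they do not reflect the actual columns of claims.csv.

```java
import java.util.Arrays;
import java.util.List;

public class AttributeIndexLookup {

    /** Returns the zero-based position of a column in a comma-separated header line. */
    public static int zeroBasedIndex(String headerLine, String columnName) {
        List<String> columns = Arrays.asList(headerLine.split(","));
        int index = columns.indexOf(columnName);
        if (index < 0) {
            throw new IllegalArgumentException("No such column: " + columnName);
        }
        return index;
    }

    /** Converts a zero-based index to the 1-based string form used by range options such as -R. */
    public static String toOneBasedRange(int zeroBasedIndex) {
        return String.valueOf(zeroBasedIndex + 1);
    }

    public static void main(String[] args) {
        // Hypothetical header, for illustration only
        String header = "policy_number,age,vehicle_category,fraud_found";

        int classIndex = zeroBasedIndex(header, "fraud_found");      // value for setClassIndex(int)
        String removeRange = toOneBasedRange(
                zeroBasedIndex(header, "policy_number"));             // value for the -R option

        System.out.println("class index (0-based): " + classIndex);
        System.out.println("remove range (1-based): " + removeRange);
    }
}
```

Keeping the conversion in one place makes an off-by-one mistake, such as removing the neighbor of the intended attribute, easier to spot.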

Now, we are ready to start modeling.
