Model evaluation

In the previous section, we completed our model estimation. Now it is time to evaluate the estimated models against our client's criteria, so that we can either move on to explaining the results or return to earlier stages to refine our predictive models.

To perform our model evaluation, in this section, we will use a confusion matrix and error ratios, both of which must be computed on our test data rather than the training data.

Here are the two common error types in student attrition prediction:

  • False negative (Type II error): This involves failing to identify a student who has a high propensity to leave.

    From a practical perspective, this is the least desirable error: the student is very likely to leave, and the university loses its chance to act to retain them, which adversely affects both its income and the student's future career.

  • False positive (Type I error): This involves classifying a good, satisfied student as one likely to leave.

    From a practical perspective, this may be acceptable: it does not affect the university's income or the students' future careers, but it creates confusion and may waste some of the university's resources, since the university will act, or even offer special assistance, to retain students who were never at risk. (A sketch computing both error counts follows this list.)
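To make the two error types concrete, here is a minimal sketch that counts them directly. It assumes an RDD named predictionAndLabels of (prediction, label) pairs computed on the test data, with 1.0 encoding attrition:

// Minimal sketch: count both error types from (prediction, label) pairs.
// Assumes predictionAndLabels: RDD[(Double, Double)], with 1.0 = attrition.
val falseNegatives = predictionAndLabels
  .filter { case (prediction, label) => label == 1.0 && prediction == 0.0 }
  .count()
val falsePositives = predictionAndLabels
  .filter { case (prediction, label) => label == 0.0 && prediction == 1.0 }
  .count()
println("False negatives (missed at-risk students) = " + falseNegatives)
println("False positives (satisfied students flagged) = " + falsePositives)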

A quick evaluation

As discussed before, MLlib can return a confusion matrix and the error counts derived from it.

Specifically, MLlib's MulticlassMetrics class provides a confusionMatrix method, and counts such as the number of false negatives can be read directly from the returned matrix.
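For our binary attrition label, that lookup is a single matrix entry. The following is a sketch, again assuming predictionAndLabels holds (prediction, label) pairs; in the returned matrix, rows are actual labels and columns are predictions, ordered as in metrics.labels:

import org.apache.spark.mllib.evaluation.MulticlassMetrics

// Sketch: derive the false negative count from the confusion matrix.
val metrics = new MulticlassMetrics(predictionAndLabels)
val cm = metrics.confusionMatrix   // rows = actual labels, columns = predictions
println("Confusion matrix:\n" + cm)
// With labels ordered (0.0, 1.0), entry (1, 0) is actual attrition that the
// model predicted as staying, that is, the false negatives
val numFalseNegatives = cm(1, 0).toLong
println("False negatives = " + numFalseNegatives)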

The following code calculates error ratios:

// Evaluate the model on test instances and compute the test error
// (the fraction of misclassified test points)
val testErr = testData.map { point =>
  val prediction = model.predict(point.features)
  if (point.label == prediction) 0.0 else 1.0
}.mean()
println("Test Error = " + testErr)
println("Learned Random Forest:\n" + model.toDebugString)

The following code may be used to obtain evaluation metrics for the estimated models:

import org.apache.spark.mllib.evaluation.MulticlassMetrics

// Get evaluation metrics from the (prediction, label) pairs.
val metrics = new MulticlassMetrics(predictionAndLabels)
val precision = metrics.precision
println("Precision = " + precision)

To visualize the performance of our classifiers, we can use the ROCR R package. For more information about using ROCR, readers may visit https://rocr.bioinf.mpi-sb.mpg.de/.
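Before exporting scores to R, some of this can also be done inside MLlib itself: BinaryClassificationMetrics summarizes a binary classifier with the areas under the ROC and precision-recall curves. A minimal sketch, assuming the predictionAndLabels pairs built earlier (note that with hard 0/1 predictions the ROC curve is coarse; raw scores give a smoother curve):

import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics

// Sketch: ROC and PR summaries computed directly in MLlib
val binaryMetrics = new BinaryClassificationMetrics(predictionAndLabels)
println("Area under ROC = " + binaryMetrics.areaUnderROC())
println("Area under PR  = " + binaryMetrics.areaUnderPR())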

The confusion matrix and error ratios

Any predictive algorithm going into production should be the one with the fewest false negatives, as each false negative is an at-risk student the university never gets the chance to retain.

In our case, we applied multiple algorithms to the test dataset to predict student attrition and compared the results from the two best-performing algorithms.


To implement the model evaluation, we adopt the same method used in the Model estimation section: we enter all the code into our Zeppelin notebook and then run the model evaluation part of it, which produces tables similar to the following one:

                 Predicted
Real             0           1-attrition
0                4257        733
1-attrition      57          342

399 students at risk
85.34% accuracy = (4257 + 342) / (4257 + 342 + 57 + 733)
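The at-risk count and the accuracy follow directly from the table's four cells; as a quick check in plain Scala, using the values above:

// Quick check of the table's figures from its four cells
val (tn, fp, fn, tp) = (4257.0, 733.0, 57.0, 342.0)
val studentsAtRisk = (fn + tp).toInt               // 57 + 342 = 399
val accuracy = (tn + tp) / (tn + fp + fn + tp)     // 4599 / 5389 ≈ 0.8534
println("Students at risk = " + studentsAtRisk + ", accuracy = " + accuracy)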

With the preceding evaluation, we can compare models and select the acceptable ones.
