Model evaluation

In the last section, we summarized what is needed to complete our model estimation for supervised machine learning. Now it is time to evaluate these estimated models against the client's criteria, so that we can either move on to the results explanation stage or go back to earlier stages to refine our predictive models.

To perform our model evaluation in this section, we will use the Root Mean Square Error (RMSE) to assess our linear regression models for predicting Call Center calls, and a confusion matrix to assess our logistic regression model for predicting customer churn. The confusion matrix is built from the following four counts:

  • True Positive (TP): Label is positive and prediction is also positive
  • True Negative (TN): Label is negative and prediction is also negative
  • False Positive (FP): Label is negative but prediction is positive
  • False Negative (FN): Label is positive but prediction is negative

Here, positive means the subscriber departed, and negative means the subscriber stayed.

The preceding four counts are the building blocks for most classifier-evaluation metrics. A fundamental point in classifier evaluation is that raw accuracy (that is, whether a prediction was correct or incorrect) is not necessarily the best metric, because a dataset can be highly unbalanced. For example, if a model is designed to predict fraud from a dataset in which 95 percent of the data points are not fraud and 5 percent are fraud, then a naive classifier that predicts not fraud for every input will be 95 percent accurate. In our case, the churn ratio is also fairly low. For this reason, precision (positive predictive value) and recall (sensitivity) are typically used, because they take the type of error into account. In other words, some balance between precision and recall is needed, which can be captured by combining the two into a single metric called the F-measure; we will calculate it here with MLlib.
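To make these definitions concrete, the following sketch computes precision, recall, and the F-measure from a set of hypothetical confusion-matrix counts (the numbers are for illustration only, not results from our models):

// Hypothetical counts, for illustration only
val (tp, fp, fn) = (40.0, 10.0, 20.0)

val precision = tp / (tp + fp)   // positive predictive value
val recall = tp / (tp + fn)      // sensitivity
val beta = 1.0                   // beta = 1 gives the balanced F1 score
val fMeasure = (1 + beta * beta) * precision * recall / (beta * beta * precision + recall)

println(f"Precision = $precision%.3f, Recall = $recall%.3f, F-measure = $fMeasure%.3f")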

Just as for model estimation, to calculate RMSEs and to produce a confusion matrix, we will implement the regression modelling with MLlib on Apache Spark; as an exercise, this can also be implemented with SPSS on Spark by wrapping the MLlib code in an SPSS Modeler node. For the logistic regression modelling, we will use R notebooks implemented in the Databricks environment for Apache Spark. In practice, we tried both MLlib and R for calculating RMSEs and for calculating error ratios, because one of the main purposes of this project is to push beyond the limits of our tools, R and MLlib.

RMSE calculations with MLlib

In MLlib, we can use the following code to calculate the RMSE:

// Pair each test point's true label with the model's prediction
val valuesAndPreds = test.map { point =>
  val prediction = new_model.predict(point.features)
  (point.label, prediction)
}

// Squared residuals, their mean (MSE), and the square root of the mean (RMSE)
val residuals = valuesAndPreds.map { case (v, p) => math.pow(v - p, 2) }
val MSE = residuals.mean()
val RMSE = math.sqrt(MSE)

Besides the preceding code, MLlib's RegressionMetrics class provides ready-made functions for the RMSE calculation (the related RankingMetrics class serves ranking problems rather than regression).
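For example, a minimal sketch using RegressionMetrics, assuming the valuesAndPreds RDD built in the previous block, might look like this; note that RegressionMetrics expects (prediction, observation) pairs, so the tuple is swapped:

import org.apache.spark.mllib.evaluation.RegressionMetrics

// Swap (label, prediction) into (prediction, observation) pairs
val regressionMetrics = new RegressionMetrics(valuesAndPreds.map { case (label, pred) => (pred, label) })
println(s"RMSE = ${regressionMetrics.rootMeanSquaredError}")
println(s"MAE = ${regressionMetrics.meanAbsoluteError}")
println(s"R-squared = ${regressionMetrics.r2}")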

RMSE calculations with R

In R, the forecast package has an accuracy function that can be used to calculate forecasting accuracy measures, including the RMSE:

accuracy(f, x, test=NULL, d=NULL, D=NULL)

The measures calculated are:

  • ME (Mean Error)
  • RMSE (Root Mean Squared Error)
  • MAE (Mean Absolute Error)
  • MPE (Mean Percentage Error)
  • MAPE (Mean Absolute Percentage Error)
  • MASE (Mean Absolute Scaled Error)
  • ACF1 (Autocorrelation of errors at lag 1)

To perform a complete evaluation, we need to calculate RMSEs for all the models we estimated. Then we compare them and select the ones with the smallest RMSEs.
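As a sketch of this comparison step, assuming a hypothetical candidateModels map from model names to estimated regression models and the test RDD used earlier:

// Compute the RMSE of every candidate model on the test set and keep the smallest
val rmseByModel = candidateModels.map { case (name, model) =>
  val mse = test.map { point =>
    val err = point.label - model.predict(point.features)
    err * err
  }.mean()
  (name, math.sqrt(mse))
}
val best = rmseByModel.minBy(_._2)
println(s"Selected model: ${best._1} with RMSE = ${best._2}")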

Confusion matrix and error ratios with MLlib and R

In MLlib, we can use the following code to calculate the error ratios:

// F-measure by threshold, computed from a BinaryClassificationMetrics object
// (see the construction sketch after this block)
val f1Score = metrics.fMeasureByThreshold
f1Score.foreach { case (t, f) =>
  println(s"Threshold: $t, F-score: $f, Beta = 1")
}

// F-measure with beta = 0.5, which weights precision more heavily than recall
val beta = 0.5
val fScore = metrics.fMeasureByThreshold(beta)
fScore.foreach { case (t, f) =>
  println(s"Threshold: $t, F-score: $f, Beta = 0.5")
}
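The metrics object used above is an instance of BinaryClassificationMetrics. A minimal construction sketch, assuming model is the fitted logistic regression model and test is the churn test RDD:

import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics

// Clear the decision threshold so predict returns raw scores instead of 0/1 labels
model.clearThreshold()
val scoreAndLabels = test.map { point =>
  (model.predict(point.features), point.label)
}
val metrics = new BinaryClassificationMetrics(scoreAndLabels)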

In R, the following code produces a confusion matrix, and it can be included in our R notebook for implementation:

model$confusion
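On the MLlib side, a confusion matrix can be obtained with the MulticlassMetrics class; a minimal sketch, assuming the same model and test RDD as above:

import org.apache.spark.mllib.evaluation.MulticlassMetrics

// Restore a decision threshold so predict returns hard 0/1 labels again
model.setThreshold(0.5)
val predictionAndLabels = test.map { point =>
  (model.predict(point.features), point.label)
}
val multiMetrics = new MulticlassMetrics(predictionAndLabels)
println(multiMetrics.confusionMatrix)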

With the evaluation described in the preceding paragraphs, we selected our final linear regression models for Call Center calls prediction and our final logistic regression models for subscriber churn.
