Model evaluation

In the previous section, we completed model estimation. Now it is time to evaluate the estimated models and check whether they meet the city's criteria, so that we can either move on to explaining the results or return to earlier stages to refine our predictive models.

In this section, we will mainly use the root mean square error (RMSE) to evaluate both the regression models and the time series models. Other measures, such as the mean square error (MSE), could also be used, but we focus on RMSE here because the process of working with the other measures is similar.
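For reference, writing y_i for the i-th actual value in the test data, ŷ_i for the corresponding prediction, and n for the number of test observations, the standard definitions are given below. Because RMSE is the square root of MSE, it is expressed in the same units as the target variable, which usually makes it easier to interpret.

```latex
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2,
\qquad
\mathrm{RMSE} = \sqrt{\mathrm{MSE}} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}
```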

As mentioned in the Methods of service forecasting section of this chapter, we also used decision tree and random forest models in this real-life project. These models are evaluated with a confusion matrix and error ratios. We will not discuss these evaluation methods here, as they have already been used several times in previous chapters, such as Chapter 4, Fraud Detection on Spark.

As with model estimation, we will calculate RMSE for the regression models using MLlib in Zeppelin notebooks on Spark, and for the time series models using R notebooks in the Databricks environment of Spark.

RMSE calculation with MLlib

In MLlib, we can use the following code to calculate RMSE:

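// test: the hold-out RDD[LabeledPoint]; new_model: the regression model
// trained in the model estimation section
val valuesAndPreds = test.map { point =>
  val prediction = new_model.predict(point.features)
  (point.label, prediction) // (actual, predicted)
}
// Squared residual for each test observation
val residuals = valuesAndPreds.map { case (v, p) => math.pow(v - p, 2) }
val MSE = residuals.mean() // mean() is available on RDD[Double]
val RMSE = math.sqrt(MSE)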

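Besides computing RMSE by hand as above, MLlib provides the RegressionMetrics class (in the org.apache.spark.mllib.evaluation package), whose rootMeanSquaredError method returns RMSE directly. Note that its constructor expects an RDD of (prediction, observation) pairs, which is the reverse of the (label, prediction) order used in valuesAndPreds.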
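When checking the RDD code above on a small sample, it can help to run the same arithmetic on a local Scala collection without a cluster. The following is a minimal sketch, assuming the (actual, predicted) pairs have already been collected into a Seq; the object name LocalRmse and the toy data are hypothetical and not part of the project.

```scala
object LocalRmse {
  // RMSE over (actual, predicted) pairs, following the same steps as the
  // RDD code: square each residual, average the squares (MSE), take the root.
  def rmse(valuesAndPreds: Seq[(Double, Double)]): Double = {
    require(valuesAndPreds.nonEmpty, "need at least one (actual, predicted) pair")
    val residuals = valuesAndPreds.map { case (v, p) => math.pow(v - p, 2) }
    val mse = residuals.sum / residuals.size
    math.sqrt(mse)
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical toy data, not results from the project
    val sample = Seq((3.0, 2.5), (5.0, 5.0), (2.0, 3.0), (7.0, 8.0))
    println(f"RMSE = ${rmse(sample)}%.4f")
  }
}
```

Running main prints RMSE = 0.7500: the squared residuals are 0.25, 0, 1 and 1, their mean is 0.5625, and its square root is 0.75.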

RMSE calculation with R

In R, the forecast package provides an accuracy function that computes RMSE along with several other forecast accuracy measures:

accuracy(f, x, test=NULL, d=NULL, D=NULL)

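Here, f is a forecast object (or a numeric vector of forecasts), x holds the actual values, test optionally selects which observations to evaluate, and d and D set the number of non-seasonal and seasonal differences used when scaling MASE. The measures calculated are: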

  • ME (mean error)
  • RMSE (root mean square error)
  • MAE (mean absolute error)
  • MPE (mean percentage error)
  • MAPE (mean absolute percentage error)
  • MASE (mean absolute scaled error)
  • ACF1 (autocorrelation of errors at lag 1)
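To make the measures listed above concrete, the following Scala sketch computes most of them from a series of actual values and a series of forecasts, using the textbook definitions: the error is the actual value minus the forecast, and the percentage error is 100 times the error divided by the actual value. It illustrates the definitions only and is not the forecast package's own implementation; MASE is omitted because its scaling depends on the training series, and the object and field names are hypothetical.

```scala
object ForecastAccuracy {
  final case class Measures(me: Double, rmse: Double, mae: Double,
                            mpe: Double, mape: Double, acf1: Double)

  def measures(actual: Seq[Double], forecast: Seq[Double]): Measures = {
    require(actual.size == forecast.size && actual.size >= 2,
      "need two equal-length series with at least 2 points")
    require(actual.forall(_ != 0.0), "percentage errors are undefined when an actual value is 0")
    val n = actual.size.toDouble
    // Forecast errors: actual minus forecast
    val e = actual.zip(forecast).map { case (y, f) => y - f }
    // Percentage errors, in percent of the actual value
    val p = e.zip(actual).map { case (err, y) => 100.0 * err / y }
    val me = e.sum / n
    val rmse = math.sqrt(e.map(x => x * x).sum / n)
    val mae = e.map(x => math.abs(x)).sum / n
    val mpe = p.sum / n
    val mape = p.map(x => math.abs(x)).sum / n
    // Lag-1 sample autocorrelation of the errors (NaN when all errors are equal)
    val centered = e.map(_ - me)
    val num = centered.zip(centered.tail).map { case (a, b) => a * b }.sum
    val den = centered.map(x => x * x).sum
    val acf1 = if (den == 0.0) Double.NaN else num / den
    Measures(me, rmse, mae, mpe, mape, acf1)
  }
}
```

For example, with actual values 10, 20, 30, 40 and forecasts 12, 18, 33, 38, the errors are -2, 2, -3, 2, giving ME = -0.25, MAE = 2.25 and MPE = -3.75. A strongly negative ACF1, as here, means the errors tend to alternate in sign.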

To perform a complete evaluation, we calculated RMSE for every model we estimated, compared the results, and selected the models with the smallest RMSE.
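The comparison step can be sketched as follows, assuming every candidate model has produced predictions for the same hold-out observations. The model names and numbers are hypothetical placeholders, not results from the project.

```scala
object ModelSelection {
  def rmse(actual: Seq[Double], predicted: Seq[Double]): Double = {
    require(actual.nonEmpty && actual.size == predicted.size,
      "need two non-empty series of equal length")
    val sse = actual.zip(predicted).map { case (y, p) => (y - p) * (y - p) }.sum
    math.sqrt(sse / actual.size)
  }

  // Score every candidate on the same hold-out data and sort by ascending RMSE
  def rankByRmse(actual: Seq[Double], candidates: Map[String, Seq[Double]]): Seq[(String, Double)] =
    candidates.toSeq.map { case (name, preds) => name -> rmse(actual, preds) }.sortBy(_._2)

  def main(args: Array[String]): Unit = {
    // Hypothetical hold-out values and predictions from three hypothetical models
    val actual = Seq(3.0, 5.0, 2.0, 7.0)
    val candidates = Map(
      "modelA" -> Seq(2.5, 5.0, 3.0, 8.0),
      "modelB" -> Seq(3.0, 4.0, 2.0, 7.0),
      "modelC" -> Seq(1.0, 6.0, 4.0, 5.0)
    )
    rankByRmse(actual, candidates).foreach { case (name, score) =>
      println(f"$name%-8s RMSE = $score%.4f")
    }
  }
}
```

With these toy values, modelB ranks first (RMSE = 0.5000), followed by modelA (0.7500) and modelC (1.8028). Because RMSE is scale-dependent, such rankings are only meaningful when all candidates are scored on the same test data and the same target variable.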

For more information on using the forecast R package, refer to the following URL:

https://cran.r-project.org/web/packages/forecast/forecast.pdf
