In the last section, we completed our model estimation. Now it is time to evaluate the estimated models and check whether they meet the city's criteria, so that we can either move on to explaining the results or go back to an earlier stage to refine our predictive models.
To perform our model evaluation in this section, we will mainly use root mean square error (RMSE) to assess both the regression and the time series models. Other measures, such as mean squared error (MSE), can also be used, but as an exercise we will focus on RMSE because the process for the other measures is similar.
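For reference, the two measures named above can be written as follows. The symbols are introduced here for illustration and are not from the original text: y_i is an observed value, \hat{y}_i is the corresponding prediction, and n is the number of test observations.

```latex
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2,
\qquad
\mathrm{RMSE} = \sqrt{\mathrm{MSE}}
```

Because RMSE is the square root of MSE, it is expressed in the same units as the quantity being forecast, which makes it easier to compare against the size of the values themselves.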
When working on this real-life project, as mentioned in the Methods of service forecasting section of this chapter, we also used decision tree and random forest models, which should be evaluated with a confusion matrix and error ratios. We will not discuss these evaluation methods here, as they were used several times in previous chapters, such as in Chapter 4, Fraud Detection on Spark.
As with model estimation, to calculate RMSE for the regression models we use MLlib, implemented in Zeppelin notebooks on Spark. For the time series models, we use R notebooks implemented in the Databricks environment of Spark.
In MLlib, we can use the following code to calculate RMSE:
// Pair each test label with the model's prediction
val valuesAndPreds = test.map { point =>
  val prediction = new_model.predict(point.features)
  (point.label, prediction)
}
// Squared difference between label and prediction
val residuals = valuesAndPreds.map { case (v, p) => math.pow(v - p, 2) }
val MSE = residuals.mean()
val RMSE = math.pow(MSE, 0.5)
Besides the preceding, MLlib also has a RegressionMetrics class whose rootMeanSquaredError value returns RMSE directly. The RankingMetrics class, by contrast, covers ranking measures rather than RMSE.
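As a rough sketch of the RegressionMetrics route, the following notebook-style snippet computes MSE and RMSE from (prediction, observation) pairs. The four pairs are made up for illustration, and the application name and local master are assumptions. In the project code, the pairs would instead come from mapping the test set through the model, with the prediction placed first.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.evaluation.RegressionMetrics

// Reuse the notebook's context if one exists; otherwise start a local one.
// The app name and master are assumptions for this sketch.
val conf = new SparkConf().setAppName("rmse-sketch").setMaster("local[*]")
val sc = SparkContext.getOrCreate(conf)

// Hypothetical (prediction, observation) pairs, for illustration only.
// RegressionMetrics expects the prediction first, then the observed value.
val predictionAndObservations = sc.parallelize(Seq(
  (2.5, 3.0),
  (0.0, -0.5),
  (2.0, 2.0),
  (8.0, 7.0)
))

val metrics = new RegressionMetrics(predictionAndObservations)
println(s"MSE  = ${metrics.meanSquaredError}")
println(s"RMSE = ${metrics.rootMeanSquaredError}")
```

The same metrics object also exposes other regression measures, such as meanAbsoluteError and r2, so one pass over the predictions can serve several evaluation criteria.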
In R, the forecast package has an accuracy function that can be used to calculate forecasting accuracy as well as RMSE. Take a look at the following:
accuracy(f, x, test=NULL, d=NULL, D=NULL)
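The measures calculated are ME (mean error), RMSE (root mean squared error), MAE (mean absolute error), MPE (mean percentage error), MAPE (mean absolute percentage error), MASE (mean absolute scaled error), and ACF1 (autocorrelation of errors at lag 1).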
To perform a complete evaluation, we calculated RMSE for all the models we estimated. Then, we compared them and picked the ones with the smaller RMSE.
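The comparison step can be sketched in plain Scala, without Spark. The model names, holdout values, and predictions below are made up for illustration and are not results from the project. The rmse helper is also a hypothetical name introduced here.

```scala
// Root mean square error between observed and predicted values.
def rmse(actual: Seq[Double], predicted: Seq[Double]): Double = {
  require(actual.nonEmpty && actual.length == predicted.length,
    "actual and predicted must be non-empty and of equal length")
  val squaredErrors = actual.zip(predicted).map { case (a, p) => math.pow(a - p, 2) }
  math.sqrt(squaredErrors.sum / squaredErrors.length)
}

// Made-up holdout values and per-model predictions, for illustration only.
val actual = Seq(10.0, 12.0, 14.0, 16.0)
val candidates: Map[String, Seq[Double]] = Map(
  "regression" -> Seq(11.0, 11.0, 15.0, 15.0),
  "timeSeries" -> Seq(10.0, 13.0, 14.0, 18.0)
)

// Compute RMSE for every candidate, then pick the smallest.
val rmseByModel = candidates.map { case (name, preds) => name -> rmse(actual, preds) }
val (bestName, bestRmse) = rmseByModel.minBy(_._2)

rmseByModel.toSeq.sortBy(_._2).foreach { case (name, value) =>
  println(f"$name%-12s RMSE = $value%.4f")
}
println(s"Smallest RMSE: $bestName")
```

Keeping the scores in a map keyed by model name makes it straightforward to add further candidates, provided every model is scored against the same holdout values.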
For more information on using the forecast R package, refer to the following URL:
https://cran.r-project.org/web/packages/forecast/forecast.pdf