Model evaluation

Our linear regression model has now been trained. Remember that we set aside part of the dataset for testing; we will now use it to assess how well the model performs. The R² statistic (coefficient of determination) is a common measure of how well a regression model fits the data:

  1. R² can be computed by passing our test dataset to the LinearRegression.score() method:
# check the R² score on the test set
regressor.score(X_test, y_test)

The output of this score() function is as follows:

0.5383003344910231

The score(X_test, y_test) method predicts the y values for the input set X_test and compares them against the true y values, y_test. An R² of 1 means perfect predictions, and an R² of 0 means the model does no better than always predicting the mean of y. On test data R² can even be negative. The closer R² is to 1, the better the model explains the variation in the target. Here, the R² score is about 0.54, which means the model explains roughly 54% of the variance in the test targets: a moderate result. Adding more independent variables may improve the model's performance, which we will look at next.
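To make the definition concrete, here is a minimal pure-Python sketch of the formula R² = 1 − SS_res / SS_tot. SS_res is the sum of squared residuals, and SS_tot is the total sum of squares around the mean of the true values. The helper name r_squared is hypothetical, and the small lists are made-up values, not our dataset. Applying this formula to y_test and regressor.predict(X_test) should give the same number as score(X_test, y_test). The last example shows why R² is not confined to [0, 1] on test data: predictions worse than the mean baseline give a negative score.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot (hypothetical helper)."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error left after using the model's predictions
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of a baseline that always predicts the mean
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R² is undefined when all true values are identical")
    return 1 - ss_res / ss_tot


# Illustrative values only, not the book's data
y_true = [3.0, 5.0, 7.0, 9.0]
print(r_squared(y_true, [3.0, 5.0, 7.0, 9.0]))  # perfect predictions → 1.0
print(r_squared(y_true, [6.0, 6.0, 6.0, 6.0]))  # always the mean → 0.0
print(r_squared(y_true, [9.0, 7.0, 5.0, 3.0]))  # worse than the mean → -3.0
```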

  2. Before that, let's predict the y values with our model and evaluate it further by building a DataFrame of the actual and predicted target values:
# predict the y values for the test set
y_pred = regressor.predict(X_test)
# build a DataFrame with the actual and predicted values of y
evaluate = pd.DataFrame({'Actual': y_test.values.flatten(), 'Predicted': y_pred.flatten()})
evaluate.head(10)
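Besides R², the actual and predicted columns can be summarized with error metrics expressed in the target's own units. The sketch below uses plain Python lists in place of the pandas columns. The sample numbers are made up for illustration, and the helper name error_summary is hypothetical. With our DataFrame, you could pass evaluate['Actual'] and evaluate['Predicted'] instead, since the function only iterates over them.

```python
import math


def error_summary(actual, predicted):
    """Return residuals plus MAE, MSE and RMSE (hypothetical helper)."""
    # Residual = actual - predicted; positive means the model under-predicted
    residuals = [a - p for a, p in zip(actual, predicted)]
    n = len(residuals)
    if n == 0:
        raise ValueError("no values to compare")
    mae = sum(abs(r) for r in residuals) / n  # mean absolute error
    mse = sum(r * r for r in residuals) / n   # mean squared error
    rmse = math.sqrt(mse)                     # root MSE, in the target's units
    return residuals, mae, mse, rmse


# Illustrative values only, not the book's data
actual = [10.0, 12.0, 8.0, 15.0]
predicted = [9.0, 13.0, 6.0, 15.0]
residuals, mae, mse, rmse = error_summary(actual, predicted)
print(residuals)  # → [1.0, -1.0, 2.0, 0.0]
print(mae, mse)   # → 1.0 1.5
print(rmse)
```

RMSE penalizes large errors more heavily than MAE, so comparing the two hints at whether a few predictions are far off.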

The resulting DataFrame is as follows:

Figure 9.4: The first 10 entries showing the actual values and the predicted values

The preceding table shows the actual values alongside the predicted values. The differences are easier to see in a plot:

evaluate.head(10).plot(kind = 'bar')

The output of the preceding code is as follows:

Figure 9.5: Grouped bar plot showing the actual values and the predicted values

Much easier to understand, right? Note that most of the predicted values are lower than the actual values. 
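Rather than judging from the bars alone, the observation that most predictions fall below the actual values can be checked numerically by counting positive residuals. Here is a minimal sketch, again using made-up illustrative numbers and a hypothetical helper name, underprediction_share. To match the plot, you could pass evaluate.head(10)['Actual'] and evaluate.head(10)['Predicted']. A positive mean residual suggests the model tends to under-predict.

```python
def underprediction_share(actual, predicted):
    """Fraction of rows predicted below the actual value, plus the mean
    residual (actual - predicted). Hypothetical helper."""
    pairs = list(zip(actual, predicted))
    if not pairs:
        raise ValueError("no values to compare")
    # Count rows where the model's prediction is lower than the true value
    under = sum(1 for a, p in pairs if p < a)
    mean_residual = sum(a - p for a, p in pairs) / len(pairs)
    return under / len(pairs), mean_residual


# Illustrative values only, not the book's data
actual = [20.0, 18.0, 25.0, 30.0, 22.0]
predicted = [17.0, 18.5, 21.0, 26.0, 23.0]
share, bias = underprediction_share(actual, predicted)
print(share, bias)  # → 0.6 1.9
```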
