Our linear regression model has now been trained. Recall that we held back part of the dataset for testing; we will now use it to assess how well the model performs. A common measure of fit for regression models is the R² statistic (the coefficient of determination):
- R² can be computed on our test dataset with the LinearRegression.score() method:
# check the R² score on the test set
regressor.score(X_test, y_test)
The output of this score() function is as follows:
0.5383003344910231
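The score(X_test, y_test) method predicts the y values for the input set X_test and compares them against the true values in y_test. R² is at most 1, and for a reasonable model it usually lies between 0 and 1. On test data it can even be negative, when the model does worse than always predicting the mean of y. The closer R² is to 1, the better the model fits. Here the R² score is about 0.54, meaning the model explains roughly 54% of the variance of the target on the test set. This is a moderate fit, not a 54% "accuracy". Using more than one independent variable may improve the model's performance, which we will look at next.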
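To see what score() computes, here is a minimal sketch on hypothetical synthetic data (the data, the split, and the random seed are assumptions, not this chapter's dataset). It checks that score() matches R² = 1 - SS_res / SS_tot computed by hand and sklearn.metrics.r2_score, and it shows that predictions worse than the mean give a negative R²:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Hypothetical synthetic data: one feature with a noisy linear relationship
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 3.0 * X[:, 0] + 5.0 + rng.normal(0, 4.0, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
regressor = LinearRegression().fit(X_train, y_train)

# score() predicts y from X_test internally, then computes R^2 against y_test
score = regressor.score(X_test, y_test)

# The same value by hand: R^2 = 1 - SS_res / SS_tot
y_pred = regressor.predict(X_test)
ss_res = np.sum((y_test - y_pred) ** 2)             # residual sum of squares
ss_tot = np.sum((y_test - y_test.mean()) ** 2)      # total sum of squares
manual = 1 - ss_res / ss_tot

print(score, manual, r2_score(y_test, y_pred))

# Predicting a constant 10 units above the mean is worse than the mean itself,
# so R^2 drops below 0
worse = r2_score(y_test, np.full_like(y_test, y_test.mean() + 10))
print(worse)
```

The three printed R² values agree, which is why score() is a convenient shortcut: it bundles the predict() call and the R² formula into one step.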
- Before moving on to multiple variables, let's use the model to predict the y values for the test set and evaluate it further. We also build a DataFrame that places the actual and predicted values side by side:
# predict the y values
y_pred = regressor.predict(X_test)
# a data frame with actual and predicted values of y
evaluate = pd.DataFrame({'Actual': y_test.values.flatten(), 'Predicted': y_pred.flatten()})
evaluate.head(10)
The DataFrame of actual and predicted values is as follows:
The preceding screenshot shows the difference between the actual and predicted values. The difference is easier to see in a bar plot:
evaluate.head(10).plot(kind='bar')
The output of the preceding code is as follows:
Much easier to understand, right? Note that most of the predicted values are lower than the actual values.
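The visual comparison can also be summarized numerically. The following is a minimal sketch, assuming a small hypothetical evaluate DataFrame in place of the real one (the values below are made up for illustration). It computes the mean absolute error (MAE), the root mean squared error (RMSE), and the share of rows where the model under-predicts, using sklearn.metrics:

```python
import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Hypothetical actual/predicted values standing in for the real `evaluate` DataFrame
evaluate = pd.DataFrame({
    'Actual':    [10.0, 12.0, 9.0, 15.0, 11.0],
    'Predicted': [8.0, 11.0, 9.5, 12.0, 10.0],
})

# Residual = actual - predicted; a positive residual means the model under-predicted
residual = evaluate['Actual'] - evaluate['Predicted']

mae = mean_absolute_error(evaluate['Actual'], evaluate['Predicted'])
rmse = np.sqrt(mean_squared_error(evaluate['Actual'], evaluate['Predicted']))
share_under = (residual > 0).mean()   # fraction of rows with Actual > Predicted

print(f"MAE={mae:.2f}  RMSE={rmse:.2f}  under-predicted: {share_under:.0%}")
```

For these made-up values the sketch reports an MAE of 1.50 and 80% under-predictions. Applied to the real evaluate DataFrame, the share of positive residuals gives a number for the observation that most predictions fall below the actual values, and MAE and RMSE are in the same units as the target, which makes them easier to interpret than R² alone.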