Choosing the right regression metric

Regression models can be evaluated in as much detail as classification models. In Chapter 3, First Steps in Supervised Learning, we also talked about some fundamental metrics for regression (a short code comparison follows the list):

  • Mean squared error: The most commonly used error metric for regression measures the squared error between the predicted and true target value for every data point, averaged across all data points in the set being evaluated (sklearn.metrics.mean_squared_error).
  • Explained variance: A more sophisticated metric measures to what degree a model can explain the variation, or dispersion, of the test data (sklearn.metrics.explained_variance_score). The amount of explained variance is often measured using the correlation coefficient.
  • The R2 score (pronounced R squared): This is closely related to the explained variance score, but it uses an unbiased variance estimate (sklearn.metrics.r2_score). It is also known as the coefficient of determination.
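
To see how these metrics compare in practice, here is a minimal sketch that fits a plain linear regressor to synthetic data and reports all three scores. The dataset, model, and random seed are illustrative choices, not taken from the chapter:

    # Illustrative comparison of the three regression metrics.
    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import (mean_squared_error,
                                 explained_variance_score, r2_score)
    from sklearn.model_selection import train_test_split

    rng = np.random.RandomState(42)
    X = rng.uniform(-3, 3, size=(100, 1))                   # one input feature
    y = 0.5 * X.ravel() + rng.normal(scale=0.3, size=100)   # noisy linear target

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
    y_pred = LinearRegression().fit(X_train, y_train).predict(X_test)

    print('MSE:               ', mean_squared_error(y_test, y_pred))
    print('Explained variance:', explained_variance_score(y_test, y_pred))
    print('R2:                ', r2_score(y_test, y_pred))

For a well-fit, unbiased model, the explained variance and R2 scores are nearly identical; they diverge when the predictions carry a systematic offset, which explained variance ignores but R2 penalizes.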

In most applications we have encountered so far, using the default R2 score is enough. But feel free to try other regression metrics such as mean squared error and explained variance to understand how each metric evaluates the results.

As we combine elaborate grid searches with sophisticated evaluation metrics, our model selection code can become increasingly complex. Fortunately, scikit-learn offers a way to simplify model selection with a helpful construct known as a pipeline, previewed in the sketch below.
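
As a preview, this minimal sketch chains a scaler and a ridge regressor into a single estimator and tunes it with a grid search scored by R2; the estimator choice and parameter grid are illustrative assumptions, not the chapter's own example:

    # Illustrative pipeline: scaling + ridge regression, tuned by grid search.
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import GridSearchCV

    pipe = Pipeline([('scaler', StandardScaler()),
                     ('reg', Ridge())])
    grid = GridSearchCV(pipe,
                        param_grid={'reg__alpha': [0.1, 1.0, 10.0]},
                        scoring='r2', cv=5)
    # grid.fit(X_train, y_train) picks the best alpha by cross-validated R2.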
