Chapter 10. Predicting Continuous Target Variables with Regression Analysis

Throughout the previous chapters, you learned a lot about the main concepts behind supervised learning and trained many different models for classification tasks to predict group memberships or categorical variables. In this chapter, we will dive into another subcategory of supervised learning: regression analysis.

Regression models are used to predict target variables on a continuous scale, which makes them attractive for addressing many questions in science as well as applications in industry, such as understanding relationships between variables, evaluating trends, or making forecasts. One example would be predicting the sales of a company in future months.

In this chapter, we will discuss the main concepts of regression models and cover the following topics:

  • Exploring and visualizing datasets
  • Looking at different approaches to implement linear regression models
  • Training regression models that are robust to outliers
  • Evaluating regression models and diagnosing common problems
  • Fitting regression models to nonlinear data

Introducing a simple linear regression model

The goal of simple (univariate) linear regression is to model the relationship between a single feature (explanatory variable $x$) and a continuous-valued response (target variable $y$). The equation of a linear model with one explanatory variable is defined as follows:

$y = w_0 + w_1x$

Here, the weight $w_0$ represents the y-axis intercept and $w_1$ is the coefficient of the explanatory variable. Our goal is to learn the weights of the linear equation to describe the relationship between the explanatory variable and the target variable, which can then be used to predict the responses of new explanatory variables that were not part of the training dataset.
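To make this concrete, the following is a minimal sketch that fits such a line with scikit-learn's LinearRegression on a small synthetic dataset; the data values and the true weights (2.0 and 0.5) are invented purely for illustration:

    import numpy as np
    from sklearn.linear_model import LinearRegression

    # Synthetic data: one explanatory variable x and a noisy linear response y
    rng = np.random.RandomState(1)
    X = rng.uniform(0, 10, size=50).reshape(-1, 1)   # feature matrix, shape (50, 1)
    y = 2.0 + 0.5 * X.ravel() + rng.normal(scale=0.5, size=50)

    lr = LinearRegression()
    lr.fit(X, y)

    print('Intercept w_0: %.3f' % lr.intercept_)     # estimated y-axis intercept
    print('Slope w_1:     %.3f' % lr.coef_[0])       # estimated coefficient

    # Predict the response for a new, unseen value of x
    print('Prediction at x=4.0: %.3f' % lr.predict([[4.0]])[0])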

Based on the linear equation that we defined previously, linear regression can be understood as finding the best-fitting straight line through the sample points, as shown in the following figure:

[Figure: the best-fitting regression line through the sample points, with vertical offsets drawn from the line to each point]

This best-fitting line is also called the regression line, and the vertical lines from the regression line to the sample points are the so-called offsets or residuals—the errors of our prediction.
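Continuing the sketch above (reusing the lr, X, and y objects), the residuals are simply the differences between the observed responses and the fitted line's predictions:

    # Residuals: vertical offsets between the sample points and the regression line
    y_pred = lr.predict(X)
    residuals = y - y_pred
    print('Mean residual:        %.4f' % residuals.mean())       # close to zero for a least-squares fit
    print('Sum of squared error: %.4f' % (residuals ** 2).sum())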

The special case of one explanatory variable is also called simple linear regression, but of course we can also generalize the linear regression model to multiple explanatory variables. This process is called multiple linear regression:

$y = w_0x_0 + w_1x_1 + \dots + w_mx_m = \sum_{i=0}^{m} w_ix_i = \boldsymbol{w}^T\boldsymbol{x}$

Here, $w_0$ is the y-axis intercept with $x_0 = 1$.
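As a minimal sketch of the multiple case, the same estimator accepts several explanatory variables at once, and its prediction is the weighted sum $\boldsymbol{w}^T\boldsymbol{x}$; the feature values and true coefficients below are invented for illustration:

    import numpy as np
    from sklearn.linear_model import LinearRegression

    # Two explanatory variables; the true relationship is y = 1 + 2*x1 - 3*x2 + noise
    rng = np.random.RandomState(1)
    X = rng.uniform(size=(100, 2))
    y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(scale=0.1, size=100)

    lr = LinearRegression()
    lr.fit(X, y)

    # The fitted model computes y_hat = w_0 + w_1*x_1 + w_2*x_2,
    # which is the vector form w^T x once we prepend x_0 = 1
    w = np.hstack([lr.intercept_, lr.coef_])
    x_new = np.array([1.0, 0.5, 0.2])                # new sample with x_0 = 1 prepended
    print('Manual w^T x:     %.4f' % w.dot(x_new))
    print('Model .predict(): %.4f' % lr.predict([[0.5, 0.2]])[0])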
