Chapter 3. Logistic Regression

For regression tasks where the goal is to predict a numerical output, such as price or temperature, we've seen that linear regression can be a good starting point. It is simple to train and easy to interpret, even though, as a model, it makes strict assumptions about the data and the underlying target function. Before studying more advanced techniques to tackle regression problems, we'll introduce logistic regression. Despite its somewhat misleading name, this is actually our first model for performing classification. As we learned in Chapter 1, Gearing Up for Predictive Modeling, in classification problems our output is qualitative and thus consists of a finite set of values, which we call classes. We'll begin with the binary classification scenario, where we are trying to distinguish between two classes, which we'll arbitrarily label as 0 and 1, and later on we'll extend this to distinguishing between multiple classes.

Classifying with linear regression

Even though we know classification problems involve qualitative outputs, it seems natural to ask whether we could use our existing knowledge of linear regression and apply it to the classification setting. We could do this by training a linear regression model to predict a value in the interval [0,1], remembering that we've chosen to label our two classes as 0 and 1. Then, we could apply a threshold to the output of our model in such a way that if the model outputs a value below 0.5, we would predict class 0; otherwise, we would predict class 1. The following graph demonstrates this concept for a simple linear regression with a single input feature X1 and for a binary classification problem. Our output variable y is either 0 or 1, so all the data lies on two horizontal lines. The solid line shows the output of the model, and the dashed line shows the decision boundary, which arises when we put a threshold on the model's predicted output at the value 0.5. Points to the left of the dashed line are predicted as belonging to class 0, and points to the right are predicted as belonging to class 1.

The model is clearly not perfect, but it does seem to correctly classify a large proportion of the data.

[Figure: Classifying with linear regression — the linear fit (solid line) and the decision boundary at the 0.5 threshold (dashed line)]
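
To make this idea concrete, the following is a minimal sketch of classification by thresholding a linear regression at 0.5. The data is synthetic and the code is in Python with scikit-learn, neither of which comes from this book, so treat it purely as an illustration:

# Illustrative sketch (not the book's code or data): classify by
# thresholding an ordinary least squares fit at 0.5.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Synthetic one-dimensional data: class 0 clusters at low X1, class 1 at high X1.
x_class0 = rng.normal(loc=2.0, scale=1.0, size=50)
x_class1 = rng.normal(loc=6.0, scale=1.0, size=50)
X = np.concatenate([x_class0, x_class1]).reshape(-1, 1)
y = np.concatenate([np.zeros(50), np.ones(50)])  # labels 0 and 1

# Fit a straight line to the 0/1 labels by least squares.
model = LinearRegression().fit(X, y)

# Threshold the continuous prediction at 0.5 to obtain class labels.
y_hat = (model.predict(X) >= 0.5).astype(int)
print(f"Training accuracy with a 0.5 threshold: {(y_hat == y).mean():.2f}")

# The decision boundary is the value of X1 where the fitted line crosses 0.5.
boundary = (0.5 - model.intercept_) / model.coef_[0]
print(f"Decision boundary at X1 = {boundary:.2f}")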

While a reasonable approximation in this case, this approach has a number of shortcomings. Firstly, although we know beforehand that our output variable is limited to the interval [0,1] because we have just two classes, the raw output of the linear regression model takes values outside this range. We can see this in the graph for values of the input feature X1 that are either very low or very high. Secondly, linear regression is designed to minimize the mean squared error (MSE), which is not the objective we actually care about here. Our goal is to find a way to separate the two classes, not to minimize the squared error against a line of best fit. As a consequence, the location of the decision boundary is very sensitive to the presence of high leverage points. As we discussed in Chapter 2, Linear Regression, high leverage points are points that lie far away from most of the data because they have extreme values for at least one of their input features.
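
The first of these points is easy to check numerically. In the same illustrative spirit (invented data, Python), evaluating the fitted line at extreme values of X1 gives raw predictions outside the [0,1] interval:

# Illustrative sketch: the raw linear regression output is not confined
# to [0, 1] at extreme feature values.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(2.0, 1.0, 50),
                    rng.normal(6.0, 1.0, 50)]).reshape(-1, 1)
y = np.concatenate([np.zeros(50), np.ones(50)])

model = LinearRegression().fit(X, y)

# Evaluate the fitted line at a very low and a very high value of X1.
extremes = np.array([[-2.0], [10.0]])
print(model.predict(extremes))  # e.g. a value below 0 and a value above 1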

The following plot demonstrates the effect of high leverage points on our classifier:

[Figure: Classifying with linear regression — the effect of two high leverage points on the fitted line and decision boundary]

Here, the data is exactly the same as before, except that we have added two new observations for class 1 that have relatively high values for feature X1 and thus appear far to the right of the graph. Ideally, because these two new observations fall well inside the region of the graph where we already predict class 1, they should have little effect on our decision boundary. Because we are minimizing the MSE, however, the regression line has shifted to the right of the old fit (the original line is shown as a solid line and the new one as a dashed line). Consequently, the point at which the regression line crosses the value 0.5 has moved to the right, and so our decision boundary has noticeably moved to the right as a result of adding only two new points.
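
The same effect can be reproduced numerically. In the following sketch (again with synthetic data, so the exact numbers are only illustrative), appending two class 1 observations with very large values of X1 and refitting the line moves the 0.5 crossing point, and hence the decision boundary, to the right:

# Illustrative sketch: two high leverage points drag the decision boundary.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(2.0, 1.0, 50),
                    rng.normal(6.0, 1.0, 50)]).reshape(-1, 1)
y = np.concatenate([np.zeros(50), np.ones(50)])

def boundary(X, y):
    """Value of X1 at which the least squares line crosses 0.5."""
    m = LinearRegression().fit(X, y)
    return (0.5 - m.intercept_) / m.coef_[0]

print(f"Boundary before: {boundary(X, y):.2f}")

# Two new class 1 observations with very large X1 (high leverage points).
X_new = np.vstack([X, [[20.0], [22.0]]])
y_new = np.concatenate([y, [1.0, 1.0]])
print(f"Boundary after:  {boundary(X_new, y_new):.2f}")  # shifted to the right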

Logistic regression addresses all of these issues: it produces an output that is bounded within the interval [0,1], and it is trained using an entirely different optimization criterion from linear regression, so we are no longer fitting a function by minimizing the MSE, as we'll now see.
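
As a brief preview, the function at the heart of logistic regression is the logistic (sigmoid) function, 1 / (1 + e^(-z)), whose output always lies between 0 and 1 no matter how extreme its input. The snippet below is only a quick illustration of that bound, not code from this chapter:

# Illustrative sketch: the logistic (sigmoid) function stays between 0 and 1.
import numpy as np

def sigmoid(z):
    """Logistic function: 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(np.array([-100.0, -2.0, 0.0, 2.0, 100.0])))
# -> values approaching 0, then about 0.12, 0.5, 0.88, and approaching 1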
