Binary classification with logistic regression

Instead of attempting to predict what the total first-day return will be, we are going to attempt to predict whether the IPO will be one we should buy for a trade or not. It is here that we should point out that this is not investment advice and is for illustrative purposes only. Please don't run out and start day trading IPOs with this model willy-nilly. It will end badly.

Now, to predict a binary outcome (that is, a 1 or 0, a yes or no), we will start with a model called logistic regression. Despite its name, logistic regression is actually a binary classification model rather than a regression model. But it does utilize the typical form of a linear regression; it just does so within a logistic function.

A typical single-variable regression model takes the following form:

$$t = \beta_0 + \beta_1 x$$

Here, t is a linear function of a single explanatory variable, x. This can, of course, be expanded to be a linear combination of many variables. The problem with this form for a binary outcome variable is that t does not naturally fall between 0 and 1.
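To make that concrete, here is a minimal sketch (on hypothetical toy data, not the IPO data set) of what goes wrong when an ordinary linear regression is fit to a 0/1 outcome; its predictions escape the [0, 1] range as soon as the input moves away from the training data:

import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy data: one feature, binary outcome
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([0, 0, 1, 1])

lin = LinearRegression().fit(X, y)
# Predictions for inputs outside the training range fall
# well outside [0, 1], so they cannot be read as probabilities
print(lin.predict([[0.0], [10.0]]))   # approximately [-0.5, 3.5]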

The logistic function, seen in the following equation, has some quite favorable mathematical properties, including the fact that it can take any number as an input (t here) and return a result that falls between 0 and 1:

$$F(t) = \frac{1}{1 + e^{-t}}$$
Graphed, the logistic function traces the characteristic S-shaped (sigmoid) curve, flattening toward 0 on the left and toward 1 on the right.
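As a minimal sketch of that squashing behavior (the function name logistic is our own), the equation is a one-liner with NumPy:

import numpy as np

def logistic(t):
    # Maps any real number into the open interval (0, 1)
    return 1.0 / (1.0 + np.exp(-t))

print(logistic(np.array([-10.0, -1.0, 0.0, 1.0, 10.0])))
# -> values approaching 0, then 0.269, 0.5, 0.731, approaching 1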

By replacing t with our regression function, we now have a model that is able to both give us information on the importance of each predictor (the beta coefficients) and provide a form that can be used to give us a binary prediction that represents the probability of success, or a positive result:

$$P(y = 1) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}$$

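In practice, we will let scikit-learn do the fitting for us. As a minimal sketch on hypothetical toy data (again, not the IPO data set), LogisticRegression exposes both the fitted beta coefficients and the resulting probabilities:

import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: one feature, binary outcome
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression().fit(X, y)
print(clf.coef_, clf.intercept_)    # the beta coefficients
print(clf.predict_proba([[3.5]]))   # probability of each class
print(clf.predict([[3.5]]))         # hard 0/1 prediction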
Before we can move on to modeling our data, we need to put it in a form that is appropriate for scikit-learn.

We'll start by importing a library that can help us with this task; it's called patsy. It can be pip-installed if necessary:

from patsy import dmatrix 
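As a minimal sketch of what dmatrix gives us (the column names here are hypothetical stand-ins, not the actual IPO features), it turns an R-style formula and a pandas DataFrame into a design matrix, complete with an intercept column, that scikit-learn can consume directly:

import pandas as pd
from patsy import dmatrix

# Hypothetical stand-in for the IPO DataFrame
df = pd.DataFrame({
    'gap_open_pct': [1.1, 0.3, -0.7, 2.5],
    'dollar_change': [0.5, -1.2, 3.4, 0.8],
})

# R-style formula -> design matrix with an Intercept column
X = dmatrix('gap_open_pct + dollar_change', data=df,
            return_type='dataframe')
print(X.head())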