We must be careful when interpreting the $\beta$ coefficients of a logistic regression. Interpretation is not as straightforward as with linear models, which we looked at in Chapter 3, Modeling with Linear Regression. Using the logistic inverse link function introduces a non-linearity that we have to take into account. If $\beta$ is positive, increasing $x$ will increase $p(y=1)$ by some amount, but the amount is not a linear function of $x$; instead, it depends non-linearly on the value of $x$. We can visualize this fact in Figure 4.4; instead of a line with a constant slope, we have an S-shaped line with a slope that changes as a function of $x$. A little bit of algebra can give us some further insight into how much $p(y=1)$ changes with $\beta$:
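The basic logistic model is:
$$\theta = \operatorname{logistic}(\alpha + X\beta)$$
The inverse of the logistic is the logit function, which is:
$$\operatorname{logit}(z) = \log\left(\frac{z}{1 - z}\right)$$
Thus, if we take the first equation in this section and apply the logit function to both sides, we get this equation:
$$\operatorname{logit}(\theta) = \alpha + X\beta$$
Or equivalently:
$$\log\left(\frac{\theta}{1 - \theta}\right) = \alpha + X\beta$$
Remember that $\theta$ in our model is $p(y=1)$:
$$\log\left(\frac{p(y=1)}{1 - p(y=1)}\right) = \alpha + X\beta$$
The quantity $\frac{p(y=1)}{1 - p(y=1)}$ is known as the odds.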
The odds of success are defined as the ratio of the probability of success to the probability of not-success. While the probability of getting a 2 by rolling a fair die is 1/6, the odds for the same event are $\frac{1/6}{5/6} = \frac{1}{5} = 0.2$, or one favorable event to five unfavorable events. Odds are often used by gamblers, mainly because odds provide a more intuitive tool than raw probabilities when thinking about the proper way to bet.
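The die example can be checked numerically. The following is a minimal sketch, not code from the chapter: the helper functions `logistic` and `logit` are defined here only for illustration, using the standard definitions. It converts the probability 1/6 to odds and log-odds, and then applies the logistic function to recover the original probability:

```python
import math

def logistic(z):
    """Map a log-odds value z in (-inf, inf) to a probability in (0, 1)."""
    return 1 / (1 + math.exp(-z))

def logit(p):
    """Map a probability p in (0, 1) to log-odds; the inverse of logistic."""
    return math.log(p / (1 - p))

p = 1 / 6                    # probability of rolling a 2 with a fair die
odds = p / (1 - p)           # one favorable outcome to five unfavorable ones
log_odds = logit(p)          # identical to math.log(odds)
p_back = logistic(log_odds)  # the logistic undoes the logit

print(f"probability = {p:.4f}")
print(f"odds        = {odds:.4f}")
print(f"log-odds    = {log_odds:.4f}")
print(f"recovered p = {p_back:.4f}")
```

Because the odds are below 1, the log-odds are negative; an event with probability 0.5 has odds of 1 and log-odds of 0.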
The transformation from probability to odds is a monotonic transformation, meaning the odds increase as the probability increases, and the other way around. While probabilities are restricted to the [0, 1] interval, odds live in the [0, ∞) interval. The logarithm is another monotonic transformation and log-odds are in the (-∞, ∞) interval. Figure 4.6 shows how probabilities are related to odds and log-odds:
import numpy as np
import matplotlib.pyplot as plt

# Stop just short of 1 so that the odds p / (1 - p) stay finite
probability = np.linspace(0.01, 0.99, 100)
odds = probability / (1 - probability)

_, ax1 = plt.subplots()
ax2 = ax1.twinx()  # second y-axis sharing the same x-axis
ax1.plot(probability, odds, 'C0')
ax2.plot(probability, np.log(odds), 'C1')
ax1.set_xlabel('probability')
ax1.set_ylabel('odds', color='C0')
ax2.set_ylabel('log-odds', color='C1')
ax1.grid(False)
ax2.grid(False)
Thus, the values of the coefficients provided by summary are in the log-odds scale:
df = az.summary(trace_1, var_names=varnames)
df
| | mean | sd | mc error | hpd 3% | hpd 97% | eff_n | r_hat |
|---|---|---|---|---|---|---|---|
| α | -9.12 | 4.61 | 0.15 | -17.55 | -0.42 | 1353.0 | 1.0 |
| β[0] | 4.65 | 0.87 | 0.03 | 2.96 | 6.15 | 1268.0 | 1.0 |
| β[1] | -5.16 | 0.95 | 0.01 | -7.05 | -3.46 | 1606.0 | 1.0 |
One very pragmatic way of understanding models is to change parameters and see what happens. In the following block of code, we compute the log-odds in favor of versicolor as $\alpha + \beta_0 x_1 + \beta_1 x_2$, and then the probability of versicolor with the logistic function. Then, we repeat the computation by fixing $x_2$ and increasing $x_1$ by 1:
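from scipy.special import expit as logistic  # logistic function; skip if already defined
x_1 = 4.5  # sepal_length
x_2 = 3  # sepal_width
# log-odds and probability of versicolor at the starting point
log_odds_versicolor_i = (df['mean'] * [1, x_1, x_2]).sum()
probability_versicolor_i = logistic(log_odds_versicolor_i)
# same computation with x_1 increased by 1 and x_2 held fixed
log_odds_versicolor_f = (df['mean'] * [1, x_1 + 1, x_2]).sum()
probability_versicolor_f = logistic(log_odds_versicolor_f)
# change in log-odds, change in probability
log_odds_versicolor_f - log_odds_versicolor_i, probability_versicolor_f - probability_versicolor_i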
If you run the code, you will find that the increase in log-odds is $\approx 4.65$, which is exactly the value of $\beta_0$ (check the summary of trace_1). This is in line with our previous finding that $\beta$ encodes the increase in log-odds units per unit increase of the $x$ variable. The increase in probability is $\approx 0.70$.
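To see why the change in probability, unlike the change in log-odds, depends on the starting point, we can repeat the experiment for several values of the first predictor. This is a minimal sketch under stated assumptions: the coefficients are the posterior means copied by hand from the summary table (so the posterior uncertainty is ignored), the starting values in `x_1_values` are hypothetical, and `logistic` is defined locally for self-containment:

```python
import numpy as np

def logistic(z):
    """Standard logistic function: maps log-odds to probabilities."""
    return 1 / (1 + np.exp(-z))

# Posterior means copied from the summary table (point estimates only)
alpha, beta_0, beta_1 = -9.12, 4.65, -5.16

x_2 = 3.0                               # second predictor, held fixed
x_1_values = np.array([3.5, 4.5, 5.5])  # hypothetical starting values

# Log-odds before and after increasing the first predictor by one unit
log_odds_i = alpha + beta_0 * x_1_values + beta_1 * x_2
log_odds_f = alpha + beta_0 * (x_1_values + 1) + beta_1 * x_2

delta_log_odds = log_odds_f - log_odds_i
delta_probability = logistic(log_odds_f) - logistic(log_odds_i)

for x_1, d_lo, d_p in zip(x_1_values, delta_log_odds, delta_probability):
    print(f"x_1 = {x_1:.1f}: change in log-odds = {d_lo:.2f}, "
          f"change in probability = {d_p:.3f}")
```

The change in log-odds is the same for every starting value, while the change in probability is largest when the step crosses the steep middle part of the S-shaped curve and much smaller in the flat tails.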