Logistic regression

So far, we have discussed linear regression models, which are appropriate for modeling continuous response variables. However, binary responses (such as being ill or healthy, or staying with versus switching to a new job, mobile provider, or partner) are also very common. The main difference compared to the continuous case is that we should now model the probability of the response rather than its expected value.

The naive solution would be to use the probability as the outcome in a linear model. The problem with this approach is that a probability must always lie between 0 and 1, and this bounded range is not guaranteed at all by a linear model. A better solution is to fit a logistic regression model, which models not the probability itself, but the natural logarithm of the odds, called the logit. The logit can be any (positive or negative) number, so the problem of the limited range is eliminated.
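The transformation can be written as logit(p) = ln(p / (1 - p)), and its inverse, the logistic function, maps any real number back into (0, 1). A minimal sketch of this arithmetic (in Python rather than R, purely to illustrate the two functions):

```python
import math

def logit(p):
    """Log-odds of a probability p in (0, 1); unbounded in both directions."""
    return math.log(p / (1 - p))

def inv_logit(x):
    """Inverse logit (logistic function): maps any real x back into (0, 1)."""
    return 1 / (1 + math.exp(-x))

# The logit is 0 at p = 0.5, negative below it, positive above it,
# and the two functions undo each other:
print(logit(0.5))                        # 0.0
print(round(inv_logit(logit(0.2)), 10))  # 0.2
```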

Let's take a simple example: predicting the probability of the death penalty using information on the race of the defendant. This model relates to the much more complicated issue of racial bias in the imposition of the death penalty, a question with a long history in the USA. We will use the deathpenalty dataset from the catdata package on the judgment of defendants in cases of multiple murders in Florida between 1976 and 1987. The cases are classified with respect to the death penalty (where 0 refers to no, 1 to yes), the race of the defendant, and the race of the victim (black is coded as 0, white as 1).

First, we expand the frequency table into case form via the expand.dft function from the vcdExtra package, then we fit our first generalized linear model on the dataset:

> library(catdata)
> data(deathpenalty)
> library(vcdExtra)
> deathpenalty.expand <- expand.dft(deathpenalty)
> binom.model.0 <- glm(DeathPenalty ~ DefendantRace,
+   data = deathpenalty.expand, family = binomial)
> summary(binom.model.0)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-0.4821  -0.4821  -0.4821  -0.4044   2.2558  

Coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept)    -2.4624     0.2690  -9.155   <2e-16 ***
DefendantRace   0.3689     0.3058   1.206    0.228    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 440.84  on 673  degrees of freedom
Residual deviance: 439.31  on 672  degrees of freedom
AIC: 443.31

Number of Fisher Scoring iterations: 5

The regression coefficient is not statistically significant, so at first sight, we cannot see a racial bias in the data. Still, for didactic purposes, let's interpret the regression coefficient. It is 0.37, which means that the natural logarithm of the odds of getting the death penalty increases by 0.37 when moving from the black category to the white one. This difference is easier to interpret if you take its exponent, which is the ratio of the odds:

> exp(cbind(OR = coef(binom.model.0), confint(binom.model.0)))
                      OR      2.5 %    97.5 %
(Intercept)   0.08522727 0.04818273 0.1393442
DefendantRace 1.44620155 0.81342472 2.7198224

The odds ratio pertaining to the race of the defendant is 1.45, which means that white defendants have 45 percent larger odds of getting the death penalty than black defendants.

Note

Although R produces this, the odds ratio for the intercept is generally not interpreted.

We can say something more general. We have seen that in linear regression models, the regression coefficient b can be interpreted as follows: a one unit increase in X increases Y by b. In logistic regression models, however, a one unit increase in X multiplies the odds of Y by exp(b).

Please note that the preceding predictor was discrete, with values of 0 (black) and 1 (white), so it is basically a dummy variable for white, with black as the reference category. We saw the same approach for entering discrete variables in linear regression models. If you have more than two racial categories, you should define a second dummy for the third race and enter it into the model as well. The exponent of each dummy variable's coefficient equals the odds ratio comparing the given category to the reference. If you have a continuous predictor, the exponent of its coefficient equals the odds ratio pertaining to a one unit increase in the predictor.
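Using the rounded coefficients reported by summary(binom.model.0), the odds-ratio arithmetic can be checked by hand (a sketch in Python rather than R, just to make the exp(b) rule concrete):

```python
import math

a = -2.4624   # intercept from summary(binom.model.0), rounded
b = 0.3689    # coefficient of DefendantRace, rounded

odds_black = math.exp(a)        # odds of the death penalty when DefendantRace = 0
odds_white = math.exp(a + b)    # odds when DefendantRace = 1

# A one unit increase in the predictor multiplies the odds by exp(b);
# both lines print approximately 1.446, the odds ratio from the output above:
print(odds_white / odds_black)
print(math.exp(b))
```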

Now, let's bring the race of the victim into the analysis, since it is a plausible confounder. Let's control for it and fit the logistic regression model with both DefendantRace and VictimRace as predictors:

> binom.model.1 <- update(binom.model.0, . ~ . + VictimRace)
> summary(binom.model.1)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-0.7283  -0.4899  -0.4899  -0.2326   2.6919  

Coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept)    -3.5961     0.5069  -7.094 1.30e-12 ***
DefendantRace  -0.8678     0.3671  -2.364   0.0181 *  
VictimRace      2.4044     0.6006   4.003 6.25e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 440.84  on 673  degrees of freedom
Residual deviance: 418.96  on 671  degrees of freedom
AIC: 424.96

Number of Fisher Scoring iterations: 6

> exp(cbind(OR = coef(binom.model.1), confint(binom.model.1)))
                       OR       2.5 %      97.5 %
(Intercept)    0.02743038 0.008433309  0.06489753
DefendantRace  0.41987565 0.209436976  0.89221877
VictimRace    11.07226549 3.694532608 41.16558028

When controlling for VictimRace, the effect of DefendantRace becomes significant! The odds ratio is 0.42, which means that white defendants' odds of getting the death penalty are only 42 percent of the odds of black defendants, holding the race of the victim fixed. Also, the odds ratio of VictimRace (11.07) shows an extremely strong effect: killers of white victims are 11 times more likely to get a death penalty than killers of black victims.
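The fitted coefficients can also be turned back into predicted probabilities for the four defendant/victim combinations via the inverse logit. A rough check in Python (outside R, using the rounded estimates from the summary, so the figures are approximate):

```python
import math

def inv_logit(x):
    """Logistic function: converts log-odds back into a probability."""
    return 1 / (1 + math.exp(-x))

# Rounded estimates from summary(binom.model.1):
intercept, b_def, b_vic = -3.5961, -0.8678, 2.4044

for defendant in (0, 1):
    for victim in (0, 1):
        p = inv_logit(intercept + b_def * defendant + b_vic * victim)
        print(f"DefendantRace={defendant} VictimRace={victim}: P(death) = {p:.3f}")
```

At each level of VictimRace, white defendants have the lower predicted probability, while a white victim raises the probability sharply, in line with the two odds ratios above.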

So, the effect of DefendantRace is exactly the opposite of what we got in the one-predictor model. The reversed association may seem paradoxical, but it can be explained. Let's have a look at the following output:

> prop.table(table(factor(deathpenalty.expand$VictimRace,
+              labels = c("VictimRace=0", "VictimRace=1")),
+            factor(deathpenalty.expand$DefendantRace, 
+              labels = c("DefendantRace=0", "DefendantRace=1"))), 1)
           
               DefendantRace=0 DefendantRace=1
  VictimRace=0      0.89937107      0.10062893
  VictimRace=1      0.09320388      0.90679612

The data seem to be homogeneous in a sense: black defendants are more likely to have black victims, and vice versa. Putting these pieces of information together, you can see that black defendants yield a smaller crude proportion of death sentences simply because they are more likely to have black victims, and those who kill black victims are less likely to get the death penalty. The paradox disappears: the crude association between the death penalty and DefendantRace was confounded by VictimRace. This is an instance of Simpson's paradox.

To sum it up, it seems that taking the available information into account, you can come to the following conclusions:

  • Black defendants are more likely to get the death penalty
  • Killing a white person is considered to be a more serious crime than killing a black person

Of course, you should draw such conclusions extremely carefully, as the question of racial bias needs a very thorough analysis using all the relevant information regarding the circumstances of the crime, and much more.

Data considerations

Logistic regression models assume that the observations are independent of each other. This assumption is violated, for example, if your observations are consecutive years. The deviance residuals and other diagnostic statistics can help validate the model and detect problems such as misspecification of the link function. For further reference, see the LogisticDx package.

As a general rule of thumb, logistic regression models require at least 10 events per predictor, where an event denotes an observation belonging to the less frequent category of the response. In our death penalty example, death is the less frequent category, and we have 68 death sentences in the database. So, the rule suggests that a maximum of 6-7 predictors is allowed.
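The rule-of-thumb arithmetic is a one-liner (a back-of-the-envelope sketch in Python):

```python
events = 68                # death sentences: the less frequent response category
events_per_predictor = 10  # common rule of thumb

max_predictors = events // events_per_predictor
print(max_predictors)      # 6, so roughly 6-7 predictors at most
```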

The regression coefficients are estimated using the maximum likelihood method. Since there is no closed mathematical form for these ML estimates, R uses an optimization algorithm instead. In some cases, you may get an error message that the algorithm does not reach convergence; in such cases, it is unable to find an appropriate solution. This may occur for a number of reasons, such as too many predictors or too few events.

Goodness of model fit

One measure of model fit, used to evaluate the performance of the model, is the significance of the overall model. The corresponding likelihood ratio test assesses whether the given model fits significantly better than a model with just an intercept, which we call the null model.

To obtain the test results, you have to look at the residual deviance in the output. It measures the disagreement between the maxima of the observed and fitted log-likelihood functions.

Note

Since logistic regression follows the maximum likelihood principle, the goal is to minimize the sum of the deviance residuals. In this respect, the deviance residual is analogous to the raw residual in linear regression, where the goal is to minimize the sum of squared residuals.

The null deviance represents how well the response is predicted by a model with nothing but an intercept. To judge the model, you have to compare the residual deviance to the null deviance; the difference follows a chi-square distribution. The corresponding test is available in the lmtest package:

> library(lmtest)
> lrtest(binom.model.1)
Likelihood ratio test

Model 1: DeathPenalty ~ DefendantRace + VictimRace
Model 2: DeathPenalty ~ 1
  #Df  LogLik Df  Chisq Pr(>Chisq)    
1   3 -209.48                         
2   1 -220.42 -2 21.886  1.768e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The p value indicates a highly significant decrease in deviance. This means that the model is significant, and the predictors have a significant effect on the response probability.
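The numbers in the lrtest output fit together: the chi-square statistic is twice the difference of the two log-likelihoods, and for 2 degrees of freedom the chi-square tail probability happens to have the closed form exp(-x/2). A sketch of this arithmetic in Python (outside R, using the rounded log-likelihoods from the output):

```python
import math

loglik_model = -209.48  # binom.model.1 (3 parameters), from lrtest
loglik_null = -220.42   # intercept-only model (1 parameter)

chisq = 2 * (loglik_model - loglik_null)  # 21.88 on 2 degrees of freedom
# For a chi-square variable with 2 df, P(X > x) = exp(-x / 2):
p_value = math.exp(-chisq / 2)
print(round(chisq, 2))  # 21.88
print(p_value)          # ~1.77e-05, matching the reported p value
```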

You can think of the likelihood ratio test as the analogue of the F-test in linear regression models. It reveals whether the model is significant, but it doesn't tell us anything about goodness-of-fit, which was described by the adjusted R-squared measure in the linear case.

An equivalent statistic does not exist for logistic regression models, but several pseudo R-squared measures have been developed. These usually range from 0 to 1, with higher values indicating a better fit. We will use the PseudoR2 function from the BaylorEdPsych package to compute such values:

> library(BaylorEdPsych)
> PseudoR2(binom.model.1)
        McFadden     Adj.McFadden        Cox.Snell       Nagelkerke 
      0.04964600       0.03149893       0.03195036       0.06655297
McKelvey.Zavoina           Effron            Count        Adj.Count 
      0.15176608       0.02918095               NA               NA 
             AIC    Corrected.AIC 
    424.95652677     424.99234766  

But be careful: pseudo R-squared values cannot be interpreted as an OLS R-squared, and there are some documented problems with them as well, but they give us a rough picture. In our case, they say that the explanatory power of the model is rather low, which is not surprising considering that only two predictors were used to model such a complex process as the judgment of a crime.
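The McFadden value, for instance, can be reproduced directly from the two log-likelihoods reported by lrtest, as it is defined as 1 minus the ratio of the model and null log-likelihoods. A quick check in Python (outside R, with the rounded values from the output):

```python
loglik_model = -209.48  # fitted model, from lrtest(binom.model.1)
loglik_null = -220.42   # intercept-only model

# McFadden pseudo R-squared: 1 - logLik(model) / logLik(null)
mcfadden_r2 = 1 - loglik_model / loglik_null
print(round(mcfadden_r2, 4))  # 0.0496, matching the PseudoR2 output
```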

Model comparison

As we saw in the previous chapter, the adjusted R-squared provides a good basis for model comparison when dealing with nested linear regression models. For nested logistic regression models, you can use the likelihood ratio test (such as the lrtest function from the lmtest package), which compares the difference between the residual deviances:

> lrtest(binom.model.0, binom.model.1)
Likelihood ratio test

Model 1: DeathPenalty ~ DefendantRace
Model 2: DeathPenalty ~ DefendantRace + VictimRace
  #Df  LogLik Df Chisq Pr(>Chisq)    
1   2 -219.65                        
2   3 -209.48  1 20.35   6.45e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Note

LogLik, in the preceding output, denotes the log-likelihood of the model; you get the residual deviance by multiplying it by -2.

For non-nested models, you can use AIC, just as we did in the case of linear regression models; but in logistic regression models, AIC is part of the standard output, so there is no need to call the AIC function separately. Here, binom.model.1 has a lower AIC than binom.model.0, and the difference is not negligible, since it is greater than 2.
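The relationship between the log-likelihood, the residual deviance, and AIC is easy to verify from the outputs above: the deviance is -2 times the log-likelihood, and AIC adds twice the number of estimated parameters. Checking this for binom.model.1 (a sketch in Python, outside R):

```python
loglik = -209.48  # logLik of binom.model.1, from the lrtest output
k = 3             # estimated parameters: intercept + 2 coefficients

deviance = -2 * loglik  # 418.96, the reported residual deviance
aic = deviance + 2 * k  # 424.96, the reported AIC
print(deviance, aic)
```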
