3.3. Estimation of Logistic Models for Two or More Observations Per Person

When individuals in the sample have three or more observations, we can't use the simple method of doing a logistic regression on the persons who change (with difference scores as predictors). In chapter 2, we faced the same problem with linear models, and we solved it by expressing all variables as deviations from the person-specific means. In the case of dichotomous outcomes, an analogous method can be implemented with PROC LOGISTIC (in SAS 9.0 and later).[*] Before proceeding to the practical details, I first need to clear up two theoretical issues: (1) the reason why we can't use dummy variables for individuals in logistic regression, and (2) the rationale for conditional logistic regression.

[*] Conditional logistic regression requires the STRATA statement, which was first implemented in SAS 9.0. For earlier releases, conditional logistic regression can be accomplished with PROC PHREG using the methods described in Allison (1999).

In chapter 2, we saw that one way to estimate a fixed effects linear model in the multiple observation case was to structure the data with one observation per individual per occasion and then compute an OLS regression with dummy variables for all individuals (except one). Although computationally cumbersome, that method produced the correct results. We just saw, however, that the device of using dummy variables does not work for logistic regression in the two-occasion case, and the problem extends to data with more than two occasions. The coefficients are generally biased upward, and the test statistics will also be incorrect. Why is this?

This is an example of a general problem called the incidental parameters problem (Kalbfleisch and Sprott 1970) that arises in certain applications of maximum likelihood estimation. The justification for maximum likelihood estimators is usually asymptotic, which means that it's based on how the estimators behave as the sample gets large. However, the validity of that justification depends on the presumption that the number of parameters remains constant as the sample gets larger. For longitudinal data, that works just fine if the number of individuals remains constant while the number of observations per individual gets larger. But if the number of individuals is getting larger while the number of time points remains constant, then the number of parameters in a fixed effects model (including coefficients of the dummy variables) is increasing at the same rate as the sample size. This is not a problem for linear models and (somewhat surprisingly) for the Poisson models discussed in chapter 4. But it is a serious problem with logistic regression and many other nonlinear regression models. The biases are greatest when, as in the previous section, the number of time points per individual is small.
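To see the incidental parameters problem concretely, here is a pure-Python simulation sketch (not from the text; the design, sample size, and seed are all invented for illustration). It fits a two-occasion fixed effects logit both ways: once by maximizing the full likelihood over β and all of the αi terms (equivalent to the dummy-variable approach, done here by profiling out each αi numerically), and once by conditional maximum likelihood, which for this simple design reduces to log(n01/n10), where n01 and n10 count the two kinds of changers. The dummy-variable estimate comes out at roughly twice the conditional one, in line with the well-known result that the asymptotic bias factor is 2 when there are only two occasions.

```python
import math
import random

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

random.seed(42)
N, TRUE_BETA = 2000, 1.0

# Simulate a two-occasion panel with x_i1 = 0, x_i2 = 1 (a pure time
# effect) and a normally distributed person effect alpha_i.
counts = {}
for _ in range(N):
    alpha = random.gauss(0, 1)
    y1 = random.random() < expit(alpha)
    y2 = random.random() < expit(alpha + TRUE_BETA)
    counts[(y1, y2)] = counts.get((y1, y2), 0) + 1

def loglik(pattern, alpha, beta):
    y1, y2 = pattern
    p1, p2 = expit(alpha), expit(alpha + beta)
    return (math.log(p1 if y1 else 1 - p1)
            + math.log(p2 if y2 else 1 - p2))

def profiled(pattern, beta):
    # Maximize over alpha_i by ternary search (log-likelihood is concave in alpha).
    lo, hi = -30.0, 30.0
    for _ in range(80):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if loglik(pattern, m1, beta) < loglik(pattern, m2, beta):
            lo = m1
        else:
            hi = m2
    return loglik(pattern, (lo + hi) / 2, beta)

# "Unconditional" ML: estimate beta jointly with all the alpha_i
# (the dummy-variable approach), via a grid search over beta.
grid = [i * 0.01 for i in range(401)]
b_dummy = max(grid, key=lambda b: sum(n * profiled(p, b)
                                      for p, n in counts.items()))

# Conditional ML: for T = 2 this reduces to a logit on the changers,
# which for this one-covariate design is just log(n01/n10).
b_cond = math.log(counts[(False, True)] / counts[(True, False)])

print(f"dummy-variable estimate: {b_dummy:.2f}")
print(f"conditional estimate:    {b_cond:.2f}")
```

Persons with the same outcome on both occasions drive their estimated αi to plus or minus infinity, so their likelihood contribution does not depend on β; only the changers matter, just as in the conditional analysis.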

The solution to the incidental parameters problem is to do conditional maximum likelihood (Chamberlain 1980), which we already employed in the two-occasion case. Now we need to generalize that method to more than two occasions. The basic idea is to reformulate the likelihood function so that it no longer contains the individual-specific αi parameters in equation (3.1). It turns out that for the logistic model of equation (3.1), there are reduced sufficient statistics for the αi parameters. That means that there are summaries of the data that contain all the information about the αi terms. Specifically, the reduced sufficient statistics are the counts si of the number of observations for each person in which yit = 1:

s_i = \sum_{t=1}^{T} y_{it}

We can remove the αi terms from the likelihood function by conditioning on these sufficient statistics. For a single individual, the likelihood function for conventional logistic regression is

L_i = \prod_{t=1}^{T} p_{it}^{y_{it}} (1 - p_{it})^{1 - y_{it}},  where  \log[p_{it}/(1 - p_{it})] = \mu_t + \beta x_{it} + \gamma z_i + \alpha_i

We then condition on si by dividing this likelihood by the probability of observing si. Without going through the algebraic details, this produces

\Pr(y_{i1}, \ldots, y_{iT} \mid s_i) = \frac{\psi_1}{\psi_1 + \psi_2 + \cdots + \psi_Q},  where  \psi_1 = \exp\Big(\sum_{t=1}^{T} y_{it}(\mu_t + \beta x_{it})\Big)    (3.4)

In the denominator, ψ2,...,ψQ all have the same form as ψ1 except that, instead of the observed values of 1 and 0 for yit, the 1's and 0's are permuted in all possible ways. Thus Q is the number of different ways of re-arranging the 1's and 0's, given that we've observed a certain number of each. Notice that γ, the coefficient vector for the time-invariant predictors, and the αi terms no longer appear in this likelihood.

To put this more concretely, suppose that each person is observed on five occasions (as in our upcoming example). Suppose, further, that one particular individual was in poverty at times 1 and 3, but not at times 2, 4 or 5. We then ask the question "Given that the event occurred on two occasions, what is the probability that it happened at times 1 and 3, rather than on two other occasions (say, 2 and 5, or 3 and 4)?" In fact, there are 10 different ways of choosing two occasions from among five possibilities. The resulting likelihood for this person has the following form:

L_i = \frac{\psi_{13}}{\psi_{12} + \psi_{13} + \psi_{14} + \psi_{15} + \psi_{23} + \psi_{24} + \psi_{25} + \psi_{34} + \psi_{35} + \psi_{45}},  where  \psi_{jk} = \exp(\mu_j + \mu_k + \beta x_{ij} + \beta x_{ik})

The conditional likelihood for the whole sample is just the product of all such likelihoods for each person.
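As a sketch of this calculation, the conditional probability for the hypothetical person just described can be computed by brute force: enumerate all 10 ways of placing two 1's among five occasions and form the ratio. The linear predictor values u[t] = μt + βxit below are invented for illustration; note that αi and γzi cancel out of the ratio, so they never need to be computed.

```python
import math
from itertools import combinations

# Hypothetical linear predictors u[t] = mu_t + beta*x_it for t = 1..5
# (illustrative numbers only; alpha_i and gamma*z_i cancel from the ratio)
u = {1: 0.2, 2: -0.1, 3: 0.5, 4: 0.0, 5: -0.3}

def psi(times):
    """exp of the summed linear predictor over the occasions with y = 1."""
    return math.exp(sum(u[t] for t in times))

observed = (1, 3)                       # in poverty at times 1 and 3
patterns = list(combinations(u, 2))     # all C(5,2) = 10 ways to place two 1's
cond_lik = psi(observed) / sum(psi(p) for p in patterns)

print(len(patterns), round(cond_lik, 4))
```

Conditional logistic regression maximizes the product of such ratios over all persons who change, choosing the coefficients inside each exp() to make the observed patterns as probable as possible.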

There are three things worth noting about conditional logistic regression. First, as we saw in the two-occasion case, persons who don't change on yit over the period of observation are effectively eliminated from the analysis. In the likelihood (3.4), for a person who has all 1's or all 0's, the numerator and denominator will be identical and hence will cancel. With respect to the conditioning argument, if we know that someone has events on five out of five occasions, then there's no more room for variability in when those events occurred.

A second point, related to the first, is that conditional maximum likelihood estimators have two out of the three properties usually associated with maximum likelihood estimation. They are consistent (i.e., they converge in probability to the true values) and they are asymptotically normal (i.e., the sampling distribution is approximately normal for large samples). But they might not be fully efficient. There is a potential loss of information that comes from (a) excluding persons who don't change, and (b) only using within-person variation. But that's the price one always pays for choosing a fixed effects model.

A third point is that, for dichotomous dependent variables, conditional likelihood only works for the logistic regression model, not for other link functions like probit or complementary log-log. That's because those models do not have reduced sufficient statistics for the αi parameters and thus have no way to condition them out of the likelihood function. However, for alternative link functions, it's possible to do approximate conditional likelihood using the projected score method proposed by Waterman and Lindsay (1996).

So much for the theory. How can we implement conditional logistic regression in PROC LOGISTIC? In section 3.2, we estimated a conditional logistic regression model for poverty in years 1 and 5 of a five-year series. Now let's look at all five years together. Again, the first thing we must do is restructure the data so that there is one record per person-year instead of one record per person:

DATA teenyrs5;
   SET my.teenpov;
   ARRAY pv(*) pov1-pov5;
   ARRAY mot(*) mother1-mother5;
   ARRAY spo(*) spouse1-spouse5;
   ARRAY ins(*) inschool1-inschool5;
   ARRAY hou(*) hours1-hours5;
   DO year=1 TO 5;
      pov=pv(year);
      mother=mot(year);
      spouse=spo(year);
      inschool=ins(year);
      hours=hou(year);
      OUTPUT;
   END;
   KEEP id year black age pov mother spouse inschool hours;
RUN;
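For readers who want to prototype the same restructuring outside SAS, the wide-to-long reshape can be sketched in pure Python. The record layout and the helper function below are hypothetical, not part of the book's programs; they simply mirror what the DATA step above does with its ARRAY and OUTPUT statements.

```python
# Hypothetical one-record-per-person input; field names mirror the SAS data set.
wide = [{"id": 1, "black": 1, "age": 16,
         "pov1": 0, "pov2": 1, "pov3": 1, "pov4": 0, "pov5": 0,
         "mother1": 0, "mother2": 0, "mother3": 1, "mother4": 1, "mother5": 1,
         "spouse1": 0, "spouse2": 0, "spouse3": 0, "spouse4": 0, "spouse5": 1,
         "inschool1": 1, "inschool2": 1, "inschool3": 0, "inschool4": 0,
         "inschool5": 0,
         "hours1": 0, "hours2": 10, "hours3": 20, "hours4": 40, "hours5": 40}]

def wide_to_long(rows, stubs=("pov", "mother", "spouse", "inschool", "hours"),
                 keep=("id", "black", "age"), years=range(1, 6)):
    """Emit one record per person-year, like the DATA step's OUTPUT in a DO loop."""
    long_rows = []
    for row in rows:
        for year in years:
            rec = {k: row[k] for k in keep}
            rec["year"] = year
            for stub in stubs:
                rec[stub] = row[f"{stub}{year}"]  # e.g. pov = pov3 when year = 3
            long_rows.append(rec)
    return long_rows

long_data = wide_to_long(wide)
print(len(long_data), long_data[2])
```

Applied to the full data set of 1151 girls, this yields the same 5755 person-year records as the DATA step.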

This DATA step produces 5755 observations, five for each of the 1151 girls. Now we're ready to run PROC LOGISTIC to estimate the first model:

PROC LOGISTIC DATA=teenyrs5 DESC;
   CLASS year / PARAM=REF;
   MODEL pov = year mother spouse inschool hours;
   STRATA id;
RUN;

The CLASS statement declares YEAR to be a categorical variable, with the highest year (year 5) being the reference category. The STRATA statement says that each girl is a separate stratum, which groups together the five observations for each girl in the process of constructing the likelihood function.

Results in Output 3.7 are rather similar to those in Output 3.4, which was based on only two observations per person. The first panel, "Strata Summary," gives the number of girls (strata) who have specific frequencies of years in poverty. Note that there were 232 girls who were not in poverty in any of the five years and 92 girls who were in poverty all five years. Both of these groups get eliminated from the likelihood function. The second panel, "Testing Global Null Hypothesis," gives three alternative chi-square tests for the null hypothesis that all the regression coefficients are 0. Clearly that hypothesis must be rejected. Turning to the "Analysis of Maximum Likelihood Estimates," we see that motherhood and school enrollment increase the risk of poverty, whereas living with a husband and working more hours reduce the risk. The last panel gives the odds ratios. Motherhood increases the odds of poverty by an estimated 79%. Living with a husband cuts the odds approximately in half. Each additional hour of employment per week reduces the odds by about 2%. Keep in mind that these estimates control for all stable characteristics of the girls, including such things as race, intelligence, place of birth, and parent's education.

Output 3.7 Conditional Logistic Regression Estimates Produced by PROC LOGISTIC

                              Strata Summary

                          pov
   Response Pattern    1       0     Number of Strata    Frequency
          1            0       5           232             1160
          2            1       4           355             1775
          3            2       3           191              955
          4            3       2           152              760
          5            4       1           129              645
          6            5       0            92              460

              Testing Global Null Hypothesis: BETA=0

   Test                  Chi-Square    DF    Pr > ChiSq
   Likelihood Ratio         97.2814     8        <.0001
   Score                    94.5804     8        <.0001
   Wald                     90.5640     8        <.0001

               Analysis of Maximum Likelihood Estimates

                         Standard        Wald
   Parameter    DF    Estimate     Error    Chi-Square    Pr > ChiSq
   year     1    1     -0.4025    0.1275        9.9615        0.0016
   year     2    1     -0.0707    0.1185        0.3562        0.5506
   year     3    1     -0.0675    0.1096        0.3793        0.5380
   year     4    1      0.0303    0.1047        0.0836        0.7725
   mother        1      0.5824    0.1596       13.3204        0.0003
   spouse        1     -0.7478    0.1753       18.1856        <.0001
   inschool      1      0.2719    0.1127        5.8157        0.0159
   hours         1     -0.0196   0.00315       38.8887        <.0001

                      Odds Ratio Estimates

   Effect          Point Estimate    95% Wald Confidence Limits
   year 1 vs 5              0.669        0.521        0.859
   year 2 vs 5              0.932        0.739        1.175
   year 3 vs 5              0.935        0.754        1.159
   year 4 vs 5              1.031        0.840        1.265
   mother                   1.790        1.310        2.448
   spouse                   0.473        0.336        0.668
   inschool                 1.312        1.052        1.637
   hours                    0.981        0.975        0.987
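The odds ratios and confidence limits in the last panel are just exponentiated coefficients, exp(β) and exp(β ± 1.96 × SE). A quick check in Python, using the rounded coefficients from the output (the limits can therefore differ from SAS in the last digit):

```python
import math

# Coefficients and standard errors as reported in Output 3.7
estimates = {"mother": (0.5824, 0.1596), "spouse": (-0.7478, 0.1753),
             "inschool": (0.2719, 0.1127), "hours": (-0.0196, 0.00315)}

for name, (b, se) in estimates.items():
    oratio = math.exp(b)                                        # point estimate
    lo, hi = math.exp(b - 1.96 * se), math.exp(b + 1.96 * se)   # Wald limits
    print(f"{name:9s} OR = {oratio:.3f}   95% CI = ({lo:.3f}, {hi:.3f})")
```

This reproduces, for example, the odds ratio of 1.790 for MOTHER and 0.981 for HOURS.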

Although models like this cannot include the main effects of time-invariant variables, they do allow for interactions between time-invariant variables and time-varying variables, including time itself. The next model, for example, includes the interaction between MOTHER and BLACK.

PROC LOGISTIC DATA=teenyrs5 DESC;
   CLASS year / PARAM=REF;
   MODEL pov = year mother spouse inschool hours mother*black;
   STRATA id;
RUN;

In Output 3.8, we see that the interaction is statistically significant at the .05 level. For nonblack girls, the effect of motherhood is to increase the odds[**] of poverty by a factor of exp(.9821) = 2.67. For black girls, on the other hand, the effect of motherhood is to increase the odds of poverty by a factor of exp(.9821 − .5989) = 1.47. Thus, motherhood has a larger effect on poverty status among nonblack girls than among black girls.

[**] By default, PROC LOGISTIC does not report odds ratios for variables involved in an interaction. However, these can be requested with the EXPB option on the MODEL statement.

Output 3.8 Conditional Logistic Regression with Interaction

               Analysis of Maximum Likelihood Estimates

                            Standard        Wald
   Parameter       DF    Estimate     Error    Chi-Square    Pr > ChiSq
   year     1       1     -0.3996    0.1276        9.8046        0.0017
   year     2       1     -0.0677    0.1186        0.3260        0.5680
   year     3       1     -0.0654    0.1097        0.3552        0.5512
   year     4       1      0.0304    0.1047        0.0843        0.7716
   mother           1      0.9821    0.2529       15.0787        0.0001
   spouse           1     -0.7830    0.1777       19.4224        <.0001
   inschool         1      0.2671    0.1128        5.6084        0.0179
   hours            1     -0.0192   0.00316       36.9396        <.0001
   mother*black     1     -0.5989    0.2897        4.2748        0.0387
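The interaction arithmetic can be verified directly from the coefficients in Output 3.8: the motherhood effect for nonblack girls is exp of the MOTHER coefficient alone, while for black girls it is exp of the sum of the MOTHER and MOTHER*BLACK coefficients.

```python
import math

b_mother, b_interaction = 0.9821, -0.5989   # from Output 3.8

or_nonblack = math.exp(b_mother)                  # motherhood effect when black = 0
or_black = math.exp(b_mother + b_interaction)     # motherhood effect when black = 1

print(round(or_nonblack, 2), round(or_black, 2))  # → 2.67 1.47
```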
