3.3. Estimation of Logistic Models for Two or More Observations Per Person

When individuals in the sample have three or more observations, we can't use the simple method of doing a logistic regression on the persons who change (with difference scores as predictors). In chapter 2, we faced the same problem with linear models, and we solved it by expressing all variables as deviations from the person-specific means. In the case of dichotomous outcomes, an analogous method can be implemented with PROC LOGISTIC (in SAS 9.0 and later).[*] Before proceeding to the practical details, I first need to clear up two theoretical issues: (1) the reason why we can't use dummy variables for individuals in logistic regression, and (2) the rationale for conditional logistic regression.

[*] Conditional logistic regression requires the STRATA statement, which was first implemented in SAS 9.0. For earlier releases, conditional logistic regression can be accomplished with PROC PHREG using the methods described in Allison (1999).

In chapter 2, we saw that one way to estimate a fixed effects linear model in the multiple observation case was to structure the data with one observation per individual per occasion and then compute an OLS regression with dummy variables for all individuals (except one). Although computationally cumbersome, that method produced the correct results. We just saw, however, that the device of using dummy variables does not work for logistic regression in the two-occasion case, and the problem extends to data with more than two occasions. The coefficients are generally biased upward, and the test statistics will also be incorrect. Why is this?

This is an example of a general problem called the incidental parameters problem (Kalbfleisch and Sprott 1970) that arises in certain applications of maximum likelihood estimation. The justification for maximum likelihood estimators is usually asymptotic, which means that it's based on how the estimators behave as the sample gets large. However, the validity of that justification depends on the presumption that the number of parameters remains constant as the sample gets larger. For longitudinal data, that works just fine if the number of individuals remains constant while the number of observations per individual gets larger. But if the number of individuals is getting larger while the number of time points remains constant, then the number of parameters in a fixed effects model (including coefficients of the dummy variables) is increasing at the same rate as the sample size. This is not a problem for linear models and (somewhat surprisingly) for the Poisson models discussed in chapter 4. But it is a serious problem with logistic regression and many other nonlinear regression models. The biases are greatest when, as in the previous section, the number of time points per individual is small.
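To see the incidental parameters problem concretely, here is a pure-Python simulation sketch (not from the text; the design, sample size, and seed are all invented for illustration). It fits a two-occasion fixed effects logit both ways: once by maximizing the full likelihood over β and all of the αi terms (equivalent to the dummy-variable approach, done here by profiling out each αi numerically), and once by conditional maximum likelihood, which for this simple design reduces to log(n01/n10), where n01 and n10 count the two kinds of changers. The dummy-variable estimate comes out at roughly twice the conditional one, in line with the well-known result that the asymptotic bias factor is 2 when there are only two occasions.

```python
import math
import random

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

random.seed(42)
N, TRUE_BETA = 2000, 1.0

# Simulate a two-occasion panel with x_i1 = 0, x_i2 = 1 (a pure time
# effect) and a normally distributed person effect alpha_i.
counts = {}
for _ in range(N):
    alpha = random.gauss(0, 1)
    y1 = random.random() < expit(alpha)
    y2 = random.random() < expit(alpha + TRUE_BETA)
    counts[(y1, y2)] = counts.get((y1, y2), 0) + 1

def loglik(pattern, alpha, beta):
    y1, y2 = pattern
    p1, p2 = expit(alpha), expit(alpha + beta)
    return (math.log(p1 if y1 else 1 - p1)
            + math.log(p2 if y2 else 1 - p2))

def profiled(pattern, beta):
    # Maximize over alpha_i by ternary search (log-likelihood is concave in alpha).
    lo, hi = -30.0, 30.0
    for _ in range(80):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if loglik(pattern, m1, beta) < loglik(pattern, m2, beta):
            lo = m1
        else:
            hi = m2
    return loglik(pattern, (lo + hi) / 2, beta)

# "Unconditional" ML: estimate beta jointly with all the alpha_i
# (the dummy-variable approach), via a grid search over beta.
grid = [i * 0.01 for i in range(401)]
b_dummy = max(grid, key=lambda b: sum(n * profiled(p, b)
                                      for p, n in counts.items()))

# Conditional ML: for T = 2 this reduces to a logit on the changers,
# which for this one-covariate design is just log(n01/n10).
b_cond = math.log(counts[(False, True)] / counts[(True, False)])

print(f"dummy-variable estimate: {b_dummy:.2f}")
print(f"conditional estimate:    {b_cond:.2f}")
```

Persons with the same outcome on both occasions drive their estimated αi to plus or minus infinity, so their likelihood contribution does not depend on β; only the changers matter, just as in the conditional analysis.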

The solution to the incidental parameters problem is to do conditional maximum likelihood (Chamberlain 1980), which we already employed in the two-occasion case. Now we need to generalize that method to more than two occasions. The basic idea is to reformulate the likelihood function so that it no longer contains the individual-specific αi parameters in equation (3.1). It turns out that for the logistic model of equation (3.1), there are reduced sufficient statistics for the αi parameters. That means that there are summaries of the data that contain all the information about the αi terms. Specifically, the reduced sufficient statistics are the counts si of the number of observations for each person in which yit = 1:

s_i = \sum_{t=1}^{T} y_{it}

We can remove the αi terms from the likelihood function by conditioning on these sufficient statistics. For a single individual, the likelihood function for conventional logistic regression is

L_i = \prod_{t=1}^{T} p_{it}^{y_{it}} (1 - p_{it})^{1 - y_{it}},  where  \log[p_{it}/(1 - p_{it})] = \mu_t + \beta x_{it} + \gamma z_i + \alpha_i

We then condition on si by dividing this likelihood by the probability of observing si. Without going through the algebraic details, this produces

\Pr(y_{i1}, \ldots, y_{iT} \mid s_i) = \frac{\psi_1}{\psi_1 + \psi_2 + \cdots + \psi_Q},  where  \psi_1 = \exp\Big(\sum_{t=1}^{T} y_{it}(\mu_t + \beta x_{it})\Big)    (3.4)

In the denominator, ψ2,...,ψQ all have the same form as ψ1 except that, instead of the observed values of 1 and 0 for yit, the 1's and 0's are permuted in all possible ways. Thus Q is the number of different ways of re-arranging the 1's and 0's, given that we've observed a certain number of each. Notice that γ, the coefficient vector for the time-invariant predictors, and the αi terms no longer appear in this likelihood.

To put this more concretely, suppose that each person is observed on five occasions (as in our upcoming example). Suppose, further, that one particular individual was in poverty at times 1 and 3, but not at times 2, 4 or 5. We then ask the question "Given that the event occurred on two occasions, what is the probability that it happened at times 1 and 3, rather than on two other occasions (say, 2 and 5, or 3 and 4)?" In fact, there are 10 different ways of choosing two occasions from among five possibilities. The resulting likelihood for this person has the following form:

L_i = \frac{\psi_{13}}{\psi_{12} + \psi_{13} + \psi_{14} + \psi_{15} + \psi_{23} + \psi_{24} + \psi_{25} + \psi_{34} + \psi_{35} + \psi_{45}},  where  \psi_{jk} = \exp(\mu_j + \mu_k + \beta x_{ij} + \beta x_{ik})

The conditional likelihood for the whole sample is just the product of all such likelihoods for each person.
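As a sketch of this calculation, the conditional probability for the hypothetical person just described can be computed by brute force: enumerate all 10 ways of placing two 1's among five occasions and form the ratio. The linear predictor values u[t] = μt + βxit below are invented for illustration; note that αi and γzi cancel out of the ratio, so they never need to be computed.

```python
import math
from itertools import combinations

# Hypothetical linear predictors u[t] = mu_t + beta*x_it for t = 1..5
# (illustrative numbers only; alpha_i and gamma*z_i cancel from the ratio)
u = {1: 0.2, 2: -0.1, 3: 0.5, 4: 0.0, 5: -0.3}

def psi(times):
    """exp of the summed linear predictor over the occasions with y = 1."""
    return math.exp(sum(u[t] for t in times))

observed = (1, 3)                       # in poverty at times 1 and 3
patterns = list(combinations(u, 2))     # all C(5,2) = 10 ways to place two 1's
cond_lik = psi(observed) / sum(psi(p) for p in patterns)

print(len(patterns), round(cond_lik, 4))
```

Conditional logistic regression maximizes the product of such ratios over all persons who change, choosing the coefficients inside each exp() to make the observed patterns as probable as possible.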

There are three things worth noting about conditional logistic regression. First, as we saw in the two-occasion case, persons who don't change on yit over the period of observation are effectively eliminated from the analysis. In the likelihood (3.4), for a person who has all 1's or all 0's, the numerator and denominator will be identical and hence will cancel. With respect to the conditioning argument, if we know that someone has events on five out of five occasions, then there's no more room for variability in when those events occurred.

A second point, related to the first, is that conditional maximum likelihood estimators have two out of the three properties usually associated with maximum likelihood estimation. They are consistent (i.e., they converge in probability to the true values) and they are asymptotically normal (i.e., the sampling distribution is approximately normal for large samples). But they might not be fully efficient. There is a potential loss of information that comes from (a) excluding persons who don't change, and (b) only using within-person variation. But that's the price one always pays for choosing a fixed effects model.

A third point is that, for dichotomous dependent variables, conditional likelihood only works for the logistic regression model, not for other link functions like probit or complementary log-log. That's because those models do not have reduced sufficient statistics for the αi parameters and thus have no way to condition them out of the likelihood function. However, for alternative link functions, it's possible to do approximate conditional likelihood using the projected score method proposed by Waterman and Lindsay (1996).

So much for the theory. How can we implement conditional logistic regression in PROC LOGISTIC? In section 3.2, we estimated a conditional logistic regression model for poverty in years 1 and 5 of a five-year series. Now let's look at all five years together. Again, the first thing we must do is restructure the data so that there is one record per person-year instead of one record per person:

DATA teenyrs5;
   SET my.teenpov;
   ARRAY pv(*) pov1-pov5;
   ARRAY mot(*) mother1-mother5;
   ARRAY spo(*) spouse1-spouse5;
   ARRAY ins(*) inschool1-inschool5;
   ARRAY hou(*) hours1-hours5;
   DO year=1 TO 5;
      pov=pv(year);
      mother=mot(year);
      spouse=spo(year);
      inschool=ins(year);
      hours=hou(year);
      OUTPUT;
   END;
   KEEP id year black age pov mother spouse inschool hours;
RUN;
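For readers who want to prototype the same restructuring outside SAS, the wide-to-long reshape can be sketched in pure Python. The record layout and the helper function below are hypothetical, not part of the book's programs; they simply mirror what the DATA step above does with its ARRAY and OUTPUT statements.

```python
# Hypothetical one-record-per-person input; field names mirror the SAS data set.
wide = [{"id": 1, "black": 1, "age": 16,
         "pov1": 0, "pov2": 1, "pov3": 1, "pov4": 0, "pov5": 0,
         "mother1": 0, "mother2": 0, "mother3": 1, "mother4": 1, "mother5": 1,
         "spouse1": 0, "spouse2": 0, "spouse3": 0, "spouse4": 0, "spouse5": 1,
         "inschool1": 1, "inschool2": 1, "inschool3": 0, "inschool4": 0,
         "inschool5": 0,
         "hours1": 0, "hours2": 10, "hours3": 20, "hours4": 40, "hours5": 40}]

def wide_to_long(rows, stubs=("pov", "mother", "spouse", "inschool", "hours"),
                 keep=("id", "black", "age"), years=range(1, 6)):
    """Emit one record per person-year, like the DATA step's OUTPUT in a DO loop."""
    long_rows = []
    for row in rows:
        for year in years:
            rec = {k: row[k] for k in keep}
            rec["year"] = year
            for stub in stubs:
                rec[stub] = row[f"{stub}{year}"]  # e.g. pov = pov3 when year = 3
            long_rows.append(rec)
    return long_rows

long_data = wide_to_long(wide)
print(len(long_data), long_data[2])
```

Applied to the full data set of 1151 girls, this yields the same 5755 person-year records as the DATA step.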

This DATA step produces 5755 observations, five for each of the 1151 girls. Now we're ready to run PROC LOGISTIC to estimate the first model:

PROC LOGISTIC DATA=teenyrs5 DESC;
   CLASS year / PARAM=REF;
   MODEL pov = year mother spouse inschool hours;
   STRATA id;
RUN;

The CLASS statement declares YEAR to be a categorical variable, with the highest year (year 5) being the reference category. The STRATA statement says that each girl is a separate stratum, which groups together the five observations for each girl in the process of constructing the likelihood function.

Results in Output 3.7 are rather similar to those in Output 3.4, which was based on only two observations per person. The first panel, "Strata Summary," gives the number of girls (strata) who have specific frequencies of years in poverty. Note that there were 232 girls who were not in poverty in any of the five years and 92 girls who were in poverty all five years. Both of these groups get eliminated from the likelihood function. The second panel, "Testing Global Null Hypothesis," gives three alternative chi-square tests for the null hypothesis that all the regression coefficients are 0. Clearly that hypothesis must be rejected. Turning to the "Analysis of Maximum Likelihood Estimates," we see that motherhood and school enrollment increase the risk of poverty, whereas living with a husband and working more hours reduce the risk. The last panel gives the odds ratios. Motherhood increases the odds of poverty by an estimated 79%. Living with a husband cuts the odds approximately in half. Each additional hour of employment per week reduces the odds by about 2%. Keep in mind that these estimates control for all stable characteristics of the girls, including such things as race, intelligence, place of birth, and parent's education.

Output 3.7 Conditional Logistic Regression Estimates Produced by PROC LOGISTIC

                              Strata Summary

                          pov
   Response Pattern    1       0     Number of Strata    Frequency
          1            0       5           232             1160
          2            1       4           355             1775
          3            2       3           191              955
          4            3       2           152              760
          5            4       1           129              645
          6            5       0            92              460

              Testing Global Null Hypothesis: BETA=0

   Test                  Chi-Square    DF    Pr > ChiSq
   Likelihood Ratio         97.2814     8        <.0001
   Score                    94.5804     8        <.0001
   Wald                     90.5640     8        <.0001

               Analysis of Maximum Likelihood Estimates

                         Standard        Wald
   Parameter    DF    Estimate     Error    Chi-Square    Pr > ChiSq
   year     1    1     -0.4025    0.1275        9.9615        0.0016
   year     2    1     -0.0707    0.1185        0.3562        0.5506
   year     3    1     -0.0675    0.1096        0.3793        0.5380
   year     4    1      0.0303    0.1047        0.0836        0.7725
   mother        1      0.5824    0.1596       13.3204        0.0003
   spouse        1     -0.7478    0.1753       18.1856        <.0001
   inschool      1      0.2719    0.1127        5.8157        0.0159
   hours         1     -0.0196   0.00315       38.8887        <.0001

                      Odds Ratio Estimates

   Effect          Point Estimate    95% Wald Confidence Limits
   year 1 vs 5              0.669        0.521        0.859
   year 2 vs 5              0.932        0.739        1.175
   year 3 vs 5              0.935        0.754        1.159
   year 4 vs 5              1.031        0.840        1.265
   mother                   1.790        1.310        2.448
   spouse                   0.473        0.336        0.668
   inschool                 1.312        1.052        1.637
   hours                    0.981        0.975        0.987
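The odds ratios and confidence limits in the last panel are just exponentiated coefficients, exp(β) and exp(β ± 1.96 × SE). A quick check in Python, using the rounded coefficients from the output (the limits can therefore differ from SAS in the last digit):

```python
import math

# Coefficients and standard errors as reported in Output 3.7
estimates = {"mother": (0.5824, 0.1596), "spouse": (-0.7478, 0.1753),
             "inschool": (0.2719, 0.1127), "hours": (-0.0196, 0.00315)}

for name, (b, se) in estimates.items():
    oratio = math.exp(b)                                        # point estimate
    lo, hi = math.exp(b - 1.96 * se), math.exp(b + 1.96 * se)   # Wald limits
    print(f"{name:9s} OR = {oratio:.3f}   95% CI = ({lo:.3f}, {hi:.3f})")
```

This reproduces, for example, the odds ratio of 1.790 for MOTHER and 0.981 for HOURS.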

Although models like this cannot include the main effects of time-invariant variables, they do allow for interactions between time-invariant variables and time-varying variables, including time itself. The next model, for example, includes the interaction between MOTHER and BLACK.

PROC LOGISTIC DATA=teenyrs5 DESC;
   CLASS year / PARAM=REF;
   MODEL pov = year mother spouse inschool hours mother*black;
   STRATA id;
RUN;

In Output 3.8, we see that the interaction is statistically significant at the .05 level. For nonblack girls, the effect of motherhood is to increase the odds[**] of poverty by a factor of exp(.9821) = 2.67. For black girls, on the other hand, the effect of motherhood is to increase the odds of poverty by a factor of exp(.9821 − .5989) = 1.47. Thus, motherhood has a larger effect on poverty status among nonblack girls than among black girls.

[**] By default, PROC LOGISTIC does not report odds ratios for variables involved in an interaction. However, these can be requested with the EXPB option on the MODEL statement.

Output 3.8 Conditional Logistic Regression with Interaction

               Analysis of Maximum Likelihood Estimates

                            Standard        Wald
   Parameter       DF    Estimate     Error    Chi-Square    Pr > ChiSq
   year     1       1     -0.3996    0.1276        9.8046        0.0017
   year     2       1     -0.0677    0.1186        0.3260        0.5680
   year     3       1     -0.0654    0.1097        0.3552        0.5512
   year     4       1      0.0304    0.1047        0.0843        0.7716
   mother           1      0.9821    0.2529       15.0787        0.0001
   spouse           1     -0.7830    0.1777       19.4224        <.0001
   inschool         1      0.2671    0.1128        5.6084        0.0179
   hours            1     -0.0192   0.00316       36.9396        <.0001
   mother*black     1     -0.5989    0.2897        4.2748        0.0387
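The interaction arithmetic can be verified directly from the coefficients in Output 3.8: the motherhood effect for nonblack girls is exp of the MOTHER coefficient alone, while for black girls it is exp of the sum of the MOTHER and MOTHER*BLACK coefficients.

```python
import math

b_mother, b_interaction = 0.9821, -0.5989   # from Output 3.8

or_nonblack = math.exp(b_mother)                  # motherhood effect when black = 0
or_black = math.exp(b_mother + b_interaction)     # motherhood effect when black = 1

print(round(or_nonblack, 2), round(or_black, 2))  # → 2.67 1.47
```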
