Issues and Extensions

In this section, we look at a number of complications and concerns that arise in the analysis of tied data using maximum likelihood methods.

Dependence among the Observations?

A common reaction to the methods described in this chapter is that there must be something wrong. In general, when multiple observations are created for a single individual, it’s reasonable to suppose that those observations are not independent, thereby violating a basic assumption used to construct the likelihood function. The consequence is thought to be biased standard error estimates and inflated test statistics. Even worse, there are different numbers of observations for different individuals, so some appear to get more weight than others.

While concern about dependence is often legitimate, it is not applicable here. In this case, the creation of multiple observations is not an ad-hoc method; rather, it follows directly from factoring the likelihood function for the data (Allison 1982). The basic idea is this. In its original form, the likelihood for data with no censoring can be written as a product of probabilities over all n observations,

Equation 7.3

$$L = \prod_{i=1}^{n} \Pr(T_i = t_i)$$

where Ti is the random variable and ti is the particular value observed for individual i. Each of the probabilities in equation (7.3) can be factored in the following way. If ti = 5, we have

Equation 7.4

$$\Pr(T_i = 5) = P_{i5}\,(1 - P_{i4})(1 - P_{i3})(1 - P_{i2})(1 - P_{i1})$$
where, again, Pit is the conditional probability of an event at time t, given that an event has not already occurred. This factorization follows immediately from the definition of conditional probability. Each of the five terms in equation (7.4) may be treated as though it came from a distinct, independent observation.
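
To spell the step out, note that Pit = Pr(Ti = t | Ti ≥ t), so that 1 − Pit = Pr(Ti ≥ t + 1 | Ti ≥ t). Applying the definition of conditional probability repeatedly (and noting that every individual is at risk at time 1, so Pr(Ti ≥ 1) = 1) gives

$$\Pr(T_i = 5) = \Pr(T_i = 5 \mid T_i \ge 5)\prod_{t=1}^{4}\Pr(T_i \ge t+1 \mid T_i \ge t) = P_{i5}\prod_{t=1}^{4}(1 - P_{it}),$$

which is exactly equation (7.4).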

For those who may still be unconvinced, a comparison of the standard errors in Output 5.10 with those in Output 7.2 should be reassuring. They are virtually identical, despite the fact that the partial likelihood estimates in Output 5.10 are based on 100 persons, while the maximum likelihood estimates in Output 7.2 are based on 272 person-years.

This lack of dependence holds only when no individual has more than one event. When events are repeatable, as discussed in Chapter 8, “Heterogeneity, Repeated Events, and Other Topics,” there is a real problem of dependence. But the problem is neither more nor less serious than it is for other methods of survival analysis.

Handling Large Numbers of Observations

Although the creation of multiple observations for each individual does not violate any assumptions about independence, it may cause practical problems when the number of individuals is large and the time intervals are small. For example, if you have a sample of 1,000 persons observed at monthly intervals over a five-year period, you could end up with a working data set of nearly 60,000 person-months. While this is not an impossibly large number, it can certainly increase the time you spend waiting for results, thereby inhibiting exploratory data analysis.

If you find yourself in this situation, there are several options you may want to consider:

  • First, you should ask yourself if you really need to work with such small time intervals. If the time-dependent covariates are only changing annually, you might as well switch from person-months to person-years. True, there will be some loss of precision in the estimates, but this is usually minimal. And by switching to the piecewise exponential model described in Chapter 4 (which has a similar data structure), you can even avoid the loss of precision. On the other hand, if the time-dependent covariates are changing at monthly intervals, you really should stick to that level of analysis.

  • Second, if your covariates are all categorical, or at least have a small number of levels, you can achieve great computational economy by estimating the models from data that are grouped by covariate values. One way to do this is to use PROC SUMMARY to create a grouped data set and then use the grouped-data (events/trials) syntax in PROC LOGISTIC, PROC PROBIT, or PROC GENMOD (see the first sketch following this list). Alternatively, PROC CATMOD will automatically group the data when you specify a logit model, and you can save the grouped data set for further analysis.

  • Third, if you are estimating logit models, you may want to sample on the dependent variable, at least for exploratory analysis. Typically, the data sets created for the methods in this chapter have a dichotomous dependent variable with an extreme split: the number of nonevents will be many times larger than the number of events. What you can do is take all the observations with events and a random subsample of the observations without events, so that the two groups are approximately equal in size (see the second sketch following this list). Is this legitimate? Well, it is for the logit model (but not for the complementary log-log model). It is now fairly well known that random sampling on the dependent variable in a logit analysis does not bias the coefficient estimates, other than the intercept (Prentice and Pyke 1979).

  • Fourth, you may want to consider abandoning the maximum likelihood approach and using partial likelihood with Efron’s approximation. If the time intervals are small, then the data will probably not have a large number of ties, and the Efron approximation should work well. Of course, you lose the ability to test hypotheses about the dependence of the hazard on time.
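
Here is a minimal sketch of the grouped-data strategy in the second bullet. The data set name PERSYRS, the 0/1 event indicator EVENT, the two categorical covariates X1 and X2, and the interval number YEAR are all hypothetical stand-ins for whatever person-period data set you have built:

proc summary data=persyrs nway;
   class x1 x2 year;                        /* one output record per covariate combination */
   var event;
   output out=grouped sum=events n=trials;  /* number of events and number of person-periods per group */
run;

proc genmod data=grouped;
   class x1 x2 year;
   model events/trials = x1 x2 year / dist=binomial link=logit;
run;

The same events/trials syntax with LINK=CLOGLOG gives the complementary log-log model from the grouped data.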
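
And here is a sketch of sampling on the dependent variable (third bullet), again with hypothetical names; the seed and the sampling fraction are arbitrary:

data sampled;
   set persyrs;
   if event=1 then output;                   /* keep every person-period with an event */
   else if ranuni(30429) < .10 then output;  /* keep roughly 10 percent of the others */
run;

Remember that this shifts the intercept of the fitted logit model but, by the result cited above, does not bias the other coefficients.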

Unequal Intervals

To this point, we have assumed that the time intervals giving rise to tied survival times are all of equal length. It’s not uncommon, however, for some intervals to be longer than others, either by design or by accident. For example, a study of deaths following surgery may have weekly follow-ups soon after the surgery when the risk of death is highest and then switch to monthly follow-ups later on. Some national panel studies, like the Panel Study of Income Dynamics, were conducted annually in most years, but not in all. As a result, some intervals are two years long instead of one. Even if the intervals are equal by design, it often happens that some individuals cannot be reached on some follow-ups.

Regardless of the reason, it should be obvious that, other things being equal, the probability of an event will increase with the length of the interval. And if time-dependent covariates are associated with interval length, the result can be biased coefficient estimates. The solution to this problem depends on the pattern of unequal intervals and the model being estimated.

There is one case in which no special treatment is needed. If you are estimating the models in equation (7.1) or equation (7.2), which place no restrictions on the effect of time, and if the data are structured so that every individual’s interval at time t is the same length as every other individual’s interval at time t, then the separate parameters that are estimated for every time interval automatically adjust for differences in interval length. This situation is not as common as you might think, however. Even when intervals at the same calendar time have the same length, intervals at the same event time will have different lengths whenever individuals have different origin points.

In all other cases, an ad-hoc solution will usually suffice: simply include the length of the interval as a covariate in the model. If there are only two distinct interval lengths, a single dummy variable will work. If there are a small number of distinct lengths, construct a set of dummy variables. If there are many different lengths, you will probably need to treat length as a continuous variable but include a squared term in the model to adjust for nonlinearity.
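
As a sketch of the last case, suppose each record in the person-period data set carries a variable INTLEN giving the length of its interval (INTLEN, PERSYRS, EVENT, X1, and X2 are all hypothetical names):

data persyrs2;
   set persyrs;
   intlen2 = intlen*intlen;   /* squared term allows a nonlinear effect of interval length */
run;

proc logistic data=persyrs2 descending;
   model event = x1 x2 intlen intlen2;
run;

The DESCENDING option simply makes PROC LOGISTIC model the probability that EVENT=1.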

Empty Intervals

In some data sets, there are time intervals in which no individual experiences an event. For example, in the leader data that we analyzed in Chapter 6, “Competing Risks,” none of the 472 leaders lost power in the 22nd year of rule. Naturally, this is most likely to occur when the number of time intervals is large and the number of individuals is small. Whenever there are empty intervals, if you try to estimate a model with an unrestricted effect of time, as in equation (7.1) or equation (7.2), the ML algorithm will not converge. This result is a consequence of the following general principle. For any dichotomous covariate, consider the 2 × 2 contingency table formed by that covariate and the dichotomous dependent variable. If any of the four cells in that table has a frequency of 0, the result is nonconvergence.

An easy solution is to fit a model with restrictions on the time effect. The quadratic function that was estimated in Output 7.4, for example, will not suffer from this problem. Alternatively, if you don’t want to lose the flexibility of the unrestricted model, you can constrain the coefficient for any empty interval to be the same as that of an adjacent interval. The simplest way to do this is to recode the variable containing the interval values so that the adjacent intervals have the same value. For the leader data, this requires a DATA step with the following statement:

if year=22 then year=21;

Then specify YEAR as a CLASS variable in PROC PROBIT or PROC GENMOD. Instead of separate dummy (indicator) variables for years 21 and 22, this code produces one dummy variable equal to 1 if an event time was equal to either of those two years; otherwise, the variable is equal to 0.
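
Putting the two pieces together, the analysis might look something like the following sketch, in which the person-year data set LEADYRS, the event indicator LOST, and the covariates X1 and X2 are hypothetical placeholders for the leader data as set up earlier:

data leadyrs2;
   set leadyrs;
   if year=22 then year=21;   /* fold the empty 22nd year into the 21st */
run;

proc genmod data=leadyrs2 descending;
   class year;
   model lost = x1 x2 year / dist=binomial link=cloglog;
run;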

Left Truncation

In Chapter 5, we discussed a problem known as left truncation, in which individuals are not at risk of an event until some time after the origin time. This commonly occurs when “survivors” are recruited into a study at varying points in time. We saw how this problem can be easily corrected with the partial likelihood method using PROC PHREG. The solution is equally easy for the maximum likelihood methods discussed in this chapter, although quite different in form. In creating the multiple observations for each individual, you simply delete any time units in which the individual is known not to be at risk of an event. For example, if patients are recruited into a study at various times since diagnosis, no observational units are created for time intervals that occurred prior to recruitment. We can still include time since diagnosis as a covariate, however.
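
In terms of the DATA step that constructs the person-period records, this simply means starting the DO loop at the first period in which the individual was actually observed. Here is a minimal sketch with hypothetical variable names: ENTRY is the number of completed years since diagnosis at recruitment, DUR is the year of the event or of censoring, and DIED indicates whether the event occurred:

data patyrs;
   set patients;
   do year=entry+1 to dur;            /* no records for the years before recruitment */
      event = (year=dur and died=1);  /* 1 only in the final year, and only if an event occurred */
      output;
   end;
run;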

This method also works for temporary withdrawals from the risk set. If your goal is to predict migration of people from Mexico to the U.S., you will probably want to remove anyone from the risk set when he or she was in prison, in the military, and so on. Again, you simply exclude any time units in which the individual is definitely not at risk.

Competing Risks

In Chapter 6, we saw that the likelihood function for data arising from multiple event types can be factored into a separate likelihood function for each event type, treating other event types as censoring. Strictly speaking, this result only holds when time is continuous and measured precisely, so that there are no ties. When time is discrete or when continuous-time data are grouped into intervals, the likelihood function does not factor. If you want full-information maximum likelihood estimates, you must estimate a model for all events simultaneously. On the other hand, it’s also possible to do separate analyses for each event type without biasing the parameter estimates and with only slight loss of precision.

These points are most readily explained for the logit model for competing risks. Let Pijt be the conditional probability that an event of type j occurs to person i at time t, given that no event occurs prior to time t. The natural extension of the logit model to multiple event types is the multinomial logit model:

$$\log\left(\frac{P_{ijt}}{P_{i0t}}\right) = \alpha_{jt} + \boldsymbol{\beta}_j \mathbf{x}_{it}, \qquad j = 1, \ldots, J,$$

or, equivalently,

$$P_{ijt} = \frac{\exp(\alpha_{jt} + \boldsymbol{\beta}_j \mathbf{x}_{it})}{1 + \sum_{k=1}^{J} \exp(\alpha_{kt} + \boldsymbol{\beta}_k \mathbf{x}_{it})},$$

where Pi0t is the probability that no event occurs at time t to individual i. As in the case of a single event type, the likelihood for data arising from this model can be manipulated so that each discrete-time point for each individual appears as a separate observation. If there are, say, three event types, these individual time units can have a dependent variable coded 1, 2, or 3 if an event occurred and coded 4 if no event occurred. We can then use PROC CATMOD to estimate the model simultaneously for all event types.

For the job-duration data analyzed earlier in this chapter, suppose that there are actually two event types, quitting (EVENT=1) and being fired (EVENT=2). The following statements produce a person-year data set with a dependent variable called OUTCOME, which is coded 1 for quit, 2 for fired, and 3 for neither:

data jobyrs2;
   set jobdur;
   do year=1 to dur;
      if year=dur and event=1 then outcome=1;
      else if year=dur and event=2 then outcome=2;
      else outcome=3;
      output;
   end;
run;

We can use PROC CATMOD to estimate the multinomial logit model:

proc catmod data=jobyrs2;
   direct ed prestige salary year;
   model outcome=ed prestige salary year / noprofile noiter;
run;

The DIRECT statement forces the four covariates to be treated as quantitative variables rather than categorical variables. NOPROFILE and NOITER merely suppress unneeded output.

Output 7.5 displays the results. The analysis-of-variance table gives statistics for testing the null hypothesis that both coefficients for each covariate are 0, a hypothesis that is resoundingly rejected for ED, PRESTIGE, and YEAR. The effect of SALARY is less dramatic, but still significant at the .03 level. The lower portion of the table gives the coefficient estimates and their respective test statistics. The odd-numbered parameters all pertain to the contrast between type 1 (quit) and no event, while the even-numbered parameters all pertain to the contrast between type 2 (fired) and no event. (PROC CATMOD treats no event as the reference category because it has the largest coded value, 3). We see that education increases the odds of quitting but reduces the odds of being fired, with both coefficients highly significant. Prestige of the job, on the other hand, reduces the risk of quitting while increasing the risk of being fired. Again, both coefficients are highly significant. Finally, the odds of quitting increase markedly with each year, but the odds of being fired hardly change at all.

Output 7.5. PROC CATMOD Results for Competing Risks Analysis of Job Data
      MAXIMUM-LIKELIHOOD ANALYSIS-OF-VARIANCE TABLE

     Source                   DF   Chi-Square      Prob
     --------------------------------------------------
     INTERCEPT                 2         1.03    0.5965
     ED                        2        21.07    0.0000
     PRESTIGE                  2        66.56    0.0000
     SALARY                    2         6.75    0.0342
     YEAR                      2        19.56    0.0001

     LIKELIHOOD RATIO        534       269.58    1.0000



          ANALYSIS OF MAXIMUM-LIKELIHOOD ESTIMATES

                                     Standard    Chi-
Effect          Parameter  Estimate    Error    Square   Prob
--------------------------------------------------------------
INTERCEPT               1    0.3286    0.8196     0.16  0.6885
                        2   -1.5491    1.6921     0.84  0.3599
ED                      3    0.1921    0.0836     5.28  0.0216
                        4   -0.7941    0.2011    15.59  0.0001
PRESTIGE                5   -0.1128    0.0169    44.47  0.0000
                        6    0.1311    0.0274    22.82  0.0000
SALARY                  7   -0.0255    0.0103     6.17  0.0130
                        8    0.0110    0.0150     0.53  0.4652
YEAR                    9    0.7725    0.1748    19.54  0.0000
                       10  -0.00962    0.2976     0.00  0.9742

Now, we’ll redo the analysis with separate runs for each event type. For this, we rely on a well-known result from multinomial logit analysis (Begg and Gray 1984). To estimate a model for event type 1, simply eliminate from the sample all the person-years in which events of type 2 occurred. Then, do a binomial logit analysis for type 1 versus no event. To estimate a model for event type 2, eliminate all the person-years in which events of type 1 occurred. Then, do a binomial logit analysis for type 2 versus no event. Here’s the SAS code that accomplishes these tasks:

proc logistic data=jobyrs2;
   where outcome ne 2;
   model outcome=ed prestige salary year;
run;

proc logistic data=jobyrs2;
   where outcome ne 1;
   model outcome=ed prestige salary year;
run;

This procedure is justified as a form of conditional maximum likelihood. The resulting estimates are consistent and asymptotically normal, but there is some loss of precision, at least in principle. In practice, both the coefficients and their estimated standard errors usually differ only trivially from those produced by the simultaneous estimation procedure. We can see this by comparing the results in Output 7.6 with those in Output 7.5. The advantages of separating the estimation process are that you can

  • focus only on those event types in which you are interested

  • specify quite different models for different event types, with different covariates and different functional forms.

Output 7.6. PROC LOGISTIC Results for Competing-Risks Analysis of Job Data
Analysis of Maximum Likelihood Estimates

            Parameter Standard    Wald       Pr >    Standardized    Odds
Variable DF  Estimate   Error  Chi-Square Chi-Square   Estimate     Ratio

INTERCPT 1     0.3672   0.8254     0.1979     0.6564            .    .
ED       1     0.1895   0.0840     5.0911     0.0240     0.292760   1.209
PRESTIGE 1    -0.1125   0.0169    44.3193     0.0001    -1.283707   0.894
SALARY   1    -0.0255   0.0103     6.1688     0.0130    -0.331824   0.975
YEAR     1     0.7637   0.1742    19.2200     0.0001     0.541032   2.146

                 Analysis of Maximum Likelihood Estimates

            Parameter Standard    Wald       Pr >    Standardized    Odds
Variable DF  Estimate   Error  Chi-Square Chi-Square   Estimate     Ratio

INTERCPT 1    -1.5462   1.7055     0.8220     0.3646            .    .
ED       1    -0.7872   0.1999    15.5021     0.0001    -1.182506   0.455
PRESTIGE 1     0.1300   0.0274    22.5031     0.0001     1.391817   1.139
SALARY   1     0.0118   0.0155     0.5844     0.4446     0.157079   1.012
YEAR     1    -0.0296   0.2913     0.0103     0.9191    -0.021124   0.971

Now, what about competing risks for the complementary log-log model? Here, things are a little messier. It is possible to derive a multinomial model based on a continuous-time proportional hazards model, and this could be simultaneously estimated for all event types using maximum likelihood. This is not a standard problem, however, and no SAS procedure will do it without a major programming effort. Instead, we can use the same strategy for getting separate estimates for each event type that we just saw in the case of the logit model. That is, we delete all individual time units in which events other than the one of interest occurred. Then we estimate a dichotomous complementary log-log model for the event of interest versus no event. In effect, we are deliberately censoring the data at the beginning of any intervals in which other events occur. This should not be problematic because, even if we have continuous-time data, we have to assume that the different event types are noninformative for one another. Once we eliminate all extraneous events from the data, we reduce the problem to one for a single event type. Again, there will be some slight loss of information in doing this.
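
Here is a minimal sketch of that strategy for the job data, using the JOBYRS2 data set created earlier (the 0/1 variable QUIT is created purely for illustration):

data quityrs;
   set jobyrs2;
   where outcome ne 2;    /* delete person-years in which the competing event (being fired) occurred */
   quit = (outcome=1);    /* 1 = quit in this year, 0 = no event */
run;

proc genmod data=quityrs descending;
   model quit = ed prestige salary year / dist=binomial link=cloglog;
run;

The DESCENDING option makes PROC GENMOD model the probability that QUIT=1. The analysis for being fired is the same with the roles of the two event types reversed.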
