Time-Dependent Covariates

Time-dependent covariates are those that may change in value over the course of observation. While it’s simple to modify Cox’s model to allow for time-dependent covariates, the computation of the resulting partial likelihood is much more time-consuming, and the practical issues surrounding the implementation of the procedure can be quite complex. It’s easy to make mistakes without realizing it, so be sure you know what you’re doing.

To modify the model in equation (5.2) to include time-dependent covariates, all we need to do is write (t) after the xs that are time dependent. For a model with one fixed covariate and one time-dependent covariate, we have

$$\log h_i(t) = \alpha(t) + \beta_1 x_{i1} + \beta_2 x_{i2}(t).$$

This says that the hazard at time t depends on the value of x1, and on the value of x2 at time t. What may not be clear is that x2(t) can be defined using any information about the individual prior to time t, thereby allowing for lagged or cumulative values of some variables. For example, if we want a model in which the hazard of arrest depends on employment status, we can specify employment as

  • whether the person is currently employed

  • whether the person was employed in the previous month

  • the number of weeks of employment in the preceding three months

  • the number of bouts of unemployment in the preceding 12 months.

The use of lagged covariates is often essential for resolving issues of causal ordering (more on that later).

Heart Transplant Example

Constructing appropriate time-dependent covariates frequently requires complex manipulation of the available data. PROC PHREG is particularly good for this kind of work because it provides you with a rich subset of DATA step operators and functions for defining time-dependent covariates. To get some idea of how this works, let’s take another look at the Stanford Heart Transplant Data. In Partial Likelihood: Examples, we attempted to determine whether a transplant raised or lowered the risk of death by examining the effect of a time-constant covariate TRANS that was equal to 1 if the patient ever had a transplant, and was equal to 0 otherwise. I claimed that those results were completely misleading because patients who died quickly were less likely to get transplants. Now we’ll do it right by defining a time-dependent covariate PLANT equal to 1 if the patient has already had a transplant at day t; otherwise, PLANT is equal to 0. Here’s how it’s done:

proc phreg data=stan;
   model surv1*dead(0)=plant surg ageaccpt / ties=exact;
   if wait>surv1 or wait=. then plant=0; else plant=1;
run;

Recall that SURV1 is the time in days from acceptance into the program until death or termination of observation; SURG=1 if the patient had previous heart surgery and SURG=0 otherwise; AGEACCPT is the patient’s age in years at the time of acceptance; and WAIT is the time in days from acceptance until transplant surgery, coded as missing for those who did not receive transplants.

Notice that the new covariate PLANT is listed in the MODEL statement before it is defined in the IF statement that follows. At first glance, this IF statement may be puzzling. For patients who were not transplanted, the IF condition will always be true because their WAIT value will be missing. On the other hand, for those who received transplants, it appears that the IF condition will always be false because their waiting time to transplant must be less than their survival time, giving us a fixed covariate rather than a time-dependent covariate. Now it’s true that waiting time is always less than survival time for transplanted patients (except for one patient who died on the operating table). But unlike an IF statement in the DATA step, which only operates on a single case at a time, this IF statement compares waiting times for patients who were at risk of a death with survival times for patients who experienced events. Thus, the SURV1 in this statement is not usually the patient’s own survival time, but the survival time of some other patient who died. This fact will become clearer (hopefully) in the next subsection when we examine the construction of the partial likelihood.

Results in Output 5.11 indicate that transplantation has no effect on the hazard of death. The effect of age at acceptance is somewhat smaller than it was in Output 5.2, although still statistically significant. However, the effect of prior heart surgery is larger and now significant at the .05 level. Estimation of this model took about twice as much computer time as a comparable model with the fixed version of the transplant variable.

Output 5.11. Results for Transplant Data with a Time-Dependent Covariate
                Parameter   Standard    Wald       Pr >         Risk
Variable DF      Estimate     Error  Chi-Square Chi-Square     Ratio

PLANT     1     -0.046152    0.30276    0.02324     0.8788     0.955
SURG      1     -0.771454    0.35961    4.60216     0.0319     0.462
AGEACCPT  1      0.031088    0.01391    4.99524     0.0254     1.032

Construction of the Partial Likelihood with Time-Dependent Covariates

With time-dependent covariates, the partial likelihood function has the same form we saw previously in equation (5.4) and equation (5.10). The only thing that changes is that the covariates are now indexed by time. Consider, for example, the 12 selected cases from the Stanford Heart Transplant data shown in Output 5.12. These are all the cases that had death or censoring times between 16 and 38 days, inclusive.

Output 5.12. Selected Cases from the Stanford Heart Transplant Data
OBS    SURV1    DEAD    WAIT

19      16       1       4
20      17       1       .
21      20       1       .
22      20       1       .
23      27       1      17
24      29       1       4
25      30       0       .
26      31       1       .
27      34       1       .
28      35       1       .
29      36       1       .
30      38       1      35

On day 16, one death occurred (to case 19). Since 18 people had already died or been censored by day 16, there were 103–18 = 85 people left in the risk set on that day. Let’s suppose that we have a single covariate, the time-dependent version of transplant status. The term in the partial likelihood for day 16 is therefore

$$\frac{e^{\beta x_{19}(16)}}{e^{\beta x_{19}(16)} + e^{\beta x_{20}(16)} + \cdots + e^{\beta x_{103}(16)}}$$

To calculate this quantity, PROC PHREG must compute the value of x on day 16 for each of the 85 people at risk. For cases 19 and 24, a transplant occurred on day 4. Since this was before day 16, we have $x_{19}(16) = 1$ and $x_{24}(16) = 1$. Waiting time is missing for cases 20-22 and cases 25-29, indicating that they never received a transplant. Therefore, $x_{20}(16) = x_{21}(16) = x_{22}(16) = x_{25}(16) = x_{26}(16) = x_{27}(16) = x_{28}(16) = x_{29}(16) = 0$. Case 23 had a transplant on day 17, so on day 16, the patient was still without a transplant and $x_{23}(16) = 0$. Similarly, we have $x_{30}(16) = 0$ because case 30 didn’t get a transplant until day 35.

The calculation of the appropriate x values is accomplished by the IF statement discussed earlier in the heart transplant example. At each unique event time, PROC PHREG calculates a term in the partial likelihood function (like the one above) by applying the IF statement to all the cases in the risk set at that time. Again, the value of SURV1 in the IF statement is the event time that PROC PHREG is currently operating on, not the survival time of each individual at risk.
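Multiplying one such term for each distinct event time gives the whole partial likelihood. For a single covariate and no tied event times, and in notation introduced here for illustration (it may differ from that of equation (5.4)), the product is

$$PL(\beta) = \prod_{j=1}^{J} \frac{e^{\beta x_{i_j}(t_j)}}{\sum_{k \in R(t_j)} e^{\beta x_k(t_j)}}$$

where $t_1 < t_2 < \cdots < t_J$ are the distinct event times, $i_j$ is the case with the event at $t_j$, and $R(t_j)$ is the set of cases still at risk at $t_j$. Every covariate value in the $j$th term, in the numerator and the denominator alike, is evaluated at $t_j$, not at each case’s own event or censoring time.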

To continue the example, the next term in the partial likelihood function corresponds to the death that occurred to case 20 on day 17:
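
$$\frac{e^{\beta x_{20}(17)}}{e^{\beta x_{20}(17)} + e^{\beta x_{21}(17)} + \cdots + e^{\beta x_{103}(17)}}$$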
Of course case 19 no longer shows up in this formula because the patient left the risk set at death. The values of x are all the same as they were for day 16, except for case 23, who had a transplant on day 17. For this case, waiting time is not greater than 17, so $x_{23}(17) = 1$. For the data in Output 5.12, there are eight additional terms in the partial likelihood function. For the first five of these, the values of x for cases remaining in the risk set are the same as they were on day 17. On day 35, however, the value of x for case 30 switches from 0 to 1.
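The covariate values quoted for days 16 and 17 can be reproduced outside PROC PHREG. The DATA step below is only an imitation of the risk-set logic, not what PROC PHREG runs internally, and it uses just the 12 cases in Output 5.12 (the other cases at risk on those days are left out). For each event time it keeps the cases whose own SURV1 is at least that time and applies the PLANT rule with SURV1 replaced by the event time. The variables ID and EVENTTIME are introduced here for illustration:

```sas
/* The 12 cases of Output 5.12; ID is the OBS (case) number. */
data sel;
   input id surv1 dead wait;
   datalines;
19 16 1 4
20 17 1 .
21 20 1 .
22 20 1 .
23 27 1 17
24 29 1 4
25 30 0 .
26 31 1 .
27 34 1 .
28 35 1 .
29 36 1 .
30 38 1 35
;
run;

/* Imitate the risk sets at event times 16 and 17: a case is at risk if
   its own SURV1 is at least the event time, and PLANT is computed with
   the event time in place of SURV1. */
data riskset;
   set sel;
   do eventtime=16, 17;
      if surv1 >= eventtime then do;
         if wait>eventtime or wait=. then plant=0; else plant=1;
         output;
      end;
   end;
   keep eventtime id wait plant;
run;

proc sort data=riskset;
   by eventtime id;
run;

proc print data=riskset noobs;
run;
```

In the listing, case 23 has PLANT=0 at event time 16 and PLANT=1 at event time 17, and case 19 no longer appears at event time 17, matching the discussion above.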

The IF statement is evaluated every time a given case appears in a risk set for a particular event time. Thus, those cases with long event (or censoring) times will appear in many different risk sets. For the 103 cases in this data set, there were 63 unique event times. If we sum the sizes of the risk sets for those 63 times, we get a total of 3,548, which is the number of times that the IF statement must be evaluated. That’s just on one iteration. Most partial likelihood programs reconstruct the time-dependent values at each iteration. Since this example requires four iterations, the total number of IF statement evaluations is 14,192. Now you see why time-dependent covariates are more computationally intensive.
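The count of IF-statement evaluations is simply the sum of the risk-set sizes over the distinct event times. The PROC SQL sketch below computes both quantities. To stay self-contained it uses only the 12 cases of Output 5.12, so its totals are far smaller than the figures for all 103 patients; pointing the same query at the full STAN data set (with SURV1 and DEAD coded as described above) should, in principle, give the counts quoted in the text. The data set and variable names RISKSETS, TOTALS, ATRISK, NEVENT, and NEVALS are made up for this illustration:

```sas
/* The 12 cases of Output 5.12; ID is the OBS (case) number. */
data sel;
   input id surv1 dead wait;
   datalines;
19 16 1 4
20 17 1 .
21 20 1 .
22 20 1 .
23 27 1 17
24 29 1 4
25 30 0 .
26 31 1 .
27 34 1 .
28 35 1 .
29 36 1 .
30 38 1 35
;
run;

proc sql;
   /* One row per distinct event time (censoring times do not start a
      risk set); ATRISK counts the cases whose own SURV1 is at least
      that time. */
   create table risksets as
      select e.eventtime, count(*) as atrisk
      from (select distinct surv1 as eventtime from sel where dead=1) as e,
           sel as s
      where s.surv1 >= e.eventtime
      group by e.eventtime;
   /* NEVALS: how often the IF statement is evaluated on one iteration. */
   create table totals as
      select count(*) as nevent, sum(atrisk) as nevals
      from risksets;
quit;

proc print data=totals noobs;
run;
```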

PROC PHREG reduces the computational burden by saving the calculated time-dependent values in a temporary data set after the first iteration and re-using them on subsequent iterations. The saved data set can potentially be much larger than the original data set, however, so you may run into trouble if you’re short on disk space. To override this feature, specify the MULTIPASS option in the PROC PHREG statement.

Covariates Representing Alternative Time Origins

When I discussed the choice of time origin in Chapter 2, I mentioned that you can include alternative origins as covariates, sometimes as time-dependent covariates. Let’s see how this might work with the Stanford Heart Transplant Data. In the analysis just completed, the origin was the date of acceptance into the program, with the time of death or censoring computed from that point. It is certainly plausible, however, that the hazard of death also depends on age or calendar time. We have already included age at acceptance into the program as a time-constant covariate and found that patients who were older at acceptance had a higher hazard of death. But if that’s the case, we might also expect that the hazard will continue to increase with age after acceptance into the program. A natural way to allow for this possibility is to specify a model with current age as a time-dependent covariate. Here’s how to do that with PROC PHREG:

proc phreg data=stan;
   model surv1*dead(0)=plant surg age / ties=exact;
   if wait>surv1 or wait=. then plant=0; else plant=1;
   age=ageaccpt+surv1/365.25;   /* current age in years; SURV1 is in days */
run;

In this program, current age is defined as age at acceptance plus the time to the current event. While this is certainly correct, a surprising thing happens: the results are exactly the same as in Output 5.11. Here’s why. We can write the time-dependent version of the model as

$$\log h(t) = \alpha(t) + \beta_1 x_1 + \beta_2 x_2(t) + \beta_3 x_3(t) \qquad (5.13)$$

where $x_1$ is the surgery indicator, $x_2$ is transplant status, and $x_3$ is current age. We also know that $x_3(t) = x_3(0) + t$ (with $t$ expressed in the same units as age), where $x_3(0)$ is age at the time of acceptance. Substituting into equation (5.13), we have

$$\log h(t) = \alpha^*(t) + \beta_1 x_1 + \beta_2 x_2(t) + \beta_3 x_3(0)$$

where $\alpha^*(t) = \alpha(t) + \beta_3 t$. Thus, we have converted a model with a time-dependent version of age to one with a fixed version of age. In the process, the arbitrary function of time changes, but that’s of no consequence because it drops out of the estimating equations anyway. The same trick works with calendar time: instead of specifying a model in which the hazard depends on current calendar time, we can estimate a model with calendar time at the point of acceptance and get exactly the same results.
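The extra $\beta_3 t$ is harmless because it cancels from every term of the partial likelihood: at event time $t_j$, every member of the risk set $R(t_j)$ is evaluated at the same $t_j$. Keeping only the age term for brevity (the other covariates factor out in the same way), and writing $x_{k3}(0)$ for the age at acceptance of case $k$, the term for the case $i$ that has the event at $t_j$ is

$$\frac{e^{\beta_3 [x_{i3}(0) + t_j]}}{\sum_{k \in R(t_j)} e^{\beta_3 [x_{k3}(0) + t_j]}}
= \frac{e^{\beta_3 t_j}\, e^{\beta_3 x_{i3}(0)}}{e^{\beta_3 t_j} \sum_{k \in R(t_j)} e^{\beta_3 x_{k3}(0)}}
= \frac{e^{\beta_3 x_{i3}(0)}}{\sum_{k \in R(t_j)} e^{\beta_3 x_{k3}(0)}}$$

which is exactly the term for the model with age at acceptance as a fixed covariate.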

The trick does not work, however, if the model says that the log of the hazard is a nonlinear function of the alternative time origin. For example, suppose we want to estimate the model

$$\log h(t) = \alpha(t) + \beta_1 x_1 + \beta_2 x_2(t) + \beta_3 \log x_3(t)$$

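where $x_3(t)$ is again age at time $t$. Substituting $x_3(t) = x_3(0) + t$ gets us nowhere in this case, because $\log[x_3(0) + t]$ cannot be split into a term that depends only on $x_3(0)$ plus a term that depends only on $t$; unlike the linear case, nothing can be absorbed into the baseline function $\alpha(t)$. You must estimate this model with $\log x_3(t)$ as a time-dependent covariate: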

proc phreg data=stan;
   model surv1*dead(0)=plant surg logage / ties=exact;
   if wait>surv1 or wait=. then plant=0; else plant=1;
   logage=log(ageaccpt+surv1/365.25);   /* log of current age in years */
run;

In sum, if you are willing to forgo nonlinear functions of the alternative time origin, you can include any alternative time origin as a fixed covariate, measured at the origin that is actually used in calculating event times.

Time-Dependent Covariates Measured at Regular Intervals

As we saw earlier, calculation of the partial likelihood requires that the values of the covariates be known for every individual who was at risk at each event time. In practice, because we never know in advance when events will occur, we need to know the values of all the time-dependent covariates at every point in time. This requirement was met for the heart transplant data: death times were measured in days, and for each day, we could construct a variable indicating whether a given patient had already had a heart transplant.

Often, however, the information on the covariates is only collected at regular intervals of time that may be longer (or shorter) than the time units used to measure event times. For example, in a study of time to death among AIDS patients, there may be monthly follow-ups in which vital signs and blood measurements are taken. If deaths are reported in days, there is only about a one-in-thirty chance that the time-dependent measurements will be available for a given patient on a particular death day. In such cases, it is necessary to use some ad-hoc method for assigning covariate values to death days. In a moment, I will discuss several issues related to such ad-hoc approaches. First, let’s look at an example in which the time intervals for covariate measurement correspond exactly to the intervals in which event times are measured.

For the recidivism example, additional information was available on the employment status of the released convicts over the one-year follow-up period. Specifically, for each of the 52 weeks of follow-up, there was a dummy variable coded 1 if the person was employed full-time during that week; otherwise the variable was coded 0. The data are read as follows:

data recid;
   infile 'c:\recid.dat';
   input week arrest fin age race wexp mar paro prio emp1-emp52;
run;

The important point here is that the 52 values of employment status (EMP1-EMP52) are read in as separate variables on a single input record.

The PROC PHREG statements are

proc phreg data=recid;
   model week*arrest(0)=fin age race wexp mar paro prio employed
         / ties=efron;
   array emp(*) emp1-emp52;
   do i=1 to 52;
      if week=i then employed=emp[i];
   end;
run;

The aim here is to pick out the employment indicator that corresponds to the particular week in which an event occurred and assign that value to the variable EMPLOYED. The ARRAY statement makes it possible to treat the 52 distinct dummy variables as a single subscripted array, thereby greatly facilitating the subsequent manipulations.

The only problem with this code is that the program has to cycle through 52 IF statements to pick out the right value of the employment variable. A more efficient (but somewhat less intuitive) program that directly retrieves the right value is as follows:

proc phreg data=recid;
   model week*arrest(0)=fin age race wexp mar paro prio employed
         / ties=efron;
   array emp(*) emp1-emp52;
   employed=emp[week];
run;

This program takes about 23 percent less time to run than the DO-IF version. Output 5.13 shows the results (for either version). For the time-constant variables, the coefficients and test statistics are pretty much the same as in Output 5.8. Judging by the chi-square test, however, the new variable EMPLOYED has by far the strongest effect of any variable in the model. The risk ratio of .265 tells us that the risk of arrest for those who were employed full time is a little more than one-fourth the risk for those who were not employed full time.

Output 5.13. Recidivism Results with a Time-Dependent Covariate
               Parameter   Standard    Wald       Pr >         Risk
Variable DF     Estimate     Error  Chi-Square Chi-Square     Ratio

FIN       1    -0.356722    0.19113    3.48351     0.0620     0.700
AGE       1    -0.046342    0.02174    4.54532     0.0330     0.955
RACE      1     0.338658    0.30960    1.19651     0.2740     1.403
WEXP      1    -0.025553    0.21142    0.01461     0.9038     0.975
MAR       1    -0.293747    0.38303    0.58814     0.4431     0.745
PARO      1    -0.064206    0.19468    0.10876     0.7416     0.938
PRIO      1     0.085139    0.02896    8.64391     0.0033     1.089
EMPLOYED  1    -1.328321    0.25072   28.07006     0.0001     0.265

Unfortunately, these results are undermined by the possibility that arrests affect employment status rather than vice versa. If someone is arrested and incarcerated near the beginning of a particular week, the probability of working full time during the remainder of that week is likely to drop precipitously. This potential reverse causation is a problem that is quite common with time-dependent covariates, especially when event times or covariate times are not measured precisely.

One way to reduce ambiguity in the causal ordering is to lag the covariate values. Instead of predicting arrests in a given week by employment status in the same week, we can use employment status in the prior week. This requires only minor modifications in the SAS code:

proc phreg data=recid;
   where week>1;
   model week*arrest(0)=fin age race wexp mar paro prio employed
         / ties=efron;
   array emp(*) emp1-emp52;
   employed=emp[week-1];
run;

One change is to add a WHERE statement to eliminate cases (only one case, in fact) with an arrest in the first week after release. This change is necessary because there were no values of employment status prior to the first week. The other change is to subscript EMP with WEEK–1 rather than with WEEK. With these changes, the coefficient for EMPLOYED drops substantially, from –1.33 to –.79, which implies that the risk of arrest for those who were employed is about 45 percent of the risk of those who were not employed. While this is a much weaker effect than we found using unlagged values of employment status, it is still highly significant with a chi-square value of 13.1. The effects of the other variables remain virtually unchanged.

As this example points out, there are often many different ways of specifying the effect of a time-dependent covariate. Let’s consider a couple of the alternatives. Instead of a single lagged version of the employment status indicator, we can have both a one-week and a two-week lag, as shown in the following:

proc phreg data=recid;
   where week>2;
   model week*arrest(0)=fin age race wexp mar paro prio employ1
         employ2 / ties=efron;
   array emp(*) emp1-emp52;
   employ1=emp[week-1];
   employ2=emp[week-2];
run;

Note that because of the two-week lag, it is necessary to eliminate cases with events in either week 1 or week 2. When I tried this variation, I found that neither EMPLOY1 nor EMPLOY2 was significant (probably because they are highly correlated), but that the one-week lag was much stronger than the two-week lag. So it looks as though we’re better off sticking with the single one-week lag.

Another possibility is that the hazard of arrest may depend on the cumulative employment experience after release rather than the employment status in the preceding week. Consider the following SAS code:

data recidcum;
   set recid;
   array emp(*) emp1-emp52;
   array cum(*) cum1-cum52;
   cum1=emp1;
   do i=2 to 52;
      cum(i)=cum(i-1) + emp(i);
   end;
   do i=1 to 52;
      cum(i)=cum(i)/ i ;
   end;
run;
proc phreg data=recidcum;
   where week>1;
   model week*arrest(0)=fin age race wexp mar paro prio employ
         / ties=efron;
   array cumemp(*) cum1-cum52;
   employ=cumemp[week-1];
run;

The DATA step defines a new set of variables CUM1-CUM52 that are the cumulative proportions of weeks worked for each of the 52 weeks. The first DO loop creates the cumulative count; the second DO loop changes the counts to proportions. The PROC PHREG statements have the same structure as before, except that the cumulative employment (lagged by one week) has been substituted for the lagged employment indicator. This run produces a marginally significant effect of cumulative employment experience. When I also included the one-week-lagged employment indicator, the effect of the cumulated variable faded to insignificance, while the lagged indicator continued to be a significant predictor. Again, it appears that the one-week lag is a better specification.
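The two DO loops in that DATA step can also be folded into one by carrying a running total of weeks worked and dividing as you go. This is only an alternative sketch of the same calculation; EMPDEMO is a hypothetical one-record data set (employed full time in weeks 2, 3, and 5 only) used to check it, and TOTAL is a helper variable introduced here:

```sas
/* Hypothetical record: employed full time in weeks 2, 3 and 5 only. */
data empdemo;
   array emp(*) emp1-emp52;
   do i=1 to 52;
      emp(i)=(i in (2,3,5));
   end;
   drop i;
run;

/* One loop: keep a running count of weeks worked and divide by the
   number of weeks elapsed. A missing EMP value makes TOTAL, and so all
   later CUM values, missing, just as in the two-loop version. */
data cumdemo;
   set empdemo;
   array emp(*) emp1-emp52;
   array cum(*) cum1-cum52;
   total=0;
   do i=1 to 52;
      total=total+emp(i);
      cum(i)=total/i;
   end;
   drop i total;
run;

proc print data=cumdemo noobs;
   var cum1-cum6;
run;
```

For this record, CUM1 through CUM6 are 0, 1/2, 2/3, 1/2, 3/5, and 1/2.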

You can create the cumulated variable in the PROC PHREG step rather than the DATA step, but it would be unwise to do so. When I tried it that way, the execution time for the DATA step dropped from 17 seconds to 8 seconds, but the PROC step time went from 1.82 minutes to 8.72 minutes. In general, whatever programming can be done in the DATA step should be done there because the computations only have to be done once. In the PROC PHREG step, on the other hand, the same computations may have to be repeated many times.

Ad-Hoc Estimates of Time-Dependent Covariates

It often happens that time-dependent covariates are measured at regular intervals, but the intervals don’t correspond to the units in which event times are measured. For example, we may know the exact day of death for a sample of cancer patients, but have only monthly measurements of, say, albumin level in the blood. For partial likelihood estimation, we really need daily albumin measurements, so we must somehow impute these from the monthly data. There are often several possible ways to do this, and, unfortunately, none has any formal justification. On the other hand, it’s undoubtedly better to use some common-sense method for imputing the missing values rather than discarding the data for the time-dependent covariates.

Let’s consider some possible methods and some rough rules of thumb. For the case of monthly albumin measurements and day of death, an obvious method is to use the closest preceding albumin level to impute the level at any given death time. For data over a one-year period, the SAS code might look like this:

data blood;
   infile 'blood.dat';
   input deathday status alb1-alb12;
run;

proc phreg data=blood;
   model deathday*status(0)=albumin;
   array alb(*) alb1-alb12;
   deathmon=min(ceil(deathday/30.4),12);   /* day 365 would otherwise give 13 */
   albumin=alb[deathmon];
run;

Assume that ALB1 is measured at the beginning of the first month, and so on. Dividing DEATHDAY by 30.4 converts days into months (including fractions of a month). The CEIL function then returns the smallest integer greater than or equal to its argument. Thus, day 40 would be converted to 1.32, which then becomes 2, that is, the second month. This value is then used as a subscript in the ALB array to retrieve the albumin level recorded at the beginning of the second month. (There may be some slippage here because months vary in length. However, if we know the exact day at which each albumin measurement was taken, we can avoid this difficulty by using the methods for irregular intervals described in the next section.)
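A quick way to see where the month boundaries fall is to apply the same conversion to a handful of hypothetical death days. Note that day 365 maps to month 13, one past ALB12, so with a full year of follow-up the subscript needs a cap, for example MIN(CEIL(DEATHDAY/30.4),12), to avoid an out-of-range array reference:

```sas
/* Hypothetical death days chosen to straddle month boundaries. */
data monthmap;
   input deathday @@;
   deathmon=ceil(deathday/30.4);   /* month whose start-of-month reading applies */
   datalines;
1 30 31 40 61 364 365
;
run;

proc print data=monthmap noobs;
run;
```

Days 1 and 30 map to month 1, days 31 and 40 to month 2, day 61 to month 3, day 364 to month 12, and day 365 to month 13.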

It may be possible to get better imputations of daily albumin levels by using information on the blood levels in earlier months. If we believe, for example, that albumin levels are likely to worsen (or improve) steadily, it might be sensible to calculate a linear extrapolation based on the most recent two months:

proc phreg data=blood;
   model deathday*status(0)=albumin;
   array alb(*) alb1-alb12;
   deathmon=deathday/30.4;
   j=min(ceil(deathmon),12);   /* no reading after the start of month 12 */
   if j=1 then albumin=alb(1);
   else albumin=alb[j]+(alb[j]-alb[j-1]) * (deathmon-j+1);
run;
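To see what the extrapolation formula does, it can be evaluated in a DATA step for one hypothetical patient with readings of 4.0, 3.8, and 3.5 at the start of months 1 to 3 who dies on day 76, that is, about 2.5 months after the origin and half a month after the third reading. The last observed change (3.8 to 3.5) is carried forward for that half month:

```sas
/* Hypothetical patient: readings at the start of months 1-3 only
   (ALB4-ALB12 stay missing) and death on day 76. */
data extrap;
   array alb(*) alb1-alb12;
   alb1=4.0; alb2=3.8; alb3=3.5;
   deathday=76;
   deathmon=deathday/30.4;     /* about 2.5 months */
   j=ceil(deathmon);           /* 3: the latest reading is ALB3 */
   if j=1 then albumin=alb(1);
   else albumin=alb[j]+(alb[j]-alb[j-1])*(deathmon-j+1);
   /* slope of -0.3 per month carried forward half a month: 3.35 */
run;

proc print data=extrap noobs;
   var deathday deathmon j albumin;
run;
```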

Alternatively, if we believe that albumin levels tend to fluctuate randomly around some average value, we might do better with a weighted average of the most recent value and the mean of all earlier values. Unlike the linear extrapolation, most of the work for this specification can be done in the DATA step:

data blood;
   set blood;
   array alb(*) alb1-alb12;
   array meanalb(*) mean1-mean12;
   array predalb(*) pred1-pred12;
   pred1=alb1;
   mean1=alb1;
   do i=2 to 12;
      meanalb(i)=(1/i)*alb(i)+(1-1/i)*meanalb(i-1);
      predalb(i)=.7*alb(i)+.3*meanalb(i-1);
   end;
run;

proc phreg data=blood;
   model deathday*status(0)=albumin;
   array pred(*) pred1-pred12;
   deathmon=min(ceil(deathday/30.4),12);
   albumin=pred[deathmon];
run;

This code gives the most recent value an arbitrarily chosen weight of .7 and the mean of earlier values a weight of .3. Instead of choosing the weights arbitrarily, they can be estimated by including both the most recent albumin level and the mean of the earlier levels as covariates in the model.
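The recursion for MEANALB in that DATA step is a running mean: after step i, MEANi is the average of ALB1 through ALBi, so MEAN(i-1) is the mean of all readings before month i. A small hypothetical case with readings of 4, 3, and 5 makes this easy to check by hand:

```sas
/* Hypothetical readings of 4, 3 and 5 in months 1-3. */
data wavg;
   array alb(*) alb1-alb12;
   array meanalb(*) mean1-mean12;
   array predalb(*) pred1-pred12;
   alb1=4; alb2=3; alb3=5;
   pred1=alb1;
   mean1=alb1;
   do i=2 to 3;
      meanalb(i)=(1/i)*alb(i)+(1-1/i)*meanalb(i-1);   /* mean of ALB1-ALBi */
      predalb(i)=.7*alb(i)+.3*meanalb(i-1);            /* .7 newest, .3 earlier mean */
   end;
   drop i;
run;

proc print data=wavg noobs;
   var mean1-mean3 pred1-pred3;
run;
```

MEAN2=3.5 and MEAN3=4 are the averages of the first two and first three readings, and PRED2=.7(3)+.3(4)=3.3 and PRED3=.7(5)+.3(3.5)=4.55.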

In these examples, we made no use of information on albumin levels that were recorded after the death date. Obviously, we had no other option for patients who died on that death date. Remember, however, that for every death date, PROC PHREG retrieves (or constructs) the covariates for all individuals who were at risk of death on that date, whether or not they died. For those who did not die on that death date, we could have used the (possibly weighted) average of the albumin level recorded before the death date and the level recorded after the death date. I don’t recommend this, however. Using different imputation rules for those who died and those who didn’t die is just asking for artifacts to creep into your results. Even in cases where the occurrence of the event does not stop the measurement of the time-dependent covariate, it’s a dangerous practice to use information recorded after the event to construct a variable used as a predictor of the event. This method is sensible only if you are completely confident that the event could not have caused any changes in the time-dependent covariate. For example, in modeling whether people will purchase a house, it might be reasonable to use an average of the local mortgage rates before and after the purchase.

As an alternative to the methods discussed in this subsection, you may want to consider using PROC EXPAND, a procedure in SAS/ETS software. PROC EXPAND reads a time series data set, converts it into a continuous-time function using a variety of interpolation methods, and outputs a new data set with observations at any desired time intervals. It can also impute missing values in a time series. One difficulty is that PROC EXPAND treats data for each time point as a distinct observation, while PROC PHREG expects all of the data for each individual to be contained in a single observation. Hence, the data set produced by PROC EXPAND requires some additional DATA step manipulations before you can use it with PROC PHREG.

Time-Dependent Covariates that Change at Irregular Intervals

In the Stanford Heart Transplant example, we had a time-dependent covariate—previous receipt of a transplant—that changed at unpredictable times. No more than one such change could occur for any of the patients. Now we consider the more general situation in which the time-dependent covariate may change at multiple, irregularly spaced points in time.

We’ll do this by way of an example. The survival data (hypothetical) in Output 5.14 are for 29 males, ages 50 to 60, who were diagnosed with alcoholic cirrhosis. At diagnosis, they were measured for blood coagulation time (PT). The men were then remeasured at clinic visits that occurred at irregular intervals until they either died or the study was terminated. The maximum number of clinic visits for any patient was 10. The length of the intervals between visits ranged between 3 and 33 months, with a mean interval length of 9.5 months and a standard deviation of 6.0. In Output 5.14, SURV is the time of death or time of censoring, calculated in months since diagnosis. DEAD is coded 1 for a death and is coded 0 if censored. TIME2-TIME10 contain the number of months since diagnosis for each clinic visit. PT1-PT10 contain the measured values of the PT variable at each clinic visit. Variables are recorded as missing if there was no clinic visit.

Output 5.14. Survival Data for 29 Males with Alcoholic Cirrhosis
Columns, left to right: SURV, DEAD, TIME2-TIME10, PT1-PT10

90 0  7 20 26 35 44 50 56 80 83 23.9 20.8 23.6 23.6 24.0 22.5 24.6 25.1 29.4 27.9
80 0  6 36 42 54 67 78  .  .  . 29.6 15.1 15.4 16.3 13.9 14.6 16.1   .    .    .
36 0 17 28 34  .  .  .  .  .  . 25.9 24.4 24.8 24.3   .    .    .    .    .    .
68 0 15 20 26 32 51  .  .  .  . 26.8 27.9 26.5 26.5 26.8 26.6   .    .    .    .
62 0 22 40 46  .  .  .  .  .  . 23.0 25.2 27.1 27.8   .    .    .    .    .    .
47 0  5 12 24 35 46  .  .  .  . 25.8 26.0 25.2 24.9 26.3 26.6   .    .    .    .
84 0  8 27 31 43 76  .  .  .  . 14.2 11.5 12.9 12.6 12.5 18.6   .    .    .    .
57 0  6 21 27 34 39 45 51  .  . 27.6 27.5 28.0 27.8 29.1 28.2 28.3 28.4   .    .
 7 0  4  7  .  .  .  .  .  .  . 25.0 25.1 24.7   .    .    .    .    .    .    .
49 0 16  .  .  .  .  .  .  .  . 25.5 27.4   .    .    .    .    .    .    .    .
55 0  3  9 15 21 33 42 49  .  . 14.8 16.7 16.9 17.7 13.8 13.8 13.7 14.2   .    .
43 0  6 11 18 24 42  .  .  .  . 20.6 19.9 20.3 20.2 19.7 27.1   .    .    .    .
42 0  6 12 18 23 29 35  .  .  . 27.6 27.0 28.1 28.8 29.0 28.4 28.8   .    .    .
11 0  9  .  .  .  .  .  .  .  . 25.3 27.8   .    .    .    .    .    .    .    .
36 1 16 22 28  .  .  .  .  .  . 22.5 22.3 25.2 26.4   .    .    .    .    .    .
36 1  9 26 32  .  .  .  .  .  . 26.9 26.9 24.2 26.2   .    .    .    .    .    .
 2 1  .  .  .  .  .  .  .  .  . 19.2   .    .    .    .    .    .    .    .    .
23 1  7 13  .  .  .  .  .  .  . 21.8 20.3 23.8   .    .    .    .    .    .    .
10 1  6  .  .  .  .  .  .  .  . 21.6 22.3   .    .    .    .    .    .    .    .
29 1 21 27  .  .  .  .  .  .  . 18.7 20.2 22.5   .    .    .    .    .    .    .
16 1  6 12  .  .  .  .  .  .  . 28.4 28.7 28.7   .    .    .    .    .    .    .
15 1  7 12  .  .  .  .  .  .  . 17.8 17.7 17.4   .    .    .    .    .    .    .
 5 1  3  .  .  .  .  .  .  .  . 20.7 22.6   .    .    .    .    .    .    .    .
15 1  6  .  .  .  .  .  .  .  . 28.0 28.8   .    .    .    .    .    .    .    .
 1 1  .  .  .  .  .  .  .  .  . 31.6   .    .    .    .    .    .    .    .    .
13 1  4 10  .  .  .  .  .  .  . 26.0 22.7 25.4   .    .    .    .    .    .    .
39 1 22 35  .  .  .  .  .  .  . 25.5 29.0 29.2   .    .    .    .    .    .    .
20 1 12  .  .  .  .  .  .  .  . 21.3 21.0   .    .    .    .    .    .    .    .
45 1 18 24 38  .  .  .  .  .  . 23.9 28.7 29.5 30.2   .    .    .    .    .    .

Let’s estimate a model in which the hazard of death at time t depends on the value of PT at time t. Since we don’t have measures of PT at all death times, we’ll use the closest preceding measurement. The SAS code for accomplishing this is as follows:

proc phreg data=alco;
   model surv*dead(0)=pt;
   time1=0;
   array time(*) time1-time10;
   array p(*) pt1-pt10;
   do j=1 to 10;
      if surv ge time[j] and time[j] ne . then pt=p[j];
   end;
run;

For a given death time, the DO loop cycles through all 10 possible clinic visits. Whenever the death time is greater than or equal to the time of the jth visit, PT is reassigned the value observed at visit j. The value of PT therefore stops changing once the loop

  • encounters a missing value of the TIME variable (no clinic visit)

  • encounters a TIME value that is greater than the death time

  • goes through all 10 possible visits.

Because visit times increase from one visit to the next, and missing visit times occur only after a patient’s last visit, PT always ends up equal to the value recorded at the most recent clinic visit on or before the death time.
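The lookup can be checked outside PROC PHREG by running the same loop in a DATA step for the first patient in Output 5.14 (censored at 90 months) at a few event times. EVENTTIME is a variable introduced here for illustration; it plays the role that SURV plays inside PROC PHREG:

```sas
/* First patient in Output 5.14. */
data onept;
   input surv dead time2-time10 pt1-pt10;
   datalines;
90 0 7 20 26 35 44 50 56 80 83 23.9 20.8 23.6 23.6 24.0 22.5 24.6 25.1 29.4 27.9
;
run;

/* Same most-recent-visit loop, evaluated at event times 5, 36 and 45. */
data lookup;
   set onept;
   time1=0;                       /* diagnosis counts as visit 1 */
   array time(*) time1-time10;
   array p(*) pt1-pt10;
   do eventtime=5, 36, 45;
      pt=.;
      do j=1 to 10;
         if eventtime ge time[j] and time[j] ne . then pt=p[j];
      end;
      output;
   end;
   keep eventtime pt;
run;

proc print data=lookup noobs;
run;
```

At month 5 only the diagnosis reading (23.9) is available; at month 36 the most recent visit is at month 35 (24.0); at month 45 it is at month 44 (22.5).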

When I ran this model, I got a coefficient for PT of .083 with a nonsignificant likelihood-ratio chi-square value of 1.87, suggesting that the blood measurement is of little predictive value. But suppose the hazard depends on the change in PT relative to the patient’s initial measurement rather than upon the absolute level. You can implement this idea by changing the IF statement above to read as follows:

if surv ge time[j] and time[j] ne . then pt=p[j]-pt1;

With this minor change in specification, the coefficient of PT increased in magnitude to .352, with a chi-square of 6.97, which is significant beyond the .01 level.
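As a rough check on these results, the likelihood-ratio chi-squares can be converted to p-values and the coefficients to hazard ratios. This sketch assumes each test has 1 degree of freedom (a single coefficient) and uses the identity that the upper tail of a 1-df chi-square equals erfc(sqrt(x/2)); the function names are illustrative only.

```python
import math


def chisq1_pvalue(x: float) -> float:
    """Upper-tail p-value of a chi-square statistic, assuming 1 degree of freedom."""
    # P(X > x) for X ~ chi-square(1) equals P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(x / 2.0))


def hazard_ratio(beta: float) -> float:
    """Multiplicative change in the hazard for a one-unit increase in the covariate."""
    return math.exp(beta)


# Level of PT versus change in PT from the initial measurement.
for label, beta, chisq in [("level", 0.083, 1.87), ("change", 0.352, 6.97)]:
    print(f"{label}: HR = {hazard_ratio(beta):.3f}, p = {chisq1_pvalue(chisq):.4f}")
```

The first model gives p of about .17, consistent with a nonsignificant effect, while the second falls below .01; a one-unit increase in PT relative to the initial measurement multiplies the hazard by about 1.42.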

In general, the only additional requirement for handling irregular intervals between measurements of a covariate is data on the timing of each measurement. The variables containing the measurement times must always be in the same metric as the event-time variable (the same origin and the same units). A DO loop like the one above can then retrieve the appropriate measurement for each event time.
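To illustrate the same-metric requirement, here is a minimal Python sketch that assumes, hypothetically, that clinic visits were recorded as calendar dates while survival is measured in days from a study-entry date. The dates and the helper names `days_since` and `latest_value` are invented for illustration; the PT values are taken from the second data line above. Each visit date is converted to days since entry before the most recent visit is looked up.

```python
from datetime import date
from typing import List, Optional, Tuple


def days_since(origin: date, d: date) -> int:
    """Express a calendar date in the event-time metric: whole days since origin."""
    return (d - origin).days


def latest_value(event_day: int, visits: List[Tuple[int, float]]) -> Optional[float]:
    """Value from the last visit on or before event_day (assumes visits sorted by day)."""
    value = None
    for day, measurement in visits:
        if day <= event_day:
            value = measurement
    return value


# Hypothetical patient: the entry and visit dates are illustrative only.
entry = date(1990, 3, 1)
raw_visits = [(date(1990, 3, 1), 20.7), (date(1990, 3, 4), 22.6)]

# Convert every visit date to days since entry, the same metric as survival time.
visits = [(days_since(entry, d), pt) for d, pt in raw_visits]
print(visits)
print(latest_value(2, visits))
print(latest_value(5, visits))
```

Comparing raw dates with survival times in days would silently give wrong lookups; converting once, up front, keeps the retrieval loop itself unchanged.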

Covariates that are Undefined in Some Intervals

In the heart transplant example, we estimated the effect of a heart transplant on the hazard of death. Earlier in the chapter (Output 5.3), after eliminating the patients who did not receive a transplant, we also examined the effects of several covariates that were defined only for those who received transplants: date of transplant, age at transplant, waiting time until transplant, and three measures of tissue matching. We could combine these two analyses into a single model if we could figure out what to do with the covariates that are undefined for patients who have not yet received transplants. One way to do it is to treat the transplant-specific variables as time-dependent covariates that have values of 0 before a transplant occurred. The SAS code to do that is as follows:

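proc phreg data=stan;
   model surv1*dead(0)=surg ageaccpt plant m1td m2td m3td
         waittd dottd / ties=exact;
   /* PLANT=1 once the transplant has occurred by the event time SURV1.  */
   /* WAIT=. (no transplant) must be tested explicitly, because a SAS    */
   /* missing value compares as smaller than any number.                 */
   if wait>surv1 or wait=. then plant=0; else plant=1;
   if plant=1 then do;   /* after transplant: values measured at transplant */
      m1td=m1;
      m2td=m2;
      m3td=m3;
      waittd=wait;
      dottd=dot;
   end;
   else do;              /* before (or without) transplant: all set to 0 */
      m1td=0;
      m2td=0;
      m3td=0;
      waittd=0;
      dottd=0;
   end;
run;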

This program defines the time-dependent transplant indicator PLANT as in the heart transplant example. Then, whenever PLANT=1 (i.e., the patient has received a transplant), the time-dependent versions of five variables are set equal to their values measured at the time of the transplant. When PLANT=0, these variables are set to 0.
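The coding scheme can be sketched in Python as a function evaluated at a given event time, mirroring the SAS statements outside of PROC PHREG. The function name `transplant_covariates` and the patient values in the usage lines (waiting time, M1 to M3, and date of transplant) are hypothetical; `None` stands in for a SAS missing value.

```python
from typing import Dict, Optional


def transplant_covariates(event_time: float, wait: Optional[float],
                          m1: Optional[float], m2: Optional[float],
                          m3: Optional[float], dot: Optional[float]) -> Dict[str, Optional[float]]:
    """Time-dependent covariates evaluated at event_time.

    Follows the SAS logic: PLANT is 1 once the transplant has occurred
    (wait not missing and wait <= event_time); the transplant-specific
    variables take their measured values when PLANT=1 and 0 otherwise.
    """
    plant = 0 if wait is None or wait > event_time else 1
    if plant == 1:
        return {"plant": 1, "m1td": m1, "m2td": m2, "m3td": m3,
                "waittd": wait, "dottd": dot}
    return {"plant": 0, "m1td": 0, "m2td": 0, "m3td": 0,
            "waittd": 0, "dottd": 0}


# Hypothetical patient transplanted at time 20, evaluated before and after.
before = transplant_covariates(10, wait=20, m1=2, m2=1, m3=1.1, dot=5000)
after = transplant_covariates(30, wait=20, m1=2, m2=1, m3=1.1, dot=5000)
# Hypothetical patient who never received a transplant.
never = transplant_covariates(30, wait=None, m1=None, m2=None, m3=None, dot=None)
print(before)
print(after)
print(never)
```

The same patient contributes zeros to the risk sets before the transplant and the measured values afterward, which is what lets a single model include the transplant-specific variables.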

The results in Output 5.15 are similar to those in the earlier, separate analyses. There is a significant effect of age at acceptance, although both its magnitude and its significance level are attenuated relative to Output 5.2. We also see a coefficient for the M3 measure that approaches statistical significance. Notice, however, that in contrast to Output 5.2, age at transplant is not included in the model. That’s because, for those who received a transplant, age at transplant equals age at acceptance plus waiting time. (This relationship does not prevent estimation of a model with all three variables, because it does not hold for patients who have not received a transplant. However, a model with all three variables can produce estimates that are not interpretable.)

You can interpret the coefficient for PLANT as the effect of a transplant when all the other time-dependent covariates are equal to 0. This is not substantively meaningful, however, because a date of transplant of 0 is not possible with these data. Instead, the model should be interpreted in the following way: the effect of a transplant depends on the levels of the transplant-specific variables. More specifically, the effect of a transplant for any given individual is a linear function of those five variables, with coefficients given in Output 5.15. The coefficient of PLANT is the intercept in that linear function.

Output 5.15. Heart Transplant Results with Several Time-Dependent Covariates
               Parameter   Standard    Wald       Pr >         Risk
Variable DF     Estimate     Error  Chi-Square Chi-Square     Ratio

SURG      1    -0.663841    0.38647    2.95051     0.0859     0.515
AGEACCPT  1     0.029589    0.01445    4.19147     0.0406     1.030
PLANT     1     0.788753    1.20651    0.42738     0.5133     2.201
M1TD      1    -0.226452    0.19339    1.37110     0.2416     0.797
M2TD      1     0.164605    0.44867    0.13460     0.7137     1.179
M3TD      1     0.600854    0.33729    3.17337     0.0748     1.824
WAITTD    1     0.004256    0.00530    0.64515     0.4219     1.004
DOTTD     1    -0.000298  0.0002993    0.98940     0.3199     1.000
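To make the linear-function interpretation concrete, the following Python sketch computes the log hazard ratio for a transplant (holding SURG and AGEACCPT fixed) from the Output 5.15 coefficients: the PLANT coefficient is the intercept and the five transplant-specific coefficients are the slopes. The function names are hypothetical, and any covariate values you plug in are up to you.

```python
import math

# Coefficients for PLANT and the five transplant-specific covariates,
# copied from Output 5.15.
COEF = {
    "plant": 0.788753,
    "m1td": -0.226452,
    "m2td": 0.164605,
    "m3td": 0.600854,
    "waittd": 0.004256,
    "dottd": -0.000298,
}


def transplant_log_hr(m1: float, m2: float, m3: float,
                      wait: float, dot: float) -> float:
    """Log hazard ratio, transplanted vs. not yet transplanted, other covariates fixed."""
    return (COEF["plant"]
            + COEF["m1td"] * m1
            + COEF["m2td"] * m2
            + COEF["m3td"] * m3
            + COEF["waittd"] * wait
            + COEF["dottd"] * dot)


def transplant_hr(m1: float, m2: float, m3: float,
                  wait: float, dot: float) -> float:
    """Hazard ratio for a transplant, given values measured at transplant."""
    return math.exp(transplant_log_hr(m1, m2, m3, wait, dot))


# All five variables at 0: the intercept alone.
print(round(transplant_hr(0, 0, 0, 0, 0), 3))
```

With all five variables at 0, the result reproduces the 2.201 in the Risk Ratio column for PLANT, which, as noted above, corresponds to no substantively possible patient; raising M3 by one unit multiplies that ratio by exp(0.600854), about 1.824.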
