Testing for Effects of Covariates

Besides tests of differences between groups, PROC LIFETEST can test whether quantitative covariates are associated with survival time. Given a list of covariates, PROC LIFETEST produces a test statistic for each one, ignoring the others. It also treats them as a set, testing the null hypothesis that they are jointly unrelated to survival time and testing for certain incremental effects of adding variables to the set. The statistics are generalizations of the log-rank and Wilcoxon tests discussed earlier in this chapter. They can also be interpreted as nonparametric tests of the coefficients of the accelerated failure time model discussed in Chapter 4, “Estimating Parametric Regression Models with PROC LIFEREG.”

You can test the same sorts of hypotheses with PROC LIFEREG or PROC PHREG. In fact, the log-rank chi-square reported by PROC LIFETEST is identical to the score statistic given by PROC PHREG for the null hypothesis that all coefficients are 0 (when the data contain no tied event times). However, in most cases, you are better off switching to the regression procedures, for two reasons. First, PROC LIFETEST doesn’t give coefficient estimates, so there is no way to quantify the effect of a covariate on survival time. Second, the incremental tests do not really test the effect of each variable controlling for all the others. Instead, you get a test of the effect of each variable controlling for those variables that have already been included. Because you have no control over the order of inclusion, these tests can be misleading. Nevertheless, PROC LIFETEST can be useful for screening a large number of covariates before proceeding to estimate regression models. Because the log-rank and Wilcoxon tests do not require iterative calculations, they require relatively little computer time. (This is also true for the SELECTION=SCORE option in PROC PHREG.)

Let’s look at the recidivism data as an example. The covariate tests are invoked by listing the variable names in a TEST statement:

proc lifetest data=recid;
   time week*arrest(0);
   test fin age race wexp mar paro prio;
run;

Output 3.13 shows selections from the output. I have omitted the Wilcoxon statistics because they are nearly identical to the log-rank statistics for this example. I also omitted the variance-covariance matrix for the statistics because it is primarily useful as input to other analyses.

Output 3.13. Covariate Tests for the Recidivism Data
         Univariate Chi-Squares for the LOG RANK Test

                Test                                    Pr >
 Variable    Statistic    Variance    Chi-Square     Chi-Square

 FIN           10.4256     28.4744       3.8172        0.0507
 AGE             233.2      4305.3      12.6318        0.0004
 RACE          -2.7093     12.8100       0.5730        0.4491
 WEXP          16.4141     27.3305       9.8580        0.0017
 MAR            7.1773     13.1535       3.9164        0.0478
 PARO           2.9471     26.7927       0.3242        0.5691
 PRIO           -108.8       812.3      14.5602        0.0001

  Forward Stepwise Sequence of Chi-Squares for the LOG RANK Test

                                Pr >      Chi-Square      Pr >
Variable    DF  Chi-Square   Chi-Square    Increment    Increment

PRIO         1    14.5602      0.0001       14.5602      0.0001
AGE          2    25.4905      0.0001       10.9303      0.0009
FIN          3    28.8871      0.0001        3.3966      0.0653
MAR          4    31.0920      0.0001        2.2050      0.1376
RACE         5    32.4214      0.0001        1.3294      0.2489
WEXP         6    33.2800      0.0001        0.8585      0.3541
PARO         7    33.3828      0.0001        0.1029      0.7484

The top panel shows that age at release (AGE), work experience (WEXP), and number of prior convictions (PRIO) have highly significant associations with time to arrest. The effects of marital status (MAR) and financial aid (FIN) are more marginal, while race (RACE) and parole status (PARO) are apparently unrelated to survival time. The signs of the log-rank test statistics tell you the direction of the relationship. The negative sign for PRIO indicates that inmates with more prior convictions tend to have shorter times to arrest. On the other hand, the positive sign for AGE indicates that older inmates have longer times to arrest. As already noted, none of these tests controls or adjusts for any of the other covariates.
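Each univariate chi-square in the top panel is the squared test statistic divided by its variance, referred to a chi-square distribution with 1 degree of freedom. As a rough check, the following DATA step recomputes the last two columns from the first two. The data set name univariate and the variable names are made up for this illustration, and because the printed statistics are rounded (especially for AGE and PRIO), the results should agree with Output 3.13 only approximately.

```sas
/* Recompute the univariate log-rank chi-squares from Output 3.13 */
/* (data set and variable names are hypothetical)                  */
data univariate;
   input variable $ statistic variance;
   chisq = statistic**2 / variance;     /* squared statistic over its variance */
   p = 1 - probchi(chisq, 1);           /* upper-tail p-value, 1 df            */
   format chisq 8.4 p 6.4;
   datalines;
FIN    10.4256   28.4744
AGE   233.2    4305.3
RACE   -2.7093   12.8100
WEXP   16.4141   27.3305
MAR     7.1773   13.1535
PARO    2.9471   26.7927
PRIO -108.8     812.3
;
run;

proc print data=univariate noobs;
run;
```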

The lower panel displays results from a forward inclusion procedure. PROC LIFETEST first finds the variable with the highest chi-square statistic in the top panel (in this case PRIO) and puts it in the set to be tested. Since PRIO is the only variable in the set, the results for PRIO are the same in both panels. Then PROC LIFETEST finds the variable that produces the largest increment in the joint chi-square for the set of two variables (in this case AGE). The joint chi-square of 25.49 in line 2 tests the null hypothesis that the coefficients of AGE and PRIO in an accelerated failure time model are both 0. The chi-square increment of 10.93 is merely the difference between the joint chi-squares in lines 1 and 2. It is a test of the null hypothesis that the coefficient for AGE is 0 when PRIO is controlled. On the other hand, there is no test for the effect of PRIO controlling for AGE.
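The increment arithmetic can be reproduced by differencing successive joint chi-squares and referring each difference to a chi-square distribution with 1 degree of freedom. The following DATA step is a minimal sketch of that calculation using the joint chi-squares printed in the lower panel of Output 3.13. The data set and variable names are invented for the illustration, and the results should match the Chi-Square Increment column only up to rounding.

```sas
/* Recompute the chi-square increments in the forward sequence */
/* (data set and variable names are hypothetical)              */
data increments;
   input variable $ df joint;
   previous = lag(joint);               /* joint chi-square on the line above */
   if _n_ = 1 then increment = joint;   /* first variable: nothing to subtract */
   else increment = joint - previous;
   p_increment = 1 - probchi(increment, 1);   /* each increment has 1 df */
   format increment 8.4 p_increment 6.4;
   datalines;
PRIO 1 14.5602
AGE  2 25.4905
FIN  3 28.8871
MAR  4 31.0920
RACE 5 32.4214
WEXP 6 33.2800
PARO 7 33.3828
;
run;

proc print data=increments noobs;
   var variable df joint increment p_increment;
run;
```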

This process is repeated until all the variables are added. For each variable, we get a test of the hypothesis that the variable has no effect on survival time controlling for all the variables above it (but none of the variables below it). For variables near the end of the sequence, the incremental chi-square values are likely to be similar to what you might find with PROC LIFEREG or PROC PHREG. For variables near the beginning of the sequence, however, the results can be quite different.
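To get a test of each covariate controlling for all the others, rather than only for those above it in the sequence, you need one of the regression procedures. The following is a minimal sketch with PROC PHREG, assuming the same recid data set and variable names used in the TEST statement above. The second step illustrates the SELECTION=SCORE option; the value BEST=3 is an arbitrary choice for illustration. Because the recidivism data contain tied event times, the score statistics from PROC PHREG may differ slightly from the log-rank chi-squares reported by PROC LIFETEST.

```sas
/* Cox regression with all seven covariates: each reported chi-square */
/* tests one variable while controlling for the other six             */
proc phreg data=recid;
   model week*arrest(0)=fin age race wexp mar paro prio;
run;

/* Score-based screening of covariate subsets, no iterative fitting; */
/* BEST=3 (three best subsets of each size) is an arbitrary choice    */
proc phreg data=recid;
   model week*arrest(0)=fin age race wexp mar paro prio
         / selection=score best=3;
run;
```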

For this example, the forward inclusion procedure leads to some substantially different conclusions from the univariate procedure. While WEXP has a highly significant effect on survival time when considered by itself, there is no evidence of such an effect when other variables are controlled. The reason is that work experience is moderately correlated with age and the number of prior convictions, both of which have substantial effects on survival time. Marital status also loses its statistical significance in the forward inclusion test.
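One quick way to examine the correlations mentioned here is PROC CORR. This sketch assumes the same recid data set; because WEXP is a dichotomous variable, its Pearson correlations with AGE and PRIO are only a rough summary of the association.

```sas
/* Pairwise correlations among work experience, age, and prior convictions */
proc corr data=recid;
   var wexp age prio;
run;
```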

What is the relationship between the STRATA statement and the TEST statement? For a dichotomous variable like FIN, the statement TEST FIN is a possible alternative to STRATA FIN. Both produce a test of the null hypothesis that the survivor functions are the same for the two categories of FIN. In fact, if there are no ties in the data (no cases with exactly the same event time), the two statements will produce identical chi-square statistics and p-values. In the presence of ties, however, STRATA and TEST use somewhat different formulas, which may result in slight differences in the p-values. (If you’re interested in the details, see Collett 1994, p. 284.) In the recidivism data, for example, the 114 arrests occurred at only 49 unique arrest times, so the number of ties was substantial. The STRATA statement produces a log-rank chi-square of 3.8376 for a p-value of .0501, and a Wilcoxon chi-square of 3.7495 for a p-value of .0528. The TEST statement produces a log-rank chi-square of 3.8172 for a p-value of .0507 and a Wilcoxon chi-square of 3.7485 for a p-value of .0529. Obviously the differences are minuscule in this case.
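The comparison just described can be run as two separate steps, one with STRATA and one with TEST. This sketch assumes the recid data set and the TIME specification used earlier in this section.

```sas
/* Two-group comparison: FIN as a grouping variable */
proc lifetest data=recid;
   time week*arrest(0);
   strata fin;
run;

/* Covariate test: FIN as a (dichotomous) quantitative covariate */
proc lifetest data=recid;
   time week*arrest(0);
   test fin;
run;
```

The first step also prints separate survivor function tables for the two categories of FIN, whereas the second prints a single table for the whole sample.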

Other considerations should govern the choice between STRATA and TEST. While STRATA produces separate tables and graphs of the survivor function for the two groups, TEST produces only the single table and graph for the entire sample. With TEST, you can test for the effects of many dichotomous variables with a single statement, but STRATA requires a new PROC LIFETEST step for each variable tested. Of course, if a variable has more than two values, STRATA treats each value as a separate group while TEST treats the variable as a quantitative measure.

What happens when you include both a STRATA statement and a TEST statement? Adding a TEST statement has no effect whatever on the results from the STRATA statement. This fact implies that the hypothesis test produced by the STRATA statement in no way controls for the variables listed in the TEST statement. On the other hand, the TEST statement can produce quite different results, depending on whether you also have a STRATA statement. When you have a STRATA statement, the log-rank and Wilcoxon statistics produced by the TEST statement are first calculated within strata and then averaged across strata. In other words, they are stratified statistics that control for whatever variable or variables are listed in the STRATA statement. Suppose, for example, that for the myelomatosis data we want to test the effect of the treatment while controlling for renal functioning. We can submit these statements:

proc lifetest data=renal;
   time dur*censor(0);
   strata renal;
   test treat;
run;

The resulting log-rank chi-square for TREAT was 5.791 with a p-value of .016. This result is in sharp contrast with the unstratified chi-square of only 1.3126 that we saw earlier in this chapter (Output 3.5). As we’ll see in Chapter 5, “Estimating Cox Regression Models with PROC PHREG” (Output 5.17), you can obtain identical results using PROC PHREG with stratification and the score test.
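A PROC PHREG step for the stratified comparison might look like the following sketch, which reuses the data set and variable names from the PROC LIFETEST step above. The statistic to compare is the score test that PROC PHREG reports for the null hypothesis that all coefficients are 0.

```sas
/* Stratified Cox model: effect of TREAT within levels of RENAL */
proc phreg data=renal;
   model dur*censor(0)=treat;
   strata renal;
run;
```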

Clearly, there is no point in listing a variable in both a STRATA and a TEST statement. If you do it anyway, the TEST statement will not give meaningful results for that variable.
