Estimating Survivor Functions

As we have seen, the form of the dependence of the hazard on time is left unspecified in the proportional hazards model. Furthermore, the partial likelihood method discards that portion of the likelihood function that contains information about the dependence of the hazard on time. Nevertheless, it is still possible to get nonparametric estimates of the survivor function based on a fitted proportional hazards model.

When there are no time-dependent covariates, the Cox model can be written as

S(t) = [S_0(t)]^{\exp(\beta x)}

where S(t) is the survival probability at time t for an individual with covariate values x, and S0(t) is the baseline survivor function, that is, the survivor function for an individual whose covariate values are all 0. After estimating β by partial likelihood, you can get an estimate of S0(t) by a nonparametric maximum likelihood method. With that estimate in hand, you can generate the estimated survivor function for any set of covariate values by substitution in the equation above.
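The power form of the survivor function follows directly from the proportional hazards specification. A brief sketch, assuming the model is written in its usual form with baseline hazard h_0(t) and cumulative baseline hazard H_0(t) (symbols introduced here for illustration):

```latex
\begin{align*}
h(t) &= h_0(t)\,\exp(\beta x) \\
H(t) &= \int_0^t h(u)\,du = \exp(\beta x)\,H_0(t) \\
S(t) &= \exp\{-H(t)\}
      = \bigl[\exp\{-H_0(t)\}\bigr]^{\exp(\beta x)}
      = [S_0(t)]^{\exp(\beta x)}
\end{align*}
```

Setting x = 0 gives exp(βx) = 1, so S(t) reduces to S0(t), which is why S0(t) is described as the survivor function for an individual whose covariate values are all 0.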

In PROC PHREG, you accomplish this with the BASELINE statement. The easiest task is to get the survivor function for x = x̄, the vector of sample means. For the recidivism data set, the SAS code for accomplishing this task is as follows:

proc phreg data=recid;
   model week*arrest(0)=fin age prio
         / ties=efron;
   baseline out=a survival=s logsurv=ls loglogs=lls;
run;

proc print data=a;
run;

(Only statistically significant covariates are included in this run so that the width of an output record does not exceed the printed page.) The OUT=A option requests that SAS put the survival estimates in a temporary data set named A. SURVIVAL=S asks that the survival probabilities be stored in a variable named S. You can interpret these probabilities as estimates of the survivor function, controlling for the effects of the covariates. LOGSURV=LS requests the logarithm of the survival probabilities; its negative, –log S(t), is the cumulative hazard function. LOGLOGS=LLS requests log[–log S(t)]. These logarithmic transformations of the survivor function were discussed in more detail in Chapters 2, 3, and 4.

Data set A is printed in Output 5.19. There are 50 records: an initial record for time 0 plus one for each of the 49 unique weeks in which arrests were observed to occur. Each record gives the mean value for each of the three covariates. The S column gives the estimated survival probabilities. The last two columns are the two logarithmic transforms.

Output 5.19. Survivor Function Estimates for Recidivism Data at Sample Means
OBS    FIN       AGE       PRIO       WEEK       S          LS        LLS

 1    0.5     24.5972   2.98380        0     1.00000     0.00000    .
 2    0.5     24.5972   2.98380        1     0.99801    -0.00200  -6.21670
 3    0.5     24.5972   2.98380        2     0.99601    -0.00399  -5.52280
 4    0.5     24.5972   2.98380        3     0.99402    -0.00600  -5.11671
 5    0.5     24.5972   2.98380        4     0.99203    -0.00800  -4.82813
 6    0.5     24.5972   2.98380        5     0.99004    -0.01001  -4.60378
 7    0.5     24.5972   2.98380        6     0.98804    -0.01203  -4.41998
 8    0.5     24.5972   2.98380        7     0.98604    -0.01406  -4.26428
 9    0.5     24.5972   2.98380        8     0.97601    -0.02428  -3.71816
10    0.5     24.5972   2.98380        9     0.97200    -0.02840  -3.56145
11    0.5     24.5972   2.98380       10     0.96999    -0.03047  -3.49104
12    0.5     24.5972   2.98380       11     0.96593    -0.03467  -3.36193
13    0.5     24.5972   2.98380       12     0.96184    -0.03891  -3.24646
14    0.5     24.5972   2.98380       13     0.95979    -0.04104  -3.19320
15    0.5     24.5972   2.98380       14     0.95363    -0.04748  -3.04744
16    0.5     24.5972   2.98380       15     0.94951    -0.05181  -2.96010
17    0.5     24.5972   2.98380       16     0.94538    -0.05616  -2.87948
18    0.5     24.5972   2.98380       17     0.93918    -0.06275  -2.76859
19    0.5     24.5972   2.98380       18     0.93293    -0.06942  -2.66754
20    0.5     24.5972   2.98380       19     0.92876    -0.07391  -2.60492
21    0.5     24.5972   2.98380       20     0.91831    -0.08522  -2.46257
22    0.5     24.5972   2.98380       21     0.91414    -0.08977  -2.41050
23    0.5     24.5972   2.98380       22     0.91205    -0.09206  -2.38534
24    0.5     24.5972   2.98380       23     0.90996    -0.09435  -2.36070
25    0.5     24.5972   2.98380       24     0.90160    -0.10358  -2.26738
26    0.5     24.5972   2.98380       25     0.89528    -0.11061  -2.20170
27    0.5     24.5972   2.98380       26     0.88891    -0.11775  -2.13915
28    0.5     24.5972   2.98380       27     0.88467    -0.12254  -2.09928
29    0.5     24.5972   2.98380       28     0.88041    -0.12737  -2.06068
30    0.5     24.5972   2.98380       30     0.87614    -0.13223  -2.02324
31    0.5     24.5972   2.98380       31     0.87400    -0.13467  -2.00493
32    0.5     24.5972   2.98380       32     0.86972    -0.13958  -1.96909
33    0.5     24.5972   2.98380       33     0.86542    -0.14455  -1.93416
34    0.5     24.5972   2.98380       34     0.86109    -0.14956  -1.90008
35    0.5     24.5972   2.98380       35     0.85242    -0.15968  -1.83458
36    0.5     24.5972   2.98380       36     0.84590    -0.16736  -1.78763
37    0.5     24.5972   2.98380       37     0.83719    -0.17771  -1.72761
38    0.5     24.5972   2.98380       38     0.83500    -0.18032  -1.71303
39    0.5     24.5972   2.98380       39     0.83063    -0.18557  -1.68434
40    0.5     24.5972   2.98380       40     0.82186    -0.19618  -1.62870
41    0.5     24.5972   2.98380       42     0.81746    -0.20155  -1.60173
42    0.5     24.5972   2.98380       43     0.80863    -0.21241  -1.54922
43    0.5     24.5972   2.98380       44     0.80418    -0.21793  -1.52360
44    0.5     24.5972   2.98380       45     0.79972    -0.22349  -1.49840
45    0.5     24.5972   2.98380       46     0.79078    -0.23473  -1.44932
46    0.5     24.5972   2.98380       47     0.78855    -0.23756  -1.43733
47    0.5     24.5972   2.98380       48     0.78406    -0.24326  -1.41361
48    0.5     24.5972   2.98380       49     0.77284    -0.25769  -1.35601
49    0.5     24.5972   2.98380       50     0.76607    -0.26648  -1.32247
50    0.5     24.5972   2.98380       52     0.75703    -0.27836  -1.27885
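
The relationships among the three estimated columns can be checked directly from data set A. The following DATA step is a minimal sketch; the data set name CHECK and the variables LS_CHECK, LLS_CHECK, and CUMHAZ are introduced here for illustration only.

```sas
data check;
   set a;
   ls_check = log(s);                       /* should agree with LS */
   cumhaz   = -ls_check;                    /* cumulative hazard, -log S(t) */
   /* log[-log S(t)] is undefined when S(t)=1, as in the time 0 record */
   if s < 1 then lls_check = log(cumhaz);   /* should agree with LLS */
run;

proc print data=check;
   var week s ls ls_check lls lls_check cumhaz;
run;
```

Small discrepancies in the last printed digit are to be expected, because the printed S values are rounded.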

Of what use are these estimated functions? If you have hypotheses about the shape of the hazard function, the estimates can provide some helpful evidence, as we saw in Chapter 3. In particular, a constant hazard function implies a cumulative hazard function (the negative of the log-survivor function) that increases as a straight line. If a graph of the cumulative hazard function curves upward, it is evidence for an increasing hazard. On the other hand, if it bends below a straight line, it suggests that the hazard is decreasing with time. For this purpose, it really doesn’t matter at what covariate values the survivor function is calculated.
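
The link between curvature and the direction of the hazard can be made explicit. A short sketch, writing H(t) for the cumulative hazard and λ for a hypothetical constant hazard rate (symbols introduced here for illustration):

```latex
\begin{align*}
H(t) &= -\log S(t) = \int_0^t h(u)\,du,
  \qquad \frac{dH(t)}{dt} = h(t) \\
h(t) = \lambda \;&\Rightarrow\; H(t) = \lambda t
  \quad \text{(a straight line through the origin)} \\
\frac{d^2 H(t)}{dt^2} = h'(t) > 0 \;&\Rightarrow\;
  H(t) \text{ curves upward (increasing hazard)} \\
\frac{d^2 H(t)}{dt^2} = h'(t) < 0 \;&\Rightarrow\;
  H(t) \text{ bends downward (decreasing hazard)}
\end{align*}
```

Because the slope of H(t) is the hazard itself, a plot of the negative log-survivor function against time shows the direction of change in the hazard without requiring a direct estimate of h(t).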

A graph of the negative log-survivor (cumulative hazard) function, shown in Output 5.20, is produced by the following statements:

data b;
   set a;
   ls=-ls;
run;

proc gplot data=b;
   symbol1 value=none interpol=join;
   plot ls*week;
run;

The curve appears to bend slightly upward, suggesting a hazard that increases with time. That’s consistent with what we found in Chapters 3 and 4.

Output 5.20. Graph of the Cumulative Hazard Function for Recidivism Data


By combining the BASELINE statement with stratification, we can also produce graphs that are helpful in evaluating the proportional hazards assumption. Suppose we take financial aid (FIN) as the stratifying variable for the recidivism data. That might seem self-defeating since FIN is the variable of greatest interest and stratifying on it means that no tests or estimates of its effect are produced. But after stratifying, we can graph the baseline survivor function for the two financial aid groups using the following code:

proc phreg data=recid;
   model week*arrest(0)=age prio / ties=efron;
   strata fin;
   baseline out=a loglogs=lls survival=s;
run;

proc gplot data=a;
   plot lls*week=fin;
   symbol1 interpol=join color=black line=1;
   symbol2 interpol=join color=black line=2;
run;

The resulting graph in Output 5.21 shows the log-log survivor functions for each of the two financial aid groups, evaluated at the means of the covariates. If the hazards are proportional, the log-log survivor functions should be parallel. Here’s why. If two hazard functions, h1(t) and h2(t), are proportional, we can write

h1(t) = γh2(t)

where γ is the constant of proportionality. Substituting this into equation (2.6), it’s easily shown that

S_1(t) = [S_2(t)]^{\gamma}.

Taking the logarithm, multiplying by –1, and taking the logarithm a second time yields

log[–log S1(t)] = log γ + log[–log S2(t)]

which says that the two log-log survival curves differ by a constant amount, log γ. Examining Output 5.21, we see that the two curves are approximately the same shape, but also farther apart in some regions than others.
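
The step from proportional hazards to the power relationship can be spelled out. This sketch assumes that equation (2.6) is the standard relation between the survivor function and the hazard, S(t) = exp{–∫h(u)du}; if the equation is stated differently, the algebra below would need to be adapted.

```latex
\begin{align*}
S_1(t) &= \exp\Bigl\{-\int_0^t h_1(u)\,du\Bigr\}
        = \exp\Bigl\{-\gamma\int_0^t h_2(u)\,du\Bigr\} \\
       &= \Bigl[\exp\Bigl\{-\int_0^t h_2(u)\,du\Bigr\}\Bigr]^{\gamma}
        = [S_2(t)]^{\gamma} \\
-\log S_1(t) &= \gamma\,\bigl[-\log S_2(t)\bigr] \\
\log[-\log S_1(t)] &= \log\gamma + \log[-\log S_2(t)]
\end{align*}
```

The vertical gap between the two log-log curves is therefore log γ at every t, so any systematic widening or narrowing of the gap is evidence against proportionality.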

Output 5.21. Log-Log Survivor Plots for the Two Financial Aid Groups


The differences are more dramatic if we compare the smoothed hazard functions produced with the SMOOTH macro that was introduced in Chapter 3 and that is described in detail in Appendix 1, “Macro Programs.” After producing baseline data set A, which contains the survivor function estimates, the macro is invoked by submitting the following statement, which gives the graph in Output 5.22.

%smooth(data=a,time=week,survival=s)

Here we see evidence that the hazards of arrest for the two groups are almost identical during the earlier weeks but diverge rapidly after week 15 or thereabouts, reaching a maximum difference around week 25. (Group 2 received aid; Group 1 did not.) This evidence suggests that it takes a while for the financial aid to have its desired effect, but that the effect eventually wears off after the aid is terminated.

Output 5.22. Smoothed Hazard Functions for Two Financial Aid Groups


This graph suggests a different specification for the interaction between financial aid and time that we investigated earlier (see Interactions with Time as Time-Dependent Covariates). Specifically, let’s construct a dummy variable that is coded as 1 when time is between 20 and 30 weeks and is coded as 0 elsewhere. Then, we include the product of that variable and FIN in the Cox regression model:

proc phreg data=recid;
   model week*arrest(0)=fin finmid age prio / ties=efron;
   mid=(20<week<30);
   finmid=fin*mid;
run;

Results in Output 5.23 show that the interaction is significant at the .03 level. During this middle period, the arrest rate for those who did not receive financial aid is about five times the rate for those who did receive aid. The p-value is probably too small, however, since the unusual specification was suggested by the graphical analysis rather than by some a priori hypothesis.

Output 5.23. Cox Regression with the Nonproportional Effect of Financial Aid
Analysis of Maximum Likelihood Estimates

                  Parameter    Standard      Wald        Pr >         Risk
 Variable  DF      Estimate      Error    Chi-Square  Chi-Square     Ratio

 FIN        1     -0.158078     0.20504      0.59435      0.4407     0.854
 FINMID     1     -1.455933     0.66475      4.79696      0.0285     0.233
 AGE        1     -0.066964     0.02084     10.32821      0.0013     0.935
 PRIO       1      0.096731     0.02727     12.58657      0.0004     1.102
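
The fivefold difference during the middle period follows from the coefficients in Output 5.23. A worked calculation, using subscripted β symbols (labels introduced here) for the FIN and FINMID estimates:

```latex
\begin{align*}
\hat\beta_{\mathrm{FIN}} + \hat\beta_{\mathrm{FINMID}}
  &= -0.158078 + (-1.455933) = -1.614011 \\
\exp(-1.614011) &\approx 0.199
  \quad \text{(aid vs. no aid, when } 20 < \text{week} < 30\text{)} \\
1 / 0.199 &\approx 5.0
  \quad \text{(no aid vs. aid, same period)} \\
\exp(-0.158078) &\approx 0.854
  \quad \text{(aid vs. no aid, all other weeks)}
\end{align*}
```

Outside the middle period, the estimated effect of financial aid is a reduction of about 15 percent in the hazard, and that estimate is not statistically significant (p = .44).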

Another major use of the baseline survivor function is to obtain predictions about survival time for particular sets of covariate values. These covariate values need not be ones that appear in the data set being analyzed. For the recidivism data, for example, we may want to say something about arrest times for 40-year-olds with three prior convictions who did not receive financial aid. The mechanics of doing this are a bit awkward. You must create a new data set containing the values of the covariates for which you want predictions and then pass the name of that data set to PROC PHREG:

data covals;
   input fin age prio;   /* one input line per covariate profile */
   cards;
0 40 3
;
run;
proc phreg data=recid;
   model week*arrest(0)=fin age prio /
         ties=efron;
   baseline out=a covariates=covals survival=s lower=lcl
         upper=ucl / nomean;
run;

proc print data=a;
run;

The advantage of doing it this way is that predictions can easily be generated for many different sets of covariate values just by including more input lines in the data set COVALS. Each input line produces a complete set of survivor estimates, but all estimates are output to a single data set. The NOMEAN option suppresses the output of survivor estimates evaluated at the mean values of the covariates, which are otherwise included by default. The LOWER= and UPPER= options (available in Release 6.10 and later) give 95-percent confidence intervals around the survival probability.
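
For instance, the following sketch requests predictions for three covariate profiles at once. The second and third profiles are hypothetical values chosen only for illustration; the PROC PHREG step is otherwise the same as the one above.

```sas
data covals;
   input fin age prio;
   cards;
0 40 3
1 40 3
0 20 3
;
run;

proc phreg data=recid;
   model week*arrest(0)=fin age prio / ties=efron;
   /* one complete set of survivor estimates per line of COVALS */
   baseline out=a covariates=covals survival=s lower=lcl
         upper=ucl / nomean;
run;
```

Data set A should then contain one block of records per input line, with the values of FIN, AGE, and PRIO identifying the profile to which each record belongs.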

Output 5.24 displays a portion of the data set generated by the BASELINE statement above. In generating predictions, it’s typical to focus on a single summary measure rather than the entire distribution. The median survival time is easily obtained by finding the smallest value of t such that S(t) ≤ .50. That won’t work for the recidivism data, however, because the data are censored long before a .50 probability is reached. For these data, it’s probably more useful to pick a fixed point in time and calculate survival probabilities at that time under varying conditions. For the covariate values in Output 5.24, the six-month (26-week) survival probability is .95, with a 95-percent confidence interval of .92 to .99.
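
To pull a fixed-time prediction out of the BASELINE output, a WHERE statement is usually enough. A minimal sketch, assuming data set A was created by the BASELINE statement above and that at least one arrest occurred in the chosen week (weeks with no arrests, such as week 29, have no record in the output data set):

```sas
proc print data=a;
   where week = 26;                 /* six-month survival */
   var fin age prio week s lcl ucl;
run;
```

If COVALS contains several lines, this prints one record per covariate profile, which makes it easy to compare survival probabilities at the same point in time.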

Output 5.24. Portion of Survivor Function Estimate for Recidivism Data
OBS    FIN    AGE    PRIO    WEEK       S          LCL         UCL

 19     0      40      3      18     0.97101     0.94823     0.99434
 20     0      40      3      19     0.96916     0.94510     0.99384
 21     0      40      3      20     0.96453     0.93728     0.99258
 22     0      40      3      21     0.96267     0.93415     0.99207
 23     0      40      3      22     0.96174     0.93257     0.99182
 24     0      40      3      23     0.96080     0.93100     0.99156
 25     0      40      3      24     0.95705     0.92471     0.99053
 26     0      40      3      25     0.95421     0.91996     0.98973
 27     0      40      3      26     0.95132     0.91514     0.98894
 28     0      40      3      27     0.94939     0.91192     0.98841
 29     0      40      3      28     0.94746     0.90869     0.98788
 30     0      40      3      30     0.94551     0.90544     0.98734
 31     0      40      3      31     0.94453     0.90382     0.98707
 32     0      40      3      32     0.94256     0.90056     0.98653
 33     0      40      3      33     0.94058     0.89728     0.98598
 34     0      40      3      34     0.93859     0.89399     0.98542
 35     0      40      3      35     0.93457     0.88737     0.98428
 36     0      40      3      36     0.93154     0.88239     0.98342
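
As a consistency check on the substitution formula, the log survival probabilities in Output 5.24 should be a constant multiple of those in Output 5.19, because both survivor functions are powers of the same baseline survivor function. The sketch below copies three weeks from each output by hand; the data set name RATIO and its variable names are introduced here for illustration.

```sas
data ratio;
   input week s_mean s_x;
   /* ratio of cumulative hazards: covariate profile (0, 40, 3)  */
   /* relative to the sample means; constant if S_x = S_mean**c  */
   ratio = log(s_x) / log(s_mean);
   cards;
18 0.93293 0.97101
26 0.88891 0.95132
36 0.84590 0.93154
;
run;

proc print data=ratio;
run;
```

By hand calculation the ratio is about 0.42 at each of the three weeks. That constant is the estimated hazard ratio for a 40-year-old with three prior convictions and no financial aid, relative to an individual at the sample means.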

Releases 6.10 and later let you choose between two alternative methods (labeled PL for product limit and CH for cumulative hazard) for calculating the survivor function and its transformations, but there are no strong reasons for preferring one or the other. PL is the default. These two methods produce identical results (apart from rounding error) when there is only one censoring time for all cases, as with the recidivism data. Note, finally, that the BASELINE statement will not produce any output when there are time-dependent covariates.
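
A sketch of how the alternative method might be requested, assuming the BASELINE option keyword is METHOD= (the text above names the two methods but does not show the syntax, so check the PROC PHREG documentation for your release). The output data set name A_CH is hypothetical.

```sas
proc phreg data=recid;
   model week*arrest(0)=fin age prio / ties=efron;
   /* CH = cumulative hazard method; omit the option for the default PL */
   baseline out=a_ch survival=s / method=ch;
run;
```

Comparing the S column of A_CH with the earlier estimates is a simple way to see how much the choice of method matters for a given data set.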
