In earlier releases of PROC PHREG, the OUTPUT statement could produce a data set containing eight different diagnostic statistics for each individual. An additional six statistics are available in Release 6.10. While there isn’t space to discuss all of them, I’ll concentrate on those that I think are most useful and interpretable. As with the BASELINE statement, OUTPUT will not produce a data set if there are any time-dependent covariates.
All releases of PROC PHREG have options for three different residual statistics that are computed for each individual in the sample: Cox-Snell residuals (LOGSURV), martingale residuals (RESMART), and deviance residuals (RESDEV). (For a detailed discussion of these residuals, see Collett (1994, p. 150)). Martingale residuals are obtained by transforming Cox-Snell residuals, and deviance residuals are a further transformation of martingale residuals. For most purposes, you can ignore the Cox-Snell and martingale residuals. While Cox-Snell residuals were useful for assessing the fit of the parametric models in Chapter 4, they are not very informative for Cox models estimated by partial likelihood.
Deviance residuals behave much like residuals from OLS regression: they are symmetrically distributed around 0 and have an approximate standard deviation of 1.0. They are negative for observations with longer survival times than the model predicts and positive for observations with shorter survival times than predicted. They can also be used like OLS residuals: very high or very low values suggest that an observation may be an outlier in need of special attention, and plotting the residuals against the covariates may reveal features of the data that the model has not adequately fitted. Be aware, however, that censoring can produce striking patterns that don't necessarily imply any problem with the model.
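To make the relationship among the three residuals concrete, here is a minimal Python sketch (not SAS, and not PHREG's internal code; the function name is mine) of the standard chain of transformations: the martingale residual is the event indicator minus the Cox-Snell residual, and the deviance residual is a symmetrizing transformation of the martingale residual.

```python
import math

def deviance_residual(event, cox_snell):
    """Transform a Cox-Snell residual into a deviance residual.

    event: 1 if the observation is an event, 0 if censored.
    cox_snell: the Cox-Snell residual (the estimated cumulative
    hazard at the observed time), assumed positive for events.
    """
    m = event - cox_snell                 # martingale residual
    if event == 1:
        inner = m + math.log(cox_snell)   # delta*log(delta - m) = log(cox_snell)
    else:
        inner = m                         # the delta*log(...) term vanishes
    # sign(m) * sqrt(-2 * [m + delta*log(delta - m)])
    return math.copysign(math.sqrt(-2.0 * inner), m)
```

An observation that dies "on schedule" (Cox-Snell residual of 1) gets a deviance residual of 0; censored observations always get nonpositive values, which is why they form the lower cluster in plots like Output 5.25.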
Here’s an example in which deviance residuals for the recidivism data are plotted against AGE:
proc phreg data=recid;
   model week*arrest(0)=fin age prio / ties=efron;
   output out=c resdev=dev;
run;
proc gplot data=c;
   symbol1 value=dot h=.2;
   plot dev*age;
run;
These statements produce the graph in Output 5.25.
Clearly, there is a disjunction between two groups of observations. The elongated cluster of points in the lower portion of the graph consists entirely of the censored observations, while the more widely dispersed points in the upper portion are the uncensored observations. Note also that the residuals for the censored observations rise with increasing age. That’s because age is associated with longer times to arrest, yet all censored observations are censored at the same point in time. With increasing age, survival to one year becomes more consistent with the model’s predictions. Some of the residuals exceed 3, which is large enough to warrant concern.
Release 6.10 and later can also produce Schoenfeld residuals (RESSCH), weighted Schoenfeld residuals (WTRESSCH), and score residuals (RESSCO). All three share a rather unusual property: instead of a single residual for each individual, there is a separate residual for each covariate for each individual. They also sum to 0 (approximately) in the sample. A major difference among them is that the score residuals are defined for all observations, while the Schoenfeld residuals (both weighted and unweighted) are not defined for censored observations (they are missing in the output data set).
My experience in examining graphs of these three residuals is that they all behave much the same. Since the Schoenfeld residuals are better known and easier to explain, I’ll concentrate on them. Here’s how they work. Suppose that individual i dies at time ti, and at that time there were 30 people at risk of death, indexed by j = 1,...,30. For each of those 30 people, the estimated Cox model implies a certain probability of dying at that time, denoted by pj. Imagine randomly selecting one of these 30 people, with probability pj. For each covariate xk, we can calculate its expected value for a randomly selected person (from that risk set) as

   E(xk) = p1 x1k + p2 x2k + ... + p30 x30k.
The Schoenfeld residual is then defined as the covariate value for the person who actually died, xik, minus the expected value.
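A small Python sketch may make the definition concrete. This is not PHREG's code; the function name and the toy numbers are mine, and it handles a single covariate at a single death time. Under the Cox model, person j's probability of being the one who dies is proportional to exp(beta * xj) over the risk set.

```python
import math

def schoenfeld_residual(x_dead, risk_set_x, beta):
    """Schoenfeld residual for one covariate at one death time.

    x_dead: covariate value of the person who actually died.
    risk_set_x: covariate values of everyone at risk at that time.
    beta: estimated Cox coefficient for this covariate.
    """
    weights = [math.exp(beta * x) for x in risk_set_x]
    total = sum(weights)
    # pj: model-implied probability that person j is the one who dies,
    # and E(xk) = sum of pj * xjk over the risk set
    expected = sum(w / total * x for w, x in zip(weights, risk_set_x))
    return x_dead - expected
```

When beta = 0, every member of the risk set is equally likely to die, so the expected value reduces to the risk-set mean of the covariate.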
The main function of these residuals is to detect possible departures from the proportional hazards assumption. Since Schoenfeld residuals are, in principle, independent of time, a plot that shows a relationship with time is evidence against that assumption. For the recidivism data, I produced a data set containing the Schoenfeld residuals using these statements:
proc phreg data=recid;
   model week*arrest(0)=fin age prio / ties=efron;
   output out=b ressch=schfin schage schprio;
run;
proc print data=b;
run;
Note that the key word RESSCH is followed by three arbitrarily chosen variable names corresponding to the three covariates in the model. These variables contain the Schoenfeld residuals in the output data set. Output 5.26 displays the 18 observations with the lowest arrest times in the data set.
WEEK  ARREST  FIN    SCHFIN  AGE   SCHAGE  PRIO  SCHPRIO
  12       1    1   0.58848   27   4.3964     0  -3.9624
  11       1    0  -0.40653   19  -3.5396    18  13.8255
  11       1    1   0.59347   19  -3.5396     2  -2.1745
  10       1    0  -0.40284   21  -1.5256    14   9.7362
   9       1    1   0.59574   26   3.4613     0  -4.2572
   9       1    1   0.59574   30   7.4613     3  -1.2572
   8       1    1   0.59318   21  -1.5440     4  -0.2909
   8       1    0  -0.40682   28   5.4560     4  -0.2909
   8       1    1   0.59318   20  -2.5440    11   6.7091
   8       1    0  -0.40682   23   0.4560     5   0.7091
   8       1    1   0.59318   40  17.4560     1  -3.2909
   7       1    1   0.59193   20  -2.5387     2  -2.2861
   6       1    0  -0.40616   19  -3.5221     6   1.7059
   5       1    0  -0.40475   19  -3.5098     3  -1.2896
   4       1    0  -0.40351   18  -4.4960     1  -3.2795
   3       1    0  -0.40284   30   7.4915     3  -1.2774
   2       1    0  -0.40260   44  21.4789     2  -2.2760
   1       1    0  -0.40163   20  -2.5150     0  -4.2657
Consider the person who was arrested in week 12. He was 27 years old, which, according to the Schoenfeld residual, is about 4.4 years older than the model predicts. He had no prior convictions, but the model predicts that a person arrested at that time should have about 4 prior convictions (SCHPRIO=–3.96). He also received financial aid, although the model predicts a probability of only .41 that a person arrested in week 12 would be receiving aid (SCHFIN=.59). These are not especially large numbers compared with others in Output 5.26. The person arrested in week 2, for example, was 21 years older than predicted.
The next step is to plot the residuals for each covariate against time:
proc gplot data=b;
   plot schfin*week schprio*week schage*week;
   symbol1 value=dot h=.02;
run;
Output 5.27 shows the graphs produced by this code.
The graph for the FIN residuals is not very informative, which is typical of graphs for dichotomous covariates. For PRIO, the residuals have a fairly random scatter. For AGE, there appears to be a slight tendency for the residuals to decline with time since release. To get a less subjective assessment (but still ad hoc), I did an OLS regression of the residuals for each variable on WEEK. The p-values for FIN and PRIO were .93 and .53, respectively, while the p-value for AGE was .02, suggesting that there may be some departure from proportionality for that variable.
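The ad hoc check just described can be run in any OLS package. As a sketch of what it computes, here is the slope and its t-statistic for a simple regression of residuals on time, written in pure Python (the function name is mine); the p-value would then be obtained from a t distribution with n – 2 degrees of freedom.

```python
def ols_slope_t(x, y):
    """Slope and t-statistic for a simple OLS regression of y on x,
    e.g. a column of Schoenfeld residuals regressed on WEEK."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # residual sum of squares around the fitted line
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = (sse / (n - 2) / sxx) ** 0.5   # standard error of the slope
    return slope, slope / se
```

A slope near 0 (small t-statistic) is consistent with proportional hazards for that covariate; a clearly nonzero slope, as found here for AGE, suggests a departure.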
Most OLS regression packages nowadays can compute various influence statistics that measure how much the results would change if a particular observation were removed from the analysis. Such statistics can also be computed for Cox regression models (Collett 1994, p. 169), and several are now available with PROC PHREG in Release 6.10 and later. The likelihood displacement (LD) statistic measures influence on the model as a whole: it tells you (approximately) how much the log-likelihood (multiplied by 2) will change if the individual is removed from the sample. The DFBETA statistics tell you how much each coefficient will change when a single observation is removed; these are also approximations. (There is also an LMAX statistic that measures overall influence, but its interpretation is rather esoteric. LMAX is preferable to LD for plots against covariates, however.)
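The idea behind DFBETA is easiest to see in a model where exact deletion is cheap. The Python sketch below (function names and data are mine, for illustration only) computes the exact leave-one-out change in an OLS slope, defined as the full-sample coefficient minus the coefficient with the observation deleted, so that a negative value means the coefficient increases on removal. PHREG's DFBETA statistics approximate this kind of quantity for the Cox model rather than refitting.

```python
def slope(x, y):
    """OLS slope of y on x (single predictor, with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx

def exact_dfbeta(x, y):
    """For each observation, the change in the slope when that
    observation is deleted and the model is refit from scratch."""
    b_full = slope(x, y)
    out = []
    for i in range(len(x)):
        b_drop = slope(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        out.append(b_full - b_drop)
    return out
```

Scanning the resulting values for ones that are large relative to the coefficient itself is exactly how observation 12 stands out in the myelomatosis example below.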
For the 25 cases in the myelomatosis data set that was analyzed in Chapter 3, I used the following PROC PHREG statements to fit a model and produce the influence statistics:
proc phreg data=myel;
   model dur*status(0)=treat renal;
   id id;
   output out=c ld=ldmyel dfbeta=dtreat drenal;
run;
The estimated coefficients were 4.11 for RENAL and 1.24 for TREAT. The OUTPUT statement creates a new data set C containing the variables LDMYEL, DTREAT, and DRENAL, as well as all the variables in the MODEL statement. The ID statement is necessary to put the ID variable in the new data set. Data set C is displayed in Output 5.28.
ID   DUR  STATUS  RENAL  TREAT    DRENAL    DTREAT   LDMYEL
 1     8       1      1      1  -0.08058  -0.21768  0.13393
 2   180       1      0      2   0.11234   0.04908  0.01232
 3   632       1      0      2   0.08879   0.00971  0.00587
 4   852       0      0      1   0.07526   0.08582  0.02097
 5    52       1      1      1   0.18409   0.08899  0.03600
 6  2240       0      0      2   0.01587  -0.11223  0.04067
 7   220       1      0      1  -0.09357  -0.19650  0.10797
 8    63       1      1      1   0.04170   0.09916  0.02762
 9   195       1      0      2   0.11180   0.04819  0.01207
10    76       1      0      2   0.11273   0.04974  0.01250
11    70       1      0      2   0.11303   0.05023  0.01263
12     8       1      0      1  -1.44585  -0.46469  1.71233
13    13       1      1      2   0.02970   0.01809  0.00120
14  1990       0      0      2   0.01587  -0.11223  0.04067
15  1976       0      0      1   0.08603   0.10383  0.03048
16    18       1      1      2   0.01759   0.00851  0.00033
17   700       1      0      2   0.08717   0.00699  0.00575
18  1296       0      0      1   0.08603   0.10383  0.03048
19  1460       0      0      1   0.08603   0.10383  0.03048
20   210       1      0      2   0.11106   0.04694  0.01174
21    63       1      1      1   0.04170   0.09916  0.02762
22  1328       0      0      1   0.08603   0.10383  0.03048
23  1296       1      0      2   0.07554  -0.01244  0.00604
24   365       0      0      1   0.05943   0.05934  0.01027
25    23       1      1      2  -0.01780  -0.01950  0.00109
The signs of the DFBETA statistics are the reverse of what you might expect—a negative sign means that the coefficient increases when the observation is removed. Most of the observations have rather small values for the influence statistics, but observation 12 has exceptionally large values for all three. In particular, the value –1.45 for DRENAL indicates that if observation 12 is removed, the RENAL coefficient will increase to approximately 4.11+1.45 = 5.56, an increase of 35 percent. Why is this observation so influential? It turns out that if you actually reestimate the model without this observation, the algorithm does not converge. The only indication of this, unfortunately, is that even though the coefficient of RENAL is large (over 19), the estimated standard error is gigantic (1471). Why no convergence? If you look closely at Output 5.28, you’ll see that all the durations when RENAL=1 (renal impairment) are smaller than all the durations when RENAL=0 (no impairment) with one exception. You guessed it, observation 12 had no impairment, but it did have an early death. In general, convergence does not occur when there is no overlap in the duration times for the two values of a dichotomous covariate. The coefficient for the covariate gets larger in magnitude with every iteration of the Newton-Raphson algorithm. In this example, observation 12 is the crucial observation that prevents this from happening.
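The non-convergence mechanism is easy to demonstrate directly. The Python sketch below (synthetic data and function name are mine) evaluates the Cox partial log-likelihood for a single dichotomous covariate whose two groups have completely non-overlapping durations: the log-likelihood keeps rising as beta grows and never attains a maximum, which is exactly why Newton-Raphson fails to converge.

```python
import math

def cox_partial_loglik(data, beta):
    """Cox partial log-likelihood for one covariate.

    data: list of (time, event, x) tuples; event times here are
    distinct, so no tie-handling method is needed.
    """
    ll = 0.0
    for t, d, x in data:
        if d == 1:
            # everyone still at risk at this death time
            risk = [xj for tj, _dj, xj in data if tj >= t]
            ll += beta * x - math.log(sum(math.exp(beta * xj) for xj in risk))
    return ll

# All x=1 subjects die before any x=0 subject: perfect separation
# of the duration times, as in Output 5.28 without observation 12.
data = [(1, 1, 1), (2, 1, 1), (3, 1, 1), (4, 1, 0), (5, 1, 0), (6, 1, 0)]
```

Evaluating this function at increasing values of beta shows the log-likelihood increasing without bound toward a finite supremum that is never attained; each Newton-Raphson iteration therefore pushes the coefficient still higher, producing the huge coefficient and gigantic standard error described above.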
What should be done here? When some observations are found to be unusually influential, a common strategy is to make sure that the data for those observations are correct. Often, such observations turn out to contain recording or coding errors. (Of course, there’s a danger in selectively targeting some observations for error checking.) When no errors are detected, the researcher needs to be up front about the possible sensitivity of the results to one or a small number of observations. For this example, the removal of observation 12 makes a strong effect even stronger (so strong that the algorithm does not converge), so there was no danger of drawing a misleading qualitative conclusion. The other question to ask is whether the model needs to be modified or elaborated to take the peculiar features of the influential observations into account. Perhaps there are other covariates that ought to be included, or perhaps there is some nonlinearity that is inadequately modeled.