Residuals and Influence Statistics

In earlier releases of PROC PHREG, the OUTPUT statement could produce a data set containing eight different diagnostic statistics for each individual. An additional six statistics are available in Release 6.10. While there isn’t space to discuss all of them, I’ll concentrate on those that I think are most useful and interpretable. As with the BASELINE statement, OUTPUT will not produce a data set if there are any time-dependent covariates.

Individual Residuals

All releases of PROC PHREG have options for three different residual statistics that are computed for each individual in the sample: Cox-Snell residuals (LOGSURV), martingale residuals (RESMART), and deviance residuals (RESDEV). (For a detailed discussion of these residuals, see Collett (1994, p. 150)). Martingale residuals are obtained by transforming Cox-Snell residuals, and deviance residuals are a further transformation of martingale residuals. For most purposes, you can ignore the Cox-Snell and martingale residuals. While Cox-Snell residuals were useful for assessing the fit of the parametric models in Chapter 4, they are not very informative for Cox models estimated by partial likelihood.

Deviance residuals behave much like residuals from OLS regression: they are symmetrically distributed around 0 with an approximate standard deviation of 1.0. They are negative for observations that survive longer than the model expects and positive for observations that survive a shorter time than expected. As in OLS regression, very high or very low values suggest that the observation may be an outlier in need of special attention. You can also plot the residuals against the covariates; any unusual patterns may suggest features of the data that have not been adequately fitted by the model. Be aware, however, that censoring can produce striking patterns that don’t necessarily imply any problem with the model.
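For readers who want to see the arithmetic, here is a minimal Python sketch of the two transformations just described. The function names are mine, and the fragment illustrates the standard formulas, not PHREG’s internal implementation: the martingale residual is the event indicator minus the Cox-Snell residual, and the deviance residual is a symmetrizing transform of the martingale residual.

```python
import math

def martingale(delta, cox_snell):
    # Martingale residual: event indicator (1 = event, 0 = censored)
    # minus the Cox-Snell residual (the estimated cumulative hazard).
    return delta - cox_snell

def deviance(delta, m):
    # Deviance residual: sign(M) * sqrt(-2*(M + delta*log(delta - M))).
    # The delta*log(delta - M) term is zero for censored cases (delta = 0).
    term = m + (delta * math.log(delta - m) if delta == 1 else 0.0)
    return math.copysign(math.sqrt(-2.0 * term), m)
```

Note that for a censored observation the martingale residual is never positive, so its deviance residual is never positive either, which is one reason the censored cases form a distinct band in plots like Output 5.25.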

Here’s an example in which deviance residuals for the recidivism data are plotted against AGE:

proc phreg data=recid;
   model week*arrest(0)=fin age prio / ties=efron;
   output out=c resdev=dev;
run;

proc gplot data=c;
   symbol1 value=dot h=.2;
   plot dev*age;
run;

These statements produce the graph in Output 5.25.

Clearly, there is a disjunction between two groups of observations. The elongated cluster of points in the lower portion of the graph consists entirely of censored observations, while the more widely dispersed points in the upper portion are the uncensored observations. Note also that the residuals for the censored observations rise with increasing age. That’s because age is associated with longer times to arrest, yet all censored observations are censored at the same point in time. With increasing age, survival to one year becomes more consistent with the model’s predictions. Some of the residuals exceed 3, which is large enough to warrant concern.

Output 5.25. Graph of Deviance Residuals by Age for Recidivism Data


Covariate-Wise Residuals

Release 6.10 and later can also produce Schoenfeld residuals (RESSCH), weighted Schoenfeld residuals (WTRESSCH), and score residuals (RESSCO). All three share a rather unusual property: instead of a single residual for each individual, there is a separate residual for each covariate for each individual. They also sum to 0 (approximately) in the sample. A major difference among them is that the score residuals are defined for all observations, while the Schoenfeld residuals (both weighted and unweighted) are not defined for censored observations (they are missing in the output data set).

My experience in examining graphs of these three residuals is that they all behave much the same. Since the Schoenfeld residuals are better known and easier to explain, I’ll concentrate on them. Here’s how they work. Suppose that individual i dies at time t_i, and at that time there were 30 people at risk of death, indexed by j = 1, ..., 30. For each of those 30 people, the estimated Cox model implies a certain probability of dying at that time, denoted by p_j. Imagine randomly selecting one of these 30 people, with selection probability p_j. For each covariate x_k, we can calculate its expected value for a randomly selected person (from that risk set) as

   E(x_k) = Σ_j p_j x_jk.

The Schoenfeld residual is then defined as the covariate value for the person who actually died, x_ik, minus this expected value.
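In code, the definition amounts to computing the risk-set probabilities from the fitted coefficients and subtracting the weighted average. This Python sketch (my own illustrative function, not PHREG’s implementation) spells it out for a single event time:

```python
import math

def schoenfeld(beta, risk_set, i_event, k):
    # beta: fitted Cox coefficients; risk_set: covariate vectors for the
    # subjects at risk at the event time; i_event: index (within risk_set)
    # of the subject who actually died; k: which covariate.
    # Probability that subject j is the one who fails, under the model:
    #   p_j = exp(beta'x_j) / sum_l exp(beta'x_l)
    scores = [math.exp(sum(b * x for b, x in zip(beta, xj))) for xj in risk_set]
    total = sum(scores)
    p = [s / total for s in scores]
    # Expected value of covariate k for a randomly selected risk-set member.
    expected = sum(pj * xj[k] for pj, xj in zip(p, risk_set))
    # Residual: observed covariate of the failure minus its expected value.
    return risk_set[i_event][k] - expected
```

With all coefficients equal to zero, the probabilities are uniform over the risk set, so the expected value is just the risk-set mean of the covariate.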

The main function of these residuals is to detect possible departures from the proportional hazards assumption. Since Schoenfeld residuals are, in principle, independent of time, a plot that shows a relationship with time is evidence against that assumption. For the recidivism data, I produced a data set containing the Schoenfeld residuals using these statements:

proc phreg data=recid;
   model week*arrest(0)=fin age prio /
         ties=efron;
   output out=b ressch=schfin schage schprio;
run;

proc print data=b;
run;

Note that the keyword RESSCH is followed by three arbitrarily chosen variable names, one for each of the three covariates in the model. These variables contain the Schoenfeld residuals in the output data set. Output 5.26 displays the 18 observations with the lowest arrest times in the data set.

Output 5.26. Portion of the Output Data Set with Schoenfeld Residuals
WEEK    ARREST    FIN     SCHFIN     AGE     SCHAGE    PRIO    SCHPRIO

 12        1       1      0.58848     27     4.3964      0     -3.9624
 11        1       0     -0.40653     19    -3.5396     18     13.8255
 11        1       1      0.59347     19    -3.5396      2     -2.1745
 10        1       0     -0.40284     21    -1.5256     14      9.7362
  9        1       1      0.59574     26     3.4613      0     -4.2572
  9        1       1      0.59574     30     7.4613      3     -1.2572
  8        1       1      0.59318     21    -1.5440      4     -0.2909
  8        1       0     -0.40682     28     5.4560      4     -0.2909
  8        1       1      0.59318     20    -2.5440     11      6.7091
  8        1       0     -0.40682     23     0.4560      5      0.7091
  8        1       1      0.59318     40    17.4560      1     -3.2909
  7        1       1      0.59193     20    -2.5387      2     -2.2861
  6        1       0     -0.40616     19    -3.5221      6      1.7059
  5        1       0     -0.40475     19    -3.5098      3     -1.2896
  4        1       0     -0.40351     18    -4.4960      1     -3.2795
  3        1       0     -0.40284     30     7.4915      3     -1.2774
  2        1       0     -0.40260     44    21.4789      2     -2.2760
  1        1       0     -0.40163     20    -2.5150      0     -4.2657

Consider the person who was arrested in week 12. He was 27 years old which, according to the Schoenfeld residual, was about 4.4 years older than the model predicts. He had no prior convictions, but the model predicts that a person arrested at that time should have about 4 prior convictions (SCHPRIO=–3.96). He also received financial aid, although the model predicts a probability of only .41 that a person arrested in week 12 would be receiving aid (SCHFIN=.59). These are not especially large numbers compared with others in Output 5.26. The person arrested in week 2, for example, was 21 years older than predicted.

The next step is to plot the residuals for each covariate against time:

proc gplot data=b;
   symbol1 value=dot h=.02;
   plot schfin*week schprio*week schage*week;
run;

Output 5.27 shows the graphs produced by this code.

Output 5.27. Graphs of Schoenfeld Residuals Versus Time, Recidivism Data


The graph for the FIN residuals is not very informative, which is typical of graphs for dichotomous covariates. For PRIO, the residuals have a fairly random scatter. For AGE, there appears to be a slight tendency for the residuals to decline with time since release. To get a less subjective assessment (but still ad hoc), I did an OLS regression of the residuals for each variable on WEEK. The p-values for FIN and PRIO were .93 and .53, respectively, while the p-value for AGE was .02, suggesting that there may be some departure from proportionality for that variable.
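The ad hoc regression check can be carried out with any OLS routine. The following Python sketch (a hypothetical helper of my own, shown only to make the idea concrete) computes the slope of the residuals on time together with its t-statistic; a large |t| signals a time trend in the residuals and hence a possible departure from proportionality:

```python
def slope_t(x, y):
    # OLS regression of residuals (y) on time (x): returns the slope
    # and its t-statistic. A clear trend suggests non-proportionality.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx                      # slope
    a = my - b * mx                    # intercept
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    se = (sse / (n - 2) / sxx) ** 0.5  # standard error of the slope
    return b, b / se
```

A formal p-value would come from the t distribution with n − 2 degrees of freedom; the sketch stops at the t-statistic.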

Influence Diagnostics

Most OLS regression packages nowadays can compute various influence statistics that measure how much the results would change if a particular observation were removed from the analysis. Such statistics can also be computed for Cox regression models (Collett 1994, p. 169), and several are available with PROC PHREG in Release 6.10 and later. The likelihood displacement (LD) statistic measures influence on the model as a whole: it tells you (approximately) how much the log-likelihood (multiplied by 2) would change if the individual were removed from the sample. The DFBETA statistics tell you how much each coefficient would change with the removal of a single observation. These are also approximations. (There is also an LMAX statistic that measures overall influence, but its interpretation is rather esoteric. LMAX is preferable to LD for plots against covariates, however.)
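The logic of DFBETA is easiest to see in a setting where it can be computed exactly. As an analogy only (OLS slope rather than a Cox coefficient, and exact case deletion rather than PHREG’s one-step approximation), this Python sketch compares the coefficient estimated from the full sample with the coefficient estimated after deleting one case:

```python
def ols_slope(x, y):
    # Least-squares slope of y on x.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

def dfbeta(x, y, i):
    # Exact influence of case i: full-sample slope minus the slope
    # computed with case i deleted.
    full = ols_slope(x, y)
    reduced = ols_slope(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
    return full - reduced
```

Note the sign convention: the coefficient after deletion equals the full-sample coefficient minus the statistic, which matches the PHREG convention discussed for the myelomatosis example.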

For the 25 cases in the myelomatosis data set that was analyzed in Chapter 3, I used the following PROC PHREG statements to fit a model and produce the influence statistics:

proc phreg data=myel;
   model dur*status(0)=treat renal;
   id id;
   output out=c ld=ldmyel dfbeta=dtreat drenal;
run;

The estimated coefficients were 4.11 for RENAL and 1.24 for TREAT. The OUTPUT statement creates a new data set C containing the variables LDMYEL, DTREAT, and DRENAL, as well as all the variables in the MODEL statement. The ID statement is necessary to put the ID variable in the new data set. Data set C is displayed in Output 5.28.

Output 5.28. Influence Statistics for Myelomatosis Data
ID    DUR   STATUS   RENAL   TREAT    DRENAL     DTREAT    LDMYEL

  1      8      1       1       1    -0.08058   -0.21768   0.13393
  2    180      1       0       2     0.11234    0.04908   0.01232
  3    632      1       0       2     0.08879    0.00971   0.00587
  4    852      0       0       1     0.07526    0.08582   0.02097
  5     52      1       1       1     0.18409    0.08899   0.03600
  6   2240      0       0       2     0.01587   -0.11223   0.04067
  7    220      1       0       1    -0.09357   -0.19650   0.10797
  8     63      1       1       1     0.04170    0.09916   0.02762
  9    195      1       0       2     0.11180    0.04819   0.01207
 10     76      1       0       2     0.11273    0.04974   0.01250
 11     70      1       0       2     0.11303    0.05023   0.01263
 12      8      1       0       1    -1.44585   -0.46469   1.71233
 13     13      1       1       2     0.02970    0.01809   0.00120
 14   1990      0       0       2     0.01587   -0.11223   0.04067
 15   1976      0       0       1     0.08603    0.10383   0.03048
 16     18      1       1       2     0.01759    0.00851   0.00033
 17    700      1       0       2     0.08717    0.00699   0.00575
 18   1296      0       0       1     0.08603    0.10383   0.03048
 19   1460      0       0       1     0.08603    0.10383   0.03048
 20    210      1       0       2     0.11106    0.04694   0.01174
 21     63      1       1       1     0.04170    0.09916   0.02762
 22   1328      0       0       1     0.08603    0.10383   0.03048
 23   1296      1       0       2     0.07554   -0.01244   0.00604
 24    365      0       0       1     0.05943    0.05934   0.01027
 25     23      1       1       2    -0.01780   -0.01950   0.00109

The signs of the DFBETA statistics are the reverse of what you might expect—a negative sign means that the coefficient increases when the observation is removed. Most of the observations have rather small values for the influence statistics, but observation 12 has exceptionally large values for all three. In particular, the value –1.45 for DRENAL indicates that if observation 12 is removed, the RENAL coefficient will increase to approximately 4.11+1.45 = 5.56, an increase of 35 percent. Why is this observation so influential? It turns out that if you actually reestimate the model without this observation, the algorithm does not converge. The only indication of this, unfortunately, is that even though the coefficient of RENAL is large (over 19), the estimated standard error is gigantic (1471). Why no convergence? If you look closely at Output 5.28, you’ll see that all the durations when RENAL=1 (renal impairment) are smaller than all the durations when RENAL=0 (no impairment) with one exception. You guessed it, observation 12 had no impairment, but it did have an early death. In general, convergence does not occur when there is no overlap in the duration times for the two values of a dichotomous covariate. The coefficient for the covariate gets larger in magnitude with every iteration of the Newton-Raphson algorithm. In this example, observation 12 is the crucial observation that prevents this from happening.
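The non-convergence described above is an instance of a monotone partial likelihood, and it is easy to demonstrate numerically. In this Python sketch (a toy four-person data set with no ties, not the myelomatosis data), every subject with x = 1 fails before every subject with x = 0, and the partial log-likelihood keeps rising as the coefficient grows, never reaching a maximum:

```python
import math

def cox_loglik(beta, times, events, x):
    # Cox partial log-likelihood (Breslow form; no tied times here)
    # for a single binary covariate x.
    ll = 0.0
    for i, (ti, di) in enumerate(zip(times, events)):
        if not di:
            continue  # censored observations contribute no event term
        risk = [j for j in range(len(times)) if times[j] >= ti]
        ll += beta * x[i] - math.log(sum(math.exp(beta * x[j]) for j in risk))
    return ll

# Separated data: both x=1 failures precede both x=0 failures.
times, events, x = [1, 2, 3, 4], [1, 1, 1, 1], [1, 1, 0, 0]
```

Evaluating cox_loglik at ever larger beta values shows the log-likelihood strictly increasing toward a finite bound it never attains, which is why the coefficient grows with every Newton-Raphson iteration and the standard error explodes rather than the procedure reporting an outright failure.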

What should be done here? When some observations are found to be unusually influential, a common strategy is to make sure that the data for those observations are correct. Often, such observations turn out to contain recording or coding errors. (Of course, there’s a danger in selectively targeting some observations for error checking.) When no errors are detected, the researcher needs to be up front about the possible sensitivity of the model to one or to a small number of observations. For this example, removing observation 12 makes a strong effect even stronger (so strong that the algorithm does not converge), so there is no danger of drawing a misleading qualitative conclusion. The other question to ask is whether the model needs to be modified or elaborated to take the peculiar features of the influential observations into account. Perhaps there are other covariates that ought to be included, or perhaps there is some nonlinearity that is inadequately modeled.
