CHAPTER 4

Interpretation of a Fitted Proportional Hazards Regression Model

4.1 INTRODUCTION

The interpretation of a fitted proportional hazards model requires that we be able to draw practical inferences from the estimated coefficients in the model. We begin by discussing the interpretation of the coefficients for nominal (Section 4.2) and continuous (Section 4.3) scale covariates. In Section 4.4 we discuss the issues of statistical adjustment and the interpretation of estimated coefficients in the presence of statistical interaction. The chapter concludes with a discussion of the interpretation of fitted values from the model and covariate adjusted survivorship functions.

In any regression model, the estimated coefficient for a covariate represents the rate of change of a function of the dependent variable per-unit change in the covariate. Thus, to provide a correct interpretation of the coefficients, we must determine the functional relationship between the independent and dependent variables and we must define the unit change in the covariate likely to be of interest.

In Chapters 2 and 3 we recommended that the hazard function be used in regression analysis to study the effect of one or more covariates on survival time. We must first determine what transformation of the hazard function is linear in the coefficients. In the family of generalized linear models (i.e., linear, logistic, Poisson, and other regression models) this linearizing transformation is known as the link function (see McCullagh and Nelder (1989)). This same terminology can be applied to proportional hazards regression models.

The proportional hazards model can be used when the primary goal of the analysis is to estimate the effect of study variables on survival time. Suppose that we have a regression model containing a single covariate. Because the hazard function for the proportional hazards regression model is:

$$h(t,x,\beta) = h_0(t)\,e^{x\beta}$$

it follows that the link function is the natural log transformation. We denote the log of a hazard function as g(t,x,β) = ln[h(t,x,β)]. Thus, in the case of the proportional hazards regression model, the log-hazard function is

$$g(t,x,\beta) = \ln[h_0(t)] + x\beta \tag{4.1}$$

The difference in the log-hazard function for a change from x = a to x = b is

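$$g(t,x=a,\beta) - g(t,x=b,\beta) = \{\ln[h_0(t)] + a\beta\} - \{\ln[h_0(t)] + b\beta\} = (a-b)\beta \tag{4.2}$$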

Note that because the baseline hazard function, h_0(t), is a component of the log hazard when x = a as well as when x = b, it disappears when we compute the difference in the log hazards. We also note that the difference in the log hazards does not depend on time. This is the proportional hazards assumption and it is examined in detail in Chapter 6, when we discuss methods for assessing model adequacy and assumptions.

The log hazard is the correct function to assess the effect of change in a covariate. However, it is not as easily interpreted as the expression we obtain when we exponentiate (4.2), namely

$$HR(t,a,b,\beta) = \exp[g(t,x=a,\beta) - g(t,x=b,\beta)] = \exp[(a-b)\beta] \tag{4.3}$$

The quantity defined in (4.3) is the hazard ratio, and it plays the same role in interpreting and explaining the results of proportional hazards regression that the odds ratio plays in a logistic regression. We will return to this point in the next section.

The results in (4.2) and (4.3) are important because they provide the method that must be followed to correctly interpret the coefficients in any proportional hazards regression model. The presence of censored observations of survival time in the data does not alter the interpretation of the coefficients. Censoring in the observations of time is an estimation issue dealt with when we constructed the partial likelihood function (see (3.17)) and, once we have accounted for the censoring, we can ignore it.
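As an informal check on (4.2) and (4.3), the following Python sketch evaluates the log-hazard difference directly from the hazards at several time points. The baseline hazard, the coefficient, and the covariate values a and b are hypothetical choices made only for illustration; any positive baseline hazard would give the same difference.

```python
import math

def baseline_hazard(t):
    """Hypothetical baseline hazard h0(t); chosen only for illustration."""
    return 0.02 * 1.5 * t ** 0.5

def hazard(t, x, beta):
    """Proportional hazards model: h(t, x, beta) = h0(t) * exp(x * beta)."""
    return baseline_hazard(t) * math.exp(x * beta)

def log_hazard_difference(t, a, b, beta):
    """Left-hand side of (4.2), computed directly from the two hazards."""
    return math.log(hazard(t, a, beta)) - math.log(hazard(t, b, beta))

def hazard_ratio(a, b, beta):
    """Hazard ratio from (4.3): exp[(a - b) * beta]."""
    return math.exp((a - b) * beta)

beta = 0.5        # hypothetical coefficient
a, b = 3.0, 1.0   # hypothetical covariate values
times = [1.0, 12.0, 60.0]

differences = [log_hazard_difference(t, a, b, beta) for t in times]
ratios = [hazard(t, a, beta) / hazard(t, b, beta) for t in times]

for t, diff, ratio in zip(times, differences, ratios):
    print(f"t = {t:5.1f}  log-hazard difference = {diff:.4f}  hazard ratio = {ratio:.4f}")
```

The printed difference and ratio are the same at every time point, which is the sense in which the baseline hazard cancels and the hazards are proportional.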

4.2 NOMINAL SCALE COVARIATE

We begin by considering the interpretation of the coefficient for a dichotomous covariate. Dichotomous or binary covariates occur regularly in applied settings. They may be truly dichotomous (e.g., gender) or they may be derived from continuous covariates (e.g., age greater than 65 years).

Assume that we have a model containing a single dichotomous covariate, denoted X, coded 0 or 1. The first step in interpreting the coefficient for X is to calculate the difference in the log hazard corresponding to a one-unit change in the covariate. From (4.2), this difference is:

$$g(t,x=1,\beta) - g(t,x=0,\beta) = (1-0)\beta = \beta$$

Thus, in the special case when the dichotomous covariate is coded zero and one, the coefficient is equal to the change of interest in the log hazard. We can exponentiate, following (4.3), the difference in log hazards to obtain the hazard ratio:

$$HR(t,1,0,\beta) = e^{\beta} \tag{4.4}$$

The form of the hazard ratio in (4.4) is identical to the form of the odds ratio from a logistic regression model for a dichotomous covariate. However, in the context of a proportional hazards model, it is a ratio of rates rather than of odds. To expand on this difference, suppose that we followed a large cohort of males and females for 60 months (5 years) and recorded whether each subject “died” during this period. Subjects alive after 60 months of follow-up are considered censored. In this hypothetical setting, we might be tempted to analyze the end-of-study binary variable, death during follow-up (yes = 1), using a logistic regression model. We should note that this binary variable is the censoring variable for the observation of time to death. Suppose the value of the odds ratio for gender (1 = female) is 2.0. This is interpreted to mean, under conditions where the odds ratio approximates the relative risk, that the probability of death by the end of the study is 2 times higher for females than for males. A hazard ratio of 2 obtained from (4.4) means that, at any time during the study, the per-month rate of death among females is twice that of males. Thus, the hazard ratio is a comparative measure of survival experience over the entire time period, whereas the odds ratio is a comparative measure of event occurrence at the study endpoint. They are two different measures, and the fact that they may, under certain circumstances, be of similar magnitude in an applied setting is irrelevant. Note that, if one is able to observe the survival time for all subjects, then the censoring variable is equal to one for all subjects and logistic regression cannot be used because the outcome is constant.
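To make the distinction between the two measures concrete, the short Python sketch below computes the end-of-study odds ratio implied by a hazard ratio of 2.0. It assumes constant (exponential) hazards and a hypothetical male death rate of 0.01 per month; neither assumption comes from the text, and other choices would give other values.

```python
import math

def death_probability(monthly_rate, months):
    """P(death by end of follow-up) under a constant hazard (an assumption)."""
    return 1.0 - math.exp(-monthly_rate * months)

def odds(probability):
    return probability / (1.0 - probability)

male_rate = 0.01                 # hypothetical rate per month
hazard_ratio = 2.0               # females versus males
female_rate = hazard_ratio * male_rate
follow_up_months = 60

p_male = death_probability(male_rate, follow_up_months)
p_female = death_probability(female_rate, follow_up_months)

odds_ratio = odds(p_female) / odds(p_male)
risk_ratio = p_female / p_male

print(f"hazard ratio = {hazard_ratio:.2f}")
print(f"odds ratio at 60 months = {odds_ratio:.2f}")
print(f"risk ratio at 60 months = {risk_ratio:.2f}")
```

Under these assumptions the three measures differ even though they describe the same two groups, and the odds ratio and risk ratio would change if the follow-up period were shortened or lengthened while the hazard ratio would not.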

To expand on the interpretation of the hazard ratio for a dichotomous covariate, survival times were created for a hypothetical cohort of 5,000 females (sex = k = 1) and 5,000 males (sex = k = 0) with a theoretical hazard ratio of 2.0. Subjects whose survival time exceeded 60 months were considered censored at 60 months. At each month of follow-up, the number at risk, $n_k(t)$, the number of deaths, $d_k(t)$, the estimated hazard rates, $\hat{h}_k(t) = d_k(t)/n_k(t)$, k = 0, 1, and the estimated hazard ratio, $\widehat{HR}(t) = \hat{h}_1(t)/\hat{h}_0(t)$, were computed. These quantities are listed in Table 4.1 for the first 12 months and the last 12 months of this study. The hazard ratio is graphed for the 60-month study period in Figure 4.1. The average estimated hazard ratio, $\overline{HR}$, is included in Figure 4.1 for reference. The hazard rates and their ratios indicate that, during each of the 60 months of follow-up, the death rate for females is approximately twice that of males. The scatter about 2.0 is due to the randomness in the number of deaths observed at each month.

The increase in the scatter over time in Figure 4.1 is due to the fact that the number in the risk sets decreases over time. By design of the example, all values of time greater than or equal to 60 months are censored, so hazards and their ratio are not estimable after this point.

In most applied settings, there will be too much variability in the point-wise estimators of the hazard ratios, $\widehat{HR}(t)$, for a figure like Figure 4.1 to be particularly informative about the value of the hazard ratio or to determine if it is constant over time.
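A cohort of the kind described above can be mimicked with a short simulation. The Python sketch below is not the procedure used to create Table 4.1; it assumes exponential survival times, a hypothetical male death rate of 0.01 per month, and an arbitrary random seed, and it computes the monthly estimates d_k(t)/n_k(t) and their ratio.

```python
import random

def monthly_counts(n_subjects, monthly_rate, n_months, rng):
    """Return a list of (deaths, number at risk) for each month of follow-up.

    Survival times are drawn from an exponential distribution (an assumption
    of this sketch). A death in month m means the time falls in [m, m + 1).
    Subjects still alive after n_months are censored and never counted as deaths.
    """
    deaths_by_month = [0] * n_months
    for _ in range(n_subjects):
        month = int(rng.expovariate(monthly_rate))
        if month < n_months:
            deaths_by_month[month] += 1
    counts = []
    at_risk = n_subjects
    for deaths in deaths_by_month:
        counts.append((deaths, at_risk))
        at_risk -= deaths
    return counts

rng = random.Random(2024)        # fixed seed so the run is reproducible
male_rate = 0.01                 # hypothetical baseline rate per month
female_rate = 2.0 * male_rate    # theoretical hazard ratio of 2.0
males = monthly_counts(5000, male_rate, 60, rng)
females = monthly_counts(5000, female_rate, 60, rng)

hazard_ratios = []
for (d0, n0), (d1, n1) in zip(males, females):
    if d0 > 0:                   # the ratio is undefined in a month with no male deaths
        hazard_ratios.append((d1 / n1) / (d0 / n0))

mean_hazard_ratio = sum(hazard_ratios) / len(hazard_ratios)
print(f"months with an estimable ratio: {len(hazard_ratios)}")
print(f"mean estimated hazard ratio: {mean_hazard_ratio:.2f}")
```

Printing the individual entries of `hazard_ratios` shows the month-to-month scatter about 2.0, which tends to widen in later months as the risk sets shrink.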

Table 4.2 presents the results of fitting the proportional hazards model containing the dichotomous variable gender in the WHAS100 data. The point estimate of the coefficient is $\hat\beta$ = 0.555. Because gender was coded as 1 = female and 0 = male, we know from (4.3) that we can obtain the point estimator of the hazard ratio by exponentiating the estimator of the coefficient. In this example the estimate is:

$$\widehat{HR} = e^{0.555} = 1.74$$

Like the odds-ratio estimator in logistic regression, the sampling distribution of the estimator of the hazard ratio is skewed to the right, so confidence interval estimators based on the Wald statistic (for the hazard ratio) and its assumption of normality may not have good coverage properties unless the sample size is quite large. Comparatively speaking, the sampling distribution of the estimator of the coefficient is better approximated by the normal distribution than the sampling distribution of the estimated hazard ratio. As a result, its Wald statistic-based confidence interval (for the coefficient) will have better coverage properties. In this case, we obtain the end-points of a 95 percent confidence interval for the hazard ratio by exponentiating the endpoints of the confidence interval for the coefficient. In the current example these are:

$$\exp[0.555 \pm 1.96\,\widehat{SE}(\hat\beta)] = (1.002, 3.031)$$
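The calculation can be sketched in a few lines of Python. The coefficient 0.555 is the estimate quoted for this example; the standard error of 0.2824 is an assumption, back-calculated from the interval (1.002, 3.031) reported for this example, so the results agree with the reported values only up to rounding.

```python
import math

def hazard_ratio_wald_ci(beta_hat, standard_error, z=1.96):
    """Exponentiate the Wald interval for the coefficient to get one for the HR."""
    lower = math.exp(beta_hat - z * standard_error)
    upper = math.exp(beta_hat + z * standard_error)
    return math.exp(beta_hat), lower, upper

beta_hat = 0.555          # estimated coefficient for gender (from the text)
standard_error = 0.2824   # assumption: back-calculated from the reported interval

hr, lower, upper = hazard_ratio_wald_ci(beta_hat, standard_error)
print(f"HR = {hr:.2f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

Note that the interval is not symmetric about the point estimate on the hazard ratio scale, which reflects the right skew of the estimator discussed above.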

Table 4.1 Partial Listing of the Number of Deaths, the Number at Risk, the Estimated Hazard Rate in Two Hypothetical Groups, and the Estimated Hazard Ratio at Time t

c04t001

Figure 4.1 Graph of the estimated hazard ratios and the mean hazard ratio (HR = 2.0) for the hypothetical data from Table 4.1.

c04f001

Alternative confidence interval estimators have been studied, one of which is based on the partial likelihood. To date, this method has not been implemented in most software packages.

The interpretation of the estimated hazard ratio of 1.74 is that females die at about 1.74 times the rate of males, throughout the study period. The confidence interval suggests that ratios as low as 1.002 or as high as 3.031 are consistent with the observed data at the 95 percent level of confidence. Another way of expressing the hazard ratio that can be more meaningful to subject matter scientists is to describe it as a percentage increase over the null value of 1. In this example, one would say that the death rate among females is 74 percent larger than among males throughout the study period, and it could be as little as 0.2 percent larger or as much as 203 percent larger with 95 percent confidence.

Table 4.2 Estimated Coefficient, Standard Error, z-Score, Two-Tailed p-value, and 95% Confidence Interval Estimate for Gender for the WHAS100 Study

c04t002

As discussed in Chapter 3, the partial likelihood ratio test, the Wald test and the score test can be used to assess the significance of a coefficient. In the current example, the value of the partial likelihood ratio test is G = 3.75, with a p-value equal to 0.053. The Wald test statistic is z = 1.97, with a p-value equal to 0.049. Here one test is significant at the 5 percent level and the other just barely not so. The confidence interval for the hazard ratio does not include 1.0, a result consistent with the Wald test. In such cases, we believe that the best approach is to report all the results. The reader then has the option to evaluate the differing significance levels within the context of his/her own research objectives.

In the WHAS100 example in Table 4.2, the covariate value “1,” female, is associated with poorer survival experience. In many settings, the treatment of interest may result in improved survival experience. In these cases, the estimate of the coefficient is negative and the estimated hazard ratio is less than one. Our experience is that subject matter scientists seem to have more trouble interpreting the effect when it is protective. Hence we consider an example of this type, the three-drug versus two-drug regimens in the ACTG320 study. The results of fitting a proportional hazards model containing the indicator variable for the three-drug treatment are shown in Table 4.3.

The Wald test of the coefficient for treatment in Table 4.3 is highly significant. The partial likelihood ratio test (results not shown) is also significant with p < 0.01. The estimated hazard ratio and 95 percent confidence interval based on the results in Table 4.3 are, respectively, $\widehat{HR}$ = 0.505 and (0.33, 0.77). The interpretation of the estimated hazard ratio is that the rate of progression to AIDS or death among patients on the three-drug regimen is 0.505 times that of patients on the two-drug regimen; this ratio could be as low as 0.33 or as high as 0.77 with 95 percent confidence. The percentage change interpretation is that the rate of progression to AIDS or death among patients on the three-drug regimen is 49.5 percent less than the rate among patients on the two-drug regimen; this decrease could be as little as 23 percent or as much as 67 percent with 95 percent confidence. Both forms of the interpretation are found in the subject matter literature. Which form is used in an applied setting should be based on which is thought to be more easily understood by the target audience for the research.

We also point out that, if one changes the 0–1 coding using the equation new = 1 – old, then the numeric output is the same but with the signs of the estimated coefficient and its confidence interval reversed. For example, if we coded the two-drug treatment as 1, then the estimated coefficient would be 0.684 with an estimated hazard ratio of 1.982. If one uses enough decimal places in the calculations then the results are exact: $HR_{new} = 1/HR_{old}$. This fact is useful because it often explains why an estimated coefficient has a sign opposite of what was expected (i.e., the coding has been reversed).

Table 4.3 Estimated Coefficient, Standard Error, z-Score, Two-Tailed p-value, and 95% Confidence Interval Estimate for Treatment from the ACTG320 Study

c04t003

Note that Table 4.2 and Table 4.3 do not contain an intercept term. This is the price we pay for choosing the semiparametric proportional hazards model. The intercept, were one present in Table 4.2, would correspond to the log baseline hazard function, in this case for males (the group coded 0). The implication of this in practice is that we cannot, from the regression output of a proportional hazards model, reconstruct group-specific hazard rates. Only hazard ratios can be estimated. If it is critical to have individual estimates of group-specific hazard rates, then we would use one of the fully parametric models discussed in Chapter 8.

Occasionally, the coded values for a dichotomous variable are not 0 and 1 (e.g., +1 and –1 may be used instead). In this case, it may not be possible to obtain the estimator of the hazard ratio by simply exponentiating the estimator of the coefficient. One can always obtain the correct estimator by explicitly evaluating (4.2) and (4.3). If, as shown in (4.2) and (4.3), the two values are denoted as a and b, then the estimator of the hazard ratio is:

$$\widehat{HR}(t,a,b,\hat\beta) = \exp[(a-b)\hat\beta] \tag{4.5}$$

The end-points of a 100(1 – α) percent confidence interval estimator for the hazard ratio can be obtained by exponentiating the end-points of the confidence interval estimator for (a – b)β,

$$\exp[(a-b)\hat\beta \pm z_{1-\alpha/2}\,|a-b|\,\widehat{SE}(\hat\beta)] \tag{4.6}$$

where |a – b| denotes the absolute value of (a – b).
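The following Python sketch applies (4.5) and (4.6) to three codings of the same dichotomous covariate. The 0/1 values reuse the gender coefficient 0.555 with a standard error of 0.2824, which is an assumption back-calculated from the reported interval. The +1/–1 values rely on the fact that recoding x as 2x – 1 halves both the coefficient and its standard error.

```python
import math

def hazard_ratio_with_ci(beta_hat, standard_error, a, b, z=1.96):
    """Evaluate (4.5) and (4.6) for a comparison of x = a versus x = b."""
    log_hr = (a - b) * beta_hat
    half_width = z * abs(a - b) * standard_error
    return (math.exp(log_hr),
            math.exp(log_hr - half_width),
            math.exp(log_hr + half_width))

# 0/1 coding: coefficient from the text, standard error is an assumption.
beta_01, se_01 = 0.555, 0.2824
coded_01 = hazard_ratio_with_ci(beta_01, se_01, a=1, b=0)

# +1/-1 coding of the same variable: coefficient and standard error are halved.
beta_pm, se_pm = beta_01 / 2, se_01 / 2
coded_pm = hazard_ratio_with_ci(beta_pm, se_pm, a=1, b=-1)

# Reversed comparison (x = 0 versus x = 1) under the original 0/1 coding.
reversed_01 = hazard_ratio_with_ci(beta_01, se_01, a=0, b=1)

print("0/1 coding:   HR = %.3f, CI = (%.3f, %.3f)" % coded_01)
print("+1/-1 coding: HR = %.3f, CI = (%.3f, %.3f)" % coded_pm)
print("reversed:     HR = %.3f, CI = (%.3f, %.3f)" % reversed_01)
```

The first two lines of output agree, and the third is the reciprocal of the first with the interval endpoints swapped, which illustrates why evaluating (4.5) and (4.6) explicitly is safer than exponentiating a coefficient without checking the coding.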

If a nominal scale covariate has more than two levels, denoted in general by K, we must model the variable using a collection of K – 1 “design” (also known as “dummy” or “indicator”) variables. The most frequent method of coding these design variables is to use reference cell coding. With this method, we choose one level of the variable to be the reference level, against which all other levels are compared. The resulting hazard ratios compare the hazard rate of each group to that of the referent group. Note that different software packages use different groups as the default referent group and options might need to be specified or adjusted if the default referent group is not the preferred referent group.

Table 4.4 Coding of the Three Design Variables for the Age Groups in the WHAS100 Study

Age Group   AGE_2   AGE_3   AGE_4
< 60          0       0       0
[60–69]       1       0       0
[70–79]       0       1       0
≥ 80          0       0       1

In Chapter 2, we considered an example in which age of subjects in the WHAS100 study was categorized into four groups: < 60, [60–69], [70–79], and ≥ 80. Our goal was to describe, qualitatively, how survival experience in the cohort changes with age, through plots of estimated survivorship functions and a log-rank test. We can continue along these same lines by fitting a proportional hazards model to these data. The estimated hazard ratios provide a convenient and easily interpreted summary measure of the comparative survival experience of the four groups.

The methods discussed in this example may be applied to any covariate with multiple groups. The coding for the three design variables based on the four age groups, using the youngest age group as the referent group, is shown in Table 4.4. The results of fitting a proportional hazards model using these three design variables are presented in Table 4.5.
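Reference cell coding of the four age groups can be sketched as follows. The function names are hypothetical and the group boundaries follow the four categories described above; most software packages create these design variables automatically.

```python
def age_group(age):
    """Map age in years to group 1 (< 60), 2 (60-69), 3 (70-79), or 4 (>= 80)."""
    if age < 60:
        return 1
    if age < 70:
        return 2
    if age < 80:
        return 3
    return 4

def reference_cell_design(group, n_levels=4, referent=1):
    """Return the K - 1 design variables for a group under reference cell coding.

    Each non-referent level gets one 0/1 indicator; the referent level is the
    row of zeros.
    """
    return [1 if group == level else 0
            for level in range(1, n_levels + 1) if level != referent]

for age in (45, 65, 72, 88):
    group = age_group(age)
    print(age, group, reference_cell_design(group))
```

Changing the `referent` argument changes which group is represented by the row of zeros and therefore which group every estimated hazard ratio is compared against.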

The value of the partial likelihood ratio test for the overall significance of the coefficients is G = 15.32 and the p-value, computed using a chi-square distribution with three degrees of freedom, is 0.002. This suggests that at least one of the three older age groups has a hazard rate significantly different from the youngest age group. The p-values of the individual Wald statistics indicate that the hazard rate in each of the two oldest groups is significantly different from that in the youngest (the referent) age group.

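Before we can use (4.2) and (4.3) to obtain estimators of the hazard ratios, we need the equation for the log-hazard function. The log-hazard function, ignoring the log baseline hazard function, ln[h_0(t)], for the model fit in Table 4.5 is:

$$g(t, \mathrm{AGE\_2}, \mathrm{AGE\_3}, \mathrm{AGE\_4}, \beta) = \beta_2\,\mathrm{AGE\_2} + \beta_3\,\mathrm{AGE\_3} + \beta_4\,\mathrm{AGE\_4}$$

where $\beta_j$ denotes the coefficient of the design variable AGE_j.

The estimator of the hazard ratio comparing Age group 2 to Age group 1 is obtained by first calculating the difference in the estimators of the log-hazard functions, (4.2),

$$\hat{g}(t, \mathrm{AGE\_2}=1, \mathrm{AGE\_3}=0, \mathrm{AGE\_4}=0, \hat\beta) - \hat{g}(t, \mathrm{AGE\_2}=0, \mathrm{AGE\_3}=0, \mathrm{AGE\_4}=0, \hat\beta) = \hat\beta_2$$

Exponentiating the result, we obtain:

$$\widehat{HR}(\text{Age group 2 versus 1}) = e^{\hat\beta_2}$$

We obtain the estimators of the other two hazard ratios by proceeding in a similar manner, and these are:

$$\widehat{HR}(\text{Age group 3 versus 1}) = e^{\hat\beta_3}$$

and

$$\widehat{HR}(\text{Age group 4 versus 1}) = e^{\hat\beta_4}$$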

We calculate the value of the estimates in the example, shown in the second column of Table 4.6, by exponentiating the values of the coefficients, from Table 4.5.

When reference cell coding is used to create the design variables, the estimators of the hazard ratios comparing each group to the referent group are obtained by exponentiating the respective estimators of the coefficients.

We construct confidence interval estimators of the hazard ratios by exponentiating the endpoints of the confidence intervals for the individual coefficients. For example, the endpoints of the 95 percent confidence interval estimate for HR(Age group 3 versus 1) shown in Table 4.6 are:

$$\exp[0.986 \pm 1.96\,\widehat{SE}(\hat\beta_3)] = (1.12, 6.42)$$

Similar calculations yield the endpoints for the other two confidence interval estimates shown in Table 4.6.

The hazard ratios in Table 4.6 suggest: (1) subjects in their sixties are dying at a rate that is about the same as that of subjects younger than 60, (2) subjects in their seventies are dying at a rate that is about 2.7 times that of subjects younger than 60, and (3) subjects 80 or older have a mortality rate that is approximately 3.5 times that of subjects younger than 60.

Given the similarity of the hazard ratios comparing each of the two oldest age groups to the referent group, it would make sense to test whether the survival experience in these two groups differs. We can estimate their hazard ratio and determine whether it is different from 1.0. We do this by using the general approach in (4.2) and (4.3). The specific difference in the log-hazard functions for the two groups is:

$$\hat{g}(t, \mathrm{AGE\_2}=0, \mathrm{AGE\_3}=0, \mathrm{AGE\_4}=1, \hat\beta) - \hat{g}(t, \mathrm{AGE\_2}=0, \mathrm{AGE\_3}=1, \mathrm{AGE\_4}=0, \hat\beta) = \hat\beta_4 - \hat\beta_3$$

Table 4.5 Estimated Coefficients Using Referent Cell Coding, Standard Errors, z-Scores, Two-Tailed p-values, and 95% Confidence Interval Estimates for Age Categorized into Four Groups from the WHAS100 Study

c04t005

Table 4.6 Estimated Hazard Ratios (HR) and 95% Confidence Interval Estimates (CIE) for Age Categorized into Four Groups from the WHAS100 Study

Age Group   HR     95% CIE
< 60        1.00
[60–69]     1.05   0.38, 2.90
[70–79]     2.68   1.12, 6.42
≥ 80        3.54   1.57, 7.98

The estimator of the hazard ratio is:

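$$\widehat{HR}(\text{Age group 4 versus 3}) = e^{\hat\beta_4 - \hat\beta_3}$$

and is equal to exp(1.263 – 0.986) = 1.319. To obtain a confidence interval, we need an estimator for the variance of the difference between the two coefficients. Specifically:

$$\widehat{Var}(\hat\beta_4 - \hat\beta_3) = \widehat{Var}(\hat\beta_4) + \widehat{Var}(\hat\beta_3) - 2\,\widehat{Cov}(\hat\beta_4, \hat\beta_3)$$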

where $\widehat{Var}(\cdot)$ denotes the estimator of the variance of the estimator in the parentheses and $\widehat{Cov}(\cdot,\cdot)$ denotes the estimator of the covariance of the two estimators in the parentheses. These estimates may be obtained from most software packages by requesting the estimated covariance matrix of the estimated coefficients. Table 4.7 presents the covariance matrix for the estimated coefficients for the three age groups.

Table 4.7 Estimated Variances and Covariances for the Three Estimated Coefficients in Table 4.5

c04t006

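The estimated variances and covariance needed are $\widehat{Var}(\hat\beta_3)$ = 0.1984, $\widehat{Var}(\hat\beta_4)$ = 0.1727, and $\widehat{Cov}(\hat\beta_3, \hat\beta_4)$ = 0.1260. The estimate of the variance of the difference in the two coefficients is:

$$\widehat{Var}(\hat\beta_4 - \hat\beta_3) = 0.1727 + 0.1984 - 2(0.1260) = 0.1191$$

and the estimated standard error is:

$$\widehat{SE}(\hat\beta_4 - \hat\beta_3) = \sqrt{0.1191} = 0.3451$$

The end-points of the 95 percent confidence interval estimate are:

$$\exp(0.277 \pm 1.96 \times 0.3451) = (0.67, 2.59)$$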

The confidence interval includes 1.0, suggesting that the hazard rates for the two older age groups may, in fact, be the same.
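The steps above can be collected into a short Python sketch. The coefficients, variances, and covariance are the values quoted in the text; which variance belongs to which age group is inferred from the intervals in Table 4.6, although the result does not depend on that assignment.

```python
import math

def difference_hazard_ratio(beta_a, beta_b, var_a, var_b, cov_ab, z=1.96):
    """Hazard ratio and Wald interval for group a versus group b, where both
    groups are compared to the same referent in the fitted model."""
    difference = beta_a - beta_b
    variance = var_a + var_b - 2.0 * cov_ab
    standard_error = math.sqrt(variance)
    return {
        "variance": variance,
        "standard_error": standard_error,
        "hazard_ratio": math.exp(difference),
        "lower": math.exp(difference - z * standard_error),
        "upper": math.exp(difference + z * standard_error),
    }

result = difference_hazard_ratio(
    beta_a=1.263, beta_b=0.986,     # coefficients for age groups 4 and 3
    var_a=0.1727, var_b=0.1984,     # assignment to groups inferred, see lead-in
    cov_ab=0.1260,
)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

Ignoring the covariance term would overstate the variance here, because the two coefficients share the same referent group and are positively correlated.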

Instead of using the confidence interval, we could test the hypothesis of the equality of two coefficients via a Wald test. Many software packages allow the user to test whether specified contrasts of model coefficients are equal to zero. This is a convenient feature, especially when contrasts of interest are more complicated than simple differences. The Wald test for the contrast $\hat\beta_4 - \hat\beta_3$ is:

$$z = \frac{\hat\beta_4 - \hat\beta_3}{\widehat{SE}(\hat\beta_4 - \hat\beta_3)} = \frac{0.277}{0.3451} = 0.80$$

and the two-tailed p-value computed from the standard normal distribution is 0.42. Because the p-value is greater than 0.05, we fail to reject the hypothesis that the two coefficients are equal and conclude that there is no evidence that the death rates in the two age groups differ.

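The test for a general contrast among the K – 1 coefficients for a nominal scaled covariate with K levels is described as follows. Let the vector of estimators of the coefficients be denoted:

$$\hat{\boldsymbol\beta}' = (\hat\beta_1, \hat\beta_2, \ldots, \hat\beta_{K-1})$$

and the estimator of the covariance matrix be denoted $\widehat{Var}(\hat{\boldsymbol\beta})$. Let the vector of constants specifying the contrast be denoted:

$$\mathbf{c}' = (c_1, c_2, \ldots, c_{K-1})$$

where the sum of the constants is zero. The single degree of freedom Wald test for the contrast is:

$$z = \frac{\mathbf{c}'\hat{\boldsymbol\beta}}{\sqrt{\mathbf{c}'\,\widehat{Var}(\hat{\boldsymbol\beta})\,\mathbf{c}}} \tag{4.7}$$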

and the two-tailed p-value is obtained using the standard normal distribution. Most software packages will report the square of the Wald test and use the chi-square distribution to calculate the p-value. The equivalence of these two approaches follows from the fact that the square of a N(0,1) random variable follows a χ² distribution with one degree of freedom.
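A minimal Python sketch of the contrast test in (4.7) follows. Because only part of the covariance matrix is quoted in the text, the example is restricted to the coefficients for age groups 3 and 4 and the contrast (–1, 1), which reproduces the pairwise comparison of the two oldest groups. The function name is hypothetical, and the p-values are computed with the complementary error function from the standard library.

```python
import math

def wald_contrast(coefficients, covariance, contrast):
    """Single degree of freedom Wald test for a linear contrast, as in (4.7).

    Returns the z statistic, its square, the two-tailed normal p-value, and
    the p-value of the squared statistic from a chi-square with 1 df.
    """
    k = len(coefficients)
    estimate = sum(contrast[i] * coefficients[i] for i in range(k))
    variance = sum(contrast[i] * covariance[i][j] * contrast[j]
                   for i in range(k) for j in range(k))
    z = estimate / math.sqrt(variance)
    q = z * z
    # Two-tailed standard normal p-value: 2 * [1 - Phi(|z|)] = erfc(|z| / sqrt(2)).
    p_normal = math.erfc(abs(z) / math.sqrt(2.0))
    # Upper tail of a chi-square with one degree of freedom: erfc(sqrt(q / 2)).
    p_chi_square = math.erfc(math.sqrt(q / 2.0))
    return z, q, p_normal, p_chi_square

coefficients = [0.986, 1.263]                       # age groups 3 and 4 (from the text)
covariance = [[0.1984, 0.1260], [0.1260, 0.1727]]   # values quoted in the text
contrast = [-1.0, 1.0]                              # group 4 minus group 3

z, q, p_normal, p_chi_square = wald_contrast(coefficients, covariance, contrast)
print(f"z = {z:.2f}, z squared = {q:.2f}")
print(f"p (normal) = {p_normal:.3f}, p (chi-square, 1 df) = {p_chi_square:.3f}")
```

The two p-values are identical, which is the equivalence between the z form and the chi-square form of the test noted above. The same function accepts longer coefficient vectors when the full covariance matrix is available.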

In the WHAS100 study, it may be of interest to determine whether the log hazard ratio of the second age group is equal to the average of the log hazard ratios of the two oldest age groups. The vector of constants for this contrast is c' = (1.0, –0.5, –0.5), the vector of estimated coefficients is given in Table 4.5, and the covariance matrix is shown in Table 4.7. We used STATA to perform the calculations, but other software packages (e.g., SAS) could have been used. The value of the test statistic is Q = 6.69 with a p-value equal to 0.01. We conclude that the average log hazard of the two oldest age groups is significantly greater than the log hazard of the second age group.

The method of using a contrast to compare coefficients can be especially useful when trying to pool categories of a nominal scale covariate recorded with more levels than the data can support (e.g., categories with few events). Considerations regarding meaningful interpretations are of primary importance in deciding which categories to combine, but contrasts may be used to judge whether the hazard rates of clinically similar groups are statistically similar.

Referent cell coding as used above is the most frequently applied scheme for coding design variables; however, it is just one of many possible methods. An alternative is deviation from means coding. This type of design variable coding may be used when one simply needs an overall assessment of differences in hazard rates. To illustrate the method, we apply it to the four age groups in the WHAS100 study. This coding is obtained by replacing the first row of zeros in Table 4.4 with a row in which each value is equal to –1. The resulting estimated coefficient for an age group estimates the difference between the log hazard of the group and the arithmetic mean of the log hazards of all K groups. The exponentiated estimated coefficient provides the ratio of the hazard rate of the particular group to the geometric mean of the hazard rates of all K groups.

The results of fitting a proportional hazards model using the deviation from means coding are shown in Table 4.8. The value of the partial likelihood ratio test for the overall significance of the coefficients is identical to that obtained using reference cell coding and is G = 15.32 with a p-value, computed using a chi-square distribution with three degrees of freedom, of 0.002. The value of –0.527 for the estimated coefficient of design variable AGE_2 is equal to the estimate of the difference between the log-hazard rate for age group 2, [60–69], and the estimate of the mean log-hazard rate over all four groups (including group 2). The Wald statistic has a p-value of 0.089, indicating that the difference between the log-hazard rate for this age group and the average log-hazard rate is significant at the 10 percent level but not at the 5 percent level. The coefficient for group 4 is 0.689 and its Wald statistic has p = 0.002. Thus, the log-hazard rate for this age group is significantly larger than the average log-hazard rate.

The estimated difference between the log hazard for the first age group and the average log-hazard rate is the negative of the sum of the coefficients in Table 4.8 and is –0.574. The easiest way to obtain an estimate of its standard error, Wald statistic, etc., is to make a small change in the coding of the design variables and refit the model. We merely switch the row coded –1 with any other row. We do not recommend that hazard ratios be reported when using deviation from means coding, because the ratio cannot be interpreted in the same manner as the ratio from referent cell coding. The comparison is not a comparison of two distinct groups, but rather of one group to the geometric mean hazard rate of all groups combined.
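The relationship between the two coding schemes can be checked numerically. In the Python sketch below, the reference cell coefficients for age groups 3 and 4 are the values quoted earlier (0.986 and 1.263); the value 0.047 for age group 2 is an assumption, chosen to be consistent with the hazard ratio of 1.05 in Table 4.6 and with the deviation from means coefficient of –0.527.

```python
import math

# Log hazard of each group relative to the referent (group 1) under
# reference cell coding. The value for group 2 is an assumption; see lead-in.
reference_coefficients = {1: 0.0, 2: 0.047, 3: 0.986, 4: 1.263}

# Arithmetic mean of the log hazards (relative to group 1) over all K groups.
mean_log_hazard = sum(reference_coefficients.values()) / len(reference_coefficients)

# Deviation from means coefficient: group log hazard minus the mean log hazard.
deviation_coefficients = {
    group: beta - mean_log_hazard
    for group, beta in reference_coefficients.items()
}

# Geometric mean of the hazard ratios versus the referent group.
geometric_mean_hr = math.exp(mean_log_hazard)

for group, coefficient in deviation_coefficients.items():
    ratio = math.exp(reference_coefficients[group]) / geometric_mean_hr
    print(f"group {group}: coefficient = {coefficient:.3f}, "
          f"ratio to geometric mean = {ratio:.3f}")
```

The coefficients for groups 2 and 4 match the values of –0.527 and 0.689 quoted above, the coefficient for group 1 is –0.574, and the four coefficients sum to zero, which is why the coefficient for the group coded –1 can be recovered as the negative of the sum of the other three.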

Many other methods for coding design variables are possible, for example, coding that compares each group to the next largest group or each group to the average of the higher groups. These methods tend to be appropriate in special circumstances and will not be discussed further in this text. In our experience, the method of referent cell coding, perhaps followed by contrasts, has provided a useful and informative analysis in most circumstances.

Table 4.8 Estimated Coefficients Using the Deviation from Means Coding, Standard Errors, z-Scores, Two-Tailed p-values, and 95% Confidence Interval Estimates for Age Categorized into Four Groups from the WHAS100 Study

c04t007

4.3 CONTINUOUS SCALE COVARIATE

At first glance it might seem that interpreting the coefficient for a continuous covariate would be easier than that of a nominal scale variable because indicator variables and coding schemes need not be introduced. Looking more deeply reveals that this is not necessarily true. Before we can use (4.2) and (4.3) to obtain an estimator of a hazard ratio for a continuous covariate, we must address two issues. First and foremost, we must verify that we have included the variable in its correct scale in the model. In this section, we will assume that the log hazard is linear in the covariate of interest. Methods to assess the scale are presented in Chapter 5. Second, we must decide what a clinically meaningful unit of change in the covariate is. Once these two steps are accomplished we may apply (4.2) and (4.3).

We illustrate the method using the WHAS100 study and age as the covariate. The results of fitting a proportional hazards model containing age are shown in Table 4.9. The estimated coefficient in Table 4.9 gives the change in the log hazard corresponding to a 1-year change in age. Often a 1-year change in age is not of clinical interest. The investigators conducting the study may, for example, be more interested in a 5-year change in age.

We obtain the correct change in the log-hazard function for a change of c units in a continuous covariate by using (4.2) and (4.3) with a = x + c and b = x. This yields the following change in the log hazard

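$$g(t, x+c, \beta) - g(t, x, \beta) = c\beta \tag{4.8}$$

The change is simply equal to the value of the change of interest times the coefficient. The estimator of the hazard ratio is

$$\widehat{HR}(c) = \exp(c\hat\beta) \tag{4.9}$$

and the end-points of a 100(1 – α) percent confidence interval estimator of the hazard ratio are

$$\exp[c\hat\beta \pm z_{1-\alpha/2}\,|c|\,\widehat{SE}(\hat\beta)] \tag{4.10}$$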

Applying (4.9) and (4.10) for a 5-year change in age in the WHAS100 study, we obtain an estimated hazard ratio of

$$\widehat{HR}(5) = \exp(5\hat\beta) = 1.26$$

Table 4.9 Estimated Coefficient, Standard Error, z-Score, Two-Tailed p-value, and 95% Confidence Interval for Age in the WHAS100 Study

c04t008

and the endpoints of a 95 percent confidence interval are

$$\exp[5\hat\beta \pm 1.96 \times 5 \times \widehat{SE}(\hat\beta)] = (1.12, 1.42)$$

Alternatively, we could have calculated the endpoints of the 95 percent confidence interval by multiplying the endpoints in Table 4.9 by 5 and then exponentiating. We suggest, for continuous covariates, that the hazard ratio for the clinically interesting unit of change, along with its confidence interval, be reported in any table of results. The unit of change should be indicated in the table heading or in a footnote.
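Both routes to the interval can be sketched in Python. The coefficient of 0.046 and standard error of 0.012 used here are assumptions, back-calculated from the hazard ratio of 1.26 and the 12 to 42 percent range reported for this example; the entries of Table 4.9 should be used in practice.

```python
import math

def hazard_ratio_for_change(beta_hat, standard_error, c, z=1.96):
    """Evaluate (4.9) and (4.10) for a change of c units in the covariate."""
    log_hr = c * beta_hat
    half_width = z * abs(c) * standard_error
    return (math.exp(log_hr),
            math.exp(log_hr - half_width),
            math.exp(log_hr + half_width))

beta_hat = 0.046         # assumption: per-year coefficient for age
standard_error = 0.012   # assumption: standard error of the coefficient
c = 5                    # clinically meaningful change: 5 years

hr, lower, upper = hazard_ratio_for_change(beta_hat, standard_error, c)

# Alternative route: scale the endpoints of the interval for the coefficient
# by c, then exponentiate.
beta_lower = beta_hat - 1.96 * standard_error
beta_upper = beta_hat + 1.96 * standard_error
alt_lower = math.exp(c * beta_lower)
alt_upper = math.exp(c * beta_upper)

print(f"HR for a {c}-year increase = {hr:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
print(f"alternative route: ({alt_lower:.2f}, {alt_upper:.2f})")
```

For a negative change c the endpoints of the scaled coefficient interval swap roles, which is why (4.10) uses the absolute value of c.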

The interpretation of an estimated hazard ratio of 1.26 is that the hazard rate increases by 26 percent for every 5-year increase in age and is independent of the age at which the increase is calculated. This independence follows from the assumption that the log hazard is linear in age, so that the starting age, x, cancels in the calculation in (4.8). The confidence interval estimate suggests that an increase in the hazard rate of between 12 and 42 percent is consistent with the data at the 95 percent level of confidence.

As is the case with a nominal scaled covariate, our experience is that subject matter investigators are more comfortable interpreting the effect of a continuous, linearly modeled covariate when increasing values are associated with poorer survival experience (i.e., positive coefficients and estimated hazard ratios that exceed one). The covariate CD4 count in the ACTG320 study provides an example of a protective effect. The results of fitting this model are shown in Table 4.10.

The coefficient in Table 4.10 estimates the decrease in the log hazard rate for every increase of one unit in the CD4 count. With a range in CD4 count from 0 to 392, a change of one is not clinically meaningful. The three quartiles are, respectively, 23, 75, and 136.5. The estimated standard deviation is approximately 70. In settings where 2, 5, or 10 do not provide a meaningful change, one might consider alternative choices such as one-half the interquartile range or the standard deviation. As an example, the estimated hazard ratio for a one standard deviation increase in CD4 count is

$$\widehat{HR}(70) = \exp(70\hat\beta) = 0.326$$

Table 4.10 Estimated Coefficient, Standard Error, z-Score, Two-Tailed p-value, and 95% Confidence Interval Estimates for CD4 Count in the ACTG320 Study

c04t009

The end-points of a 95 percent confidence interval estimate are

$$\exp[70\hat\beta \pm 1.96 \times 70 \times \widehat{SE}(\hat\beta)] = (0.232, 0.459)$$

The interpretation is that the rate of AIDS-defining diagnosis or death is estimated to decrease by 67.4 percent for every increase of 70 in the CD4 count and a decrease of between 54.1 and 76.8 percent is consistent with the data at the 95 percent level of confidence.

In summary, it is important to remember that the interpretation of the estimated hazard ratio for a continuous covariate depends on having included it in the model in the correct scale (linear in the examples in this section) and adhering to the basic premise of a proportional hazards model. Methods for checking these assumptions are considered in detail in Chapters 5 and 6, respectively.

4.4 MULTIPLE-COVARIATE MODELS

One of the primary reasons for using a regression model is to include multiple covariates to adjust statistically for possible imbalances in the observed data before making statistical inferences. This process of adjustment has been given different names in various fields of study. In traditional statistical applications, it is called analysis of covariance, while in clinical and epidemiological investigations it is often called control of confounding. A statistically related issue is the inclusion of higher-order terms in a model representing interactions between covariates. These are also called effect modifiers. In this section we discuss the strengths and limitations of statistical adjustment and inclusion of interactions and establish a set of basic guidelines that we employ when discussing model development in the next chapter.

Suppose for the moment we are in a setting where we have two variables: a primary risk factor or treatment variable, denoted as d, and another covariate, denoted x. To simplify things, assume that the covariate x is significantly associated with the outcome. In addition to being a contributor to the model itself, the covariate x could be: (1) a confounder of the association of the primary covariate of interest, d, with the outcome; (2) an effect modifier of the association; or (3) neither a confounder of the association of d with the outcome nor an effect modifier of the association. We use examples from various data sets to demonstrate each of these possibilities. As we show in the series of examples, determining the status of the covariate, x, involves fitting three models: (1) the model containing d but not containing x; (2) the model containing both d and x; and (3) the model containing d, x, and their interaction x × d.

Before we consider the examples, we continue with the two-variable setting, as it is helpful conceptually to understand and graphically to describe confounding and interaction as they are manifested in a regression model.

Suppose that our primary risk factor, d, has two levels (coded 0 = absent and 1 = present) and that our primary analysis goal is to estimate the hazard ratio for d. The proportional hazards model that contains only d has log-hazard function

c04en024

The difference in the log-hazard at the two levels of d is

(4.11)c04e011

Suppose that we have a second model that contains both d and x. The log-hazard function for this model is

(4.12)c04e012

The adjusted difference in the log-hazard from (4.12) is

(4.13)c04e013

The results shown in (4.11) and (4.13) indicate that we have two estimators of the effect of our risk factor: (1) a so-called crude or unadjusted estimator c04ie013, obtained from (4.11) by fitting the model that does not include x, and (2) an adjusted estimator c04ie014 obtained from (4.13) by fitting a model that includes x. If the two estimators are similar, then x is not a confounder of the association of d and survival time, as measured by the difference in the log-hazard. If the estimators are different, then adjustment is needed and the variable x may be a confounder of the association. The extent of adjustment, or difference, between c04ie015 and c04ie016 is a function of the difference in the distribution of x within the two groups defined by d and the magnitude of c04ie017, the association between x and survival time.

Suppose that the model containing d and x, (4.12), is the correct model, and denote the average value of x among subjects with d = 1 as c04ie018 and among those with d = 0 as c04ie019. An approximation of the average log-hazard functions [see Fleming and Harrington (1991), page 134, for an exact expression] for the two groups defined by d is

c04en025

and

c04en026

If we use the results from the fitted crude and adjusted models and take the difference between these two expressions, the crude or unadjusted log-hazard ratio is approximately

(4.14)c04e014

Thus, the crude estimator will be approximately equal to the adjusted estimator if the difference in the mean of x between the two groups defined by d is zero or if the coefficient for x is zero. The two estimators will differ when both quantities are nonzero, most noticeably if at least one of the two is large or both are moderate in size. The above two-variable model is based on a dichotomous and a continuous covariate. However, it may be generalized to covariates on other measurement scales, for example d continuous.
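The approximation described here, in which the crude log-hazard ratio is roughly the adjusted coefficient for d plus the coefficient for x times the difference in the group means of x, can be sketched as follows. The function name and the two coefficients are hypothetical; only the two group means are values quoted later in this section for the UIS data.

```python
def approx_crude_log_hr(beta_d_adjusted, beta_x, mean_x_d1, mean_x_d0):
    """Approximate crude log-hazard ratio for d, given the adjusted
    coefficient for d, the coefficient for x, and the group means of x."""
    return beta_d_adjusted + beta_x * (mean_x_d1 - mean_x_d0)

# Hypothetical coefficients chosen only to illustrate the arithmetic.
beta_d = 0.40    # assumed adjusted coefficient for d
beta_x = -0.02   # assumed coefficient for x

# Unequal group means: the crude value moves away from the adjusted one.
crude_unequal = approx_crude_log_hr(beta_d, beta_x, 34.05, 29.64)

# Equal group means: no confounding, crude equals adjusted.
crude_equal = approx_crude_log_hr(beta_d, beta_x, 30.0, 30.0)

print(f"unequal means: {crude_unequal:.4f}, equal means: {crude_equal:.4f}")
```

With a negative coefficient for x and a larger mean of x in the d = 1 group, the crude value is pulled toward zero relative to the adjusted value.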

The magnitude of the confounding by x expressed in (4.14) is on the scale of the coefficients or difference in the log-hazard. Hence, we believe that any measure of difference in the two estimators of effect should be defined using estimated coefficients. Others may prefer to use exponentiated coefficients, i.e., hazard ratios. The measure of difference in the two coefficients we prefer to use is the percentage change

(4.15)c04e015

where c04ie020 denotes the estimator from the model that does not contain the potential confounder (the smaller model) and c04ie021 denotes the estimator from the model that does include the potential confounder (the larger model). This particular measure of confounding works well with our preferred approach to model building, discussed in Chapter 5.

If we assume that (4.14) is correct, then the percentage change in terms of the model is

(4.16)c04e016

The right hand side of (4.16) expresses the amount of confounding as a percentage of the adjusted estimator. In practice, one would evaluate only (4.15). The expressions in (4.14) and (4.16) are provided as a tool to assist in explaining why the crude and adjusted estimators could be different. We explore this in more detail in the examples that follow our general discussion.
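A minimal sketch of the percentage change calculation follows. It assumes the sign convention implied by the text, namely the crude estimate minus the adjusted estimate, expressed as a percentage of the adjusted estimate. The function names and the numerical values are hypothetical.

```python
def percent_change(crude, adjusted):
    """Percentage change in a coefficient: crude minus adjusted,
    expressed as a percentage of the adjusted estimate."""
    if adjusted == 0:
        raise ValueError("adjusted estimate must be nonzero")
    return 100.0 * (crude - adjusted) / adjusted

def exceeds_cutoff(crude, adjusted, cutoff=20.0):
    """True when the absolute percentage change is at least the cutoff."""
    return abs(percent_change(crude, adjusted)) >= cutoff

# Hypothetical coefficients, for illustration only.
change = percent_change(crude=0.30, adjusted=0.40)
print(f"percentage change = {change:.1f}")   # a negative value: crude is smaller
print(exceeds_cutoff(0.30, 0.40))
```

Because the adjusted estimate is the denominator, the measure becomes very large when the adjusted estimate is close to zero, so the magnitude should be read together with the coefficients themselves.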

Under the model in (4.12), once we include the covariate x in the model, the adjusted estimate of effect for d is constant, i.e., does not depend on the value of x. Recall that the fact that it also does not depend on time is due to the assumption that the hazards are proportional. We say that the covariate x is an effect modifier if the effect of d does depend on the value of x. For example, suppose d represents a treatment and x represents age. If the effect of the treatment varies with age, then age is said to be an effect modifier. The simplest, and most frequently used, way to determine whether x is an effect modifier is to include the product term x × d in the model. Specifically

(4.17)c04e017

Using the model in (4.17) the difference in the log hazard function at the two levels of d is

(4.18)c04e018

which clearly depends on the value of x through the coefficient β3. Thus for the covariate x to be an effect modifier, the coefficient, β3, must be different from zero. Hence, we see that, while confounding by x depends on two quantities [see (4.14)], its effect modification depends only on the magnitude of a single coefficient and thus is typically examined via the Wald test for the interaction coefficient or the partial likelihood ratio test comparing the model in (4.17) to the model in (4.12). It may be the case that prior research has clearly demonstrated that x is an effect modifier and, under these circumstances, empirical evidence is less important for justifying the inclusion of the product term d × x in the model.
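The Wald test for the interaction coefficient can be sketched with the standard library alone, assuming the usual large-sample normal approximation for the estimator. The coefficient and standard error below are hypothetical, as is the function name.

```python
import math

def wald_test(coef, se):
    """Wald z statistic and two-tailed p-value under a standard
    normal reference distribution."""
    z = coef / se
    # Two-tailed p = 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p

# Hypothetical interaction coefficient and standard error.
z_stat, p_value = wald_test(coef=-0.030, se=0.0125)
print(f"z = {z_stat:.2f}, two-tailed p = {p_value:.4f}")
```

The partial likelihood ratio test requires the maximized log partial likelihoods of both fitted models and is therefore not shown here.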

Once we make the decision that there is evidence that a covariate is an effect modifier, discussion of its role as a confounder is no longer relevant. In particular, the estimator of β1 in (4.13) provides an “x” adjusted estimator of the effect of d that holds for all values of x. However, the main effect coefficient for d, β1, in (4.18) provides an estimator of effect only at the single covariate value x = 0 (which, for a non-centered continuous covariate, is a value likely not to be clinically plausible), and not for other values of x. The point here is that a Δc04ie022% calculation comparing estimated coefficients from the models in (4.12) and (4.17) is of no interest.

The estimator of the hazard ratio from (4.18) is

(4.19)c04e019

End-points of its confidence interval estimator are first calculated on the log-hazard scale and then exponentiated. To obtain these we use the estimator of the standard error

(4.20)c04e020

The end-points of the 100(1 – α) percent confidence interval on the log hazard ratio scale are

(4.21)c04e021

and the end-points for the corresponding confidence interval for the estimated hazard ratio are

(4.22)c04e022
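The steps in (4.18) through (4.22) can be sketched as a single function. The sketch assumes the log hazard ratio has the form β1 + β3x, as described in the text, with the usual variance of a linear combination of two estimators. All numerical inputs below are hypothetical and do not come from any table in this chapter.

```python
import math

def interaction_hazard_ratio(x, b1, b3, var_b1, var_b3, cov_b1_b3, z=1.96):
    """Hazard ratio for d at covariate value x, with confidence limits,
    from a model containing d, x, and the product of d and x."""
    log_hr = b1 + b3 * x
    # Variance of b1 + x*b3 includes the covariance term.
    variance = var_b1 + x ** 2 * var_b3 + 2.0 * x * cov_b1_b3
    se = math.sqrt(variance)
    # Limits are formed on the log scale, then exponentiated.
    return (math.exp(log_hr),
            math.exp(log_hr - z * se),
            math.exp(log_hr + z * se))

# Hypothetical estimates and (co)variances, for illustration only.
B1, B3 = 2.0, -0.025
VAR_B1, VAR_B3, COV_B1_B3 = 1.0, 0.0002, -0.014

for x_value in (40, 60, 80):
    hr, lower, upper = interaction_hazard_ratio(
        x_value, B1, B3, VAR_B1, VAR_B3, COV_B1_B3)
    print(f"x = {x_value}: HR = {hr:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

Omitting the covariance term is a common mistake; because the two estimators are typically strongly negatively correlated, leaving it out usually overstates the width of the interval.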

Now we consider examples to illustrate each of the confounder and effect modifier possibilities listed in the second paragraph of this section. Because a covariate may or may not be a confounder, and may or may not be an effect modifier (but never both simultaneously), we present the results of three fitted models for each example: (1) a model containing only the risk factor of interest, referred to as the crude model; (2) a model containing the risk factor and the potential adjustment variable, referred to as the adjusted model; and (3) a model containing main effect terms for both variables and one that is the arithmetic product of the two, referred to as the interaction model.

We begin with an example from the German Breast Cancer Study (see Section 1.3) where the risk factor of interest is hormone therapy and the potential confounder (or, perhaps, effect modifier) is the size of the tumor. The results of fitting the three models are shown in Table 4.11.

The results for the crude model indicate that hormone therapy significantly reduces the rate of cancer recurrence; the estimated coefficient is negative with p = 0.004. This effect could be influenced by imbalances in the distribution of another variable. The adjusted model in Table 4.11 adds tumor size to the model. In this case, the calculated value of the amount of confounding due to differences in the distribution of size in the two hormone therapy groups is

c04en027

We see that the crude estimator is only 2.41% smaller than the adjusted estimator. The results in Table 4.11 show that size is a significant risk factor with p < 0.001; hence we conclude [based on (4.14)] that the mean size of the tumor in the two hormone therapy groups must be quite similar. In fact they are, with values of 29.6 mm and 28.8 mm, respectively. In this case, the estimated hazard ratios for both the crude and adjusted models are quite similar (approximately 0.7). Thus use of hormone therapy is estimated to reduce the rate of tumor recurrence by about 30 percent.

The results for the interaction model show that the Wald test for the inclusion of the product variable is not significant, p = 0.928. The p-value for the likelihood ratio test comparing the adjusted and interaction models (results not shown) is also 0.928. Hence, there is no statistical evidence of tumor size modifying the effect of hormone therapy in these data. The fact that size is such a significant risk factor in its own right would lead us to prefer the adjusted model.

To provide an example of a covariate that is a confounder, we turn to the UIS data described in Section 1.3, whose variables are listed in Table 1.3. The reader who is not yet familiar with this study is encouraged to read the short description on pages 10-12. The outcome in this study is the time from randomization to treatment to self-reported return to drug use. The primary risk factor for this example is a dichotomous variable, drug, obtained by recoding the variable history of IV drug use, ivhx in Table 1.3, into none versus any use. The potential confounder (or, perhaps, effect modifier) is age of the subject at randomization to treatment. The results of fitting the three models are shown in Table 4.12.

Table 4.11 Estimated Coefficients, Standard Errors, z-Scores, Two-Tailed p-values, and 95% Confidence Interval Estimates for Three Models Fit to the German Breast Cancer Study Data, n = 686

c04t010

The results for the crude model indicate that having any history of IV drug use significantly increases the rate of returning to drug use; the estimated coefficient is positive with p = 0.001. The adjusted model shows that increasing age at randomization significantly decreases the rate of returning to drug use; the estimated coefficient is negative with p = 0.001.

The calculated value of the amount of confounding due to differences in the distribution of age in the two drug use groups is

c04en028

We see that the crude estimator is 26.88 percent smaller than the adjusted estimator. At this point we need to consider a criterion for how much change in an estimated coefficient is important. This decision is driven more by research experience than by statistical thresholds. In general, our preference is to use 20 percent as a cutoff value. As the reader will see in examples we will consider in later chapters, we do not adhere strictly to this value, varying it up and down as common sense dictates. However, for now we use 20 percent. Thus we say that age at randomization confounds the drug history effect. Age is quite significant, but there could also be a difference in the mean ages of the two drug history groups. In fact this is the case: the mean age of those without a history is 29.64, whereas the mean age for those with a history is 34.05. To explore this further we evaluate (4.14) as follows

c04en029

Here the approximation to the crude estimator from (4.14) is quite close. Thus, assuming the fitted model is correct, the confounding can be explained by the significance of age at randomization and the fact that subjects with a history of IV drug use are, on average, more than four years older than subjects without such history.

Table 4.12 Estimated Coefficients, Standard Errors, z-Scores, Two-Tailed p-values, and 95% Confidence Interval Estimates for Three Models Fit to the UIS Data, n = 605

c04t011

The results for the interaction model in Table 4.12 show that the p–value for the Wald test for the inclusion of the product variable is not significant, p = 0.403. The p–value for the likelihood ratio test comparing the adjusted and interaction models (results not shown) is 0.402. Hence, there is no statistical evidence that age at randomization modifies the effect of history of IV drug use in these data.

Because age is a strong confounder, we would use the estimated coefficient for drug in the adjusted model to estimate the effect of history of IV drug use. The estimated hazard ratio from this model is 1.55, suggesting that a history of IV drug use increases the rate of return to drug use by 55 percent.

To provide an example of a variable that is an effect modifier, we turn to the Worcester Heart Attack Study data, WHAS500, where we consider gender as the risk factor of interest and age as the other covariate. The results of the three fitted models are shown in Table 4.13. The results for the crude model indicate that females, gender = 1, are dying, after hospitalization for a heart attack, at a rate that is significantly greater than males. The estimated coefficient is positive and p < 0.001. The results for the adjusted model show that, when age is added to the model, gender is no longer significant. When we calculate the percentage change in the coefficient for gender after adding age to the model, we obtain

c04en030

This states that the crude estimate is approximately 682 percent larger than the adjusted estimate. This huge number is due to the fact that the adjusted estimate is almost zero. Regardless of the magnitude of the number, the effect of gender more or less disappears when we add age to the model. Why is this the case? The mean age of females is 74.72 versus 66.60 for males and, evaluating (4.14), we obtain

c04en031

Hence, we see that a large portion of the crude effect is due to the fact that age is highly significant in the adjusted model and the difference in the means is large.

When we examine the interaction model in Table 4.13, we see that the Wald test for the product variable is significant with p = 0.015. The likelihood ratio test comparing the interaction model to the adjusted model (details not presented) is also significant, with p = 0.016. Hence, age is a significant modifier of the effect of gender in these data. Because age is an effect modifier, we cannot interpret the estimated coefficient for gender in the interaction model in the same way as we do in the adjusted model. Namely, the value of –0.066 from the adjusted model provides an age adjusted estimate of effect of female gender, but the value of 2.329 from the interaction model is the estimate of the effect of female gender only for participants with age equal to zero years. Because the effect of female gender depends on age, it is completely inappropriate to use the adjusted model to provide an age adjusted estimate of effect and pretend that it applies to all ages. We must use the interaction model and the results in (4.18) – (4.22) to provide individual age specific estimates of the effect of female gender.

The relevant estimator of the log hazard in the presence of an interaction is shown in (4.17), and the log-hazard ratio in (4.18). Because the estimator depends on a continuous covariate, age in this example, we have a choice of using a plot and/or a table. We find that a plot of the log hazard as well as one of the relevant log-hazard ratio can be useful for understanding the source of the interaction, but often these plots are not specific enough to provide the kind of detailed information about the actual values of estimated hazard ratios that subject matter investigators are interested in knowing. Hence, we typically use the plot to identify key values of the covariate and prepare a table for presentation to the investigators that contains the point and confidence interval estimates of the hazard ratio at these values.

Following this suggested approach we present, in Figure 4.2, a plot of the estimated log hazard from the interaction model in Table 4.13. Under a model with no interaction, the lines for the log hazards for the two genders are parallel and the vertical distance between them is the age adjusted log hazard ratio. The plot of the lines for the interaction model shows the departure from being parallel, and the significance of the interaction coefficient in Table 4.13 tells us that the slopes of the two lines are statistically significantly different. The two lines intersect at approximately 77 years. The fact that the line for females lies above that of males for ages less than 77 means that they are dying at a rate greater than that of males. The reverse is true for ages greater than 77. The estimated age-specific log hazard ratio is the vertical distance between the two lines at any specified age.
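The crossing point can be related to the coefficients through (4.18): the two lines meet where the log hazard ratio is zero. In the short derivation below, the main effect estimate of 2.329 and the crossing age of about 77 are the values quoted in the text. The implied interaction coefficient is a back-calculation for illustration, not a value read from Table 4.13.

```latex
\hat{\beta}_1 + \hat{\beta}_3 x^{*} = 0
\quad\Longrightarrow\quad
x^{*} = -\frac{\hat{\beta}_1}{\hat{\beta}_3},
\qquad
\hat{\beta}_1 = 2.329,\; x^{*} \approx 77
\;\Longrightarrow\;
\hat{\beta}_3 \approx -\frac{2.329}{77} \approx -0.030 .
```

A negative interaction coefficient of this size means the female log hazard ratio falls by roughly 0.3 for each additional decade of age.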

Table 4.13 Estimated Coefficients, Standard Errors, z-Scores, Two-Tailed p-values, and 95% Confidence Interval Estimates for Three Models Fit to the Worcester Heart Attack Data, n = 500

c04t012

Figure 4.2 Plot of the estimated log hazard for males and females from the interaction model in Table 4.13 versus age.

c04f002

In Figure 4.3 we plot the estimated log hazard ratio given by (4.18) and 95 percent pointwise confidence limits from (4.21) using estimated coefficients from the interaction model in Table 4.13. In addition, we have added a scale on the right that indicates the corresponding value of the estimated hazard ratio. We should note that the confidence bands in Figure 4.3 have the same hyperbolic shape as confidence bands for a fitted univariable linear regression model. This is a consequence of the parametric form of (4.18), which looks like a linear regression model. As in linear regression, the confidence bands are narrowest at approximately the overall mean age, 68 years in these data.

In Figure 4.3 the horizontal line at log hazard ratio equal to zero and hazard ratio equal to one corresponds to no gender effect. This line is contained within the confidence limits between the ages of 58 and 89. This tells us that the rate of death among females is not significantly different from males between the ages of 58 and 89. Females have a significantly higher rate of death for age less than 58 and the reverse is true for age greater than 89. Based on the above graph, Table 4.14 presents estimated hazard ratios and confidence intervals for ages 40, 50, 60, 65, 85 and 90. Obviously we have fit a simple model that ignores many other known risk factors for death following an MI. Keeping this in mind, the results in Table 4.14 show that the rate of death among 40-year-old women is estimated to be three times that of males and drops to a 1.65-fold increase at age 60. Females aged 90 are dying at a rate estimated to be approximately 34 percent less than that of males. This interpretation must be tempered by the fact that only 16 men and 13 women of the 500 subjects are 90 or older.

Figure 4.3 Plot of the age specific estimated log hazard ratio for female gender and 95 percent pointwise confidence limits versus age for the interaction model in Table 4.13

c04f003

We return to the German Breast Cancer Study data for our final example where, again, the covariate is an effect modifier. Again we use hormone therapy as the covariate of interest and the modifying covariate is the count of the number of nodes involved. The results of fitting three models containing these two covariates are presented in Table 4.15.

Table 4.14 Age-Specific Estimated Hazard Ratios for Gender and 95% Confidence Interval Estimates from the Interactions Model in Table 4.13

Age    Estimated HR    95% CIE
40     3.04            1.139, 8.107
50     2.24            1.061, 4.736
60     1.65            0.977, 2.799
65     1.42            0.928, 2.174
85     0.77            0.564, 1.059
90     0.66            0.448, 0.983

The results for the crude model in Table 4.15 are the same as those for the crude model in Table 4.11, which indicate that use of hormone therapy significantly reduces the rate of recurrence. The results for the adjusted model in Table 4.15 show that increased nodal involvement significantly increases the rate of recurrence. The estimated coefficient for hormone therapy is only 2 percent different from the crude estimate. This is due to the fact that the mean number of nodes involved is nearly identical in the two treatment groups: 5.1 for those receiving therapy and 4.9 for those not receiving therapy. Hence, we conclude that nodal involvement does not confound the estimate of the effect of hormone therapy in these data.

The Wald test for the interaction coefficient in Table 4.15 is significant with p = 0.011. Although not shown, the likelihood ratio test is also significant (p = 0.015). For this example we show only the plot of the node specific estimated log hazard ratio with its 95 percent confidence bands in Figure 4.4.

As in Figure 4.3, the horizontal line in Figure 4.4 represents no effect. Furthermore, we see that the no effect line is contained within the confidence bands for 10 or more nodes involved. Hence use of hormone therapy significantly reduces the rate of recurrence only when 9 or fewer nodes are involved. While the number of nodes involved ranges from 1 to 51, 85 percent of the subjects had 9 or fewer nodes involved, with 43 percent having one or two. This suggests that we should focus our attention in the low end of the range, and to this end, we present node-specific estimated hazard ratios and confidence intervals for 1, 3, 5, 7 and 9 nodes in Table 4.16. The point estimates of the hazard ratios demonstrate that the benefit of hormone therapy ranges from a 43 percent reduction in the rate of recurrence among women with one node involved to a 23 percent reduction for women with 9 nodes involved.

Each of the examples in this section involved only two covariates. In subsequent chapters we consider more realistic, and thus more complicated, multivariable models containing covariates that confound and/or interact with other model covariates. Regardless of how complicated a model might be, the simple examples in this section provide the basic paradigm for estimating and interpreting hazard ratios. We must remember that model-based inferences are only as good as the model upon which they are based. Hence, it is vital that one pays close attention to all the model building and model assessment details discussed in Chapters 5 and 6 before using a fitted model to estimate hazard ratios.

Table 4.15 Estimated Coefficients, Standard Errors, z-Scores, Two-Tailed p-values, and 95% Confidence Interval Estimate for Three Models Fit to the German Breast Cancer Study Data, n = 686

c04t013

Figure 4.4 Plot of the node-specific estimated log-hazard ratio for hormone therapy and 95 percent pointwise confidence limits versus nodes for the interaction model in Table 4.15

c04f004

Table 4.16 Node-Specific Estimated Hazard Ratios and 95% Confidence Interval Estimate from the Interactions Model in Table 4.15

Nodes    Estimated HR    95% CIE
1        0.57            0.419, 0.767
3        0.61            0.466, 0.803
5        0.66            0.513, 0.850
7        0.71            0.558, 0.911
9        0.77            0.598, 0.990

4.5 INTERPRETING AND USING THE ESTIMATED COVARIATE-ADJUSTED SURVIVAL FUNCTION

Methods for estimating the survival function following the fitting of a proportional hazards model were presented in Section 3.5. The key step presented in that section is the estimation of the baseline survival function, c04ie022, shown in (3.40). This estimator may be combined with the estimators of the coefficients in the model using (3.37) to obtain the estimator of the survival function, adjusting for the covariates, as follows

(4.23)c04e023

Most software packages allow the user to request that the estimator of the baseline survival function be calculated and saved. This estimator may be used to derive other functions of survival time. For example, the estimator in (4.23) is essential for graphical description of the results of the analysis and for other analyses, such as model assessment. We discuss graphical methods and estimation of quantiles and their interpretation in this section and model assessment in Chapter 6.

We begin with an example from the German Breast Cancer Study fitting a model containing the dichotomous treatment variable, hormone therapy. The model containing only hormone therapy was fit as the crude model in two examples in the previous section, see Table 4.11 and Table 4.15. For convenience, we repeat the results of this fit in Table 4.17.

The estimator of the baseline survival function for this model is an estimator of the survival function for hormone = 0. If we request that the baseline survival function be computed as part of the analysis, then the software evaluates (3.40), denoted

(4.24) c04ie023

for each subject in the study, regardless of their survival status or hormone use. It follows from (3.40) that the estimator c04ie023 is constant between observed survival times. Thus, the estimated value for subjects who were censored is equal to the value at the largest observed survival time for which they were still at risk.

Table 4.17 Estimated Coefficients, Standard Errors, z-Scores, Two-Tailed p–values, and 95% Confidence Interval Estimate for a Model Fit to the German Breast Cancer Study Data, n = 686

c04t014

We can compute an estimate of the survival function for hormone = 1 by using the previously calculated value of the baseline survival function and evaluating

(4.25)c04e025

where the value of the coefficient for hormone is obtained from Table 4.17. The graphs of the two estimated survival functions, (4.24) and (4.25), are shown in Figure 4.5. The plot has been drawn with steps connecting the points to emphasize the fact that the estimator is constant between observed survival times. It follows from (4.24) and (4.25) that each function has been plotted at exactly the same n = 686 values of time. The shape of the two curves is a consequence of the proportional hazards assumption. The ratio of the hazards at each point in Figure 4.5 is forced to be equal to 0.69 = exp(–0.364).
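The relationship between the two curves can be sketched as follows, assuming the standard proportional hazards result that the covariate-specific survival function is the baseline survival function raised to the power exp(xβ). The coefficient –0.364 is the value quoted in the text; the baseline survival values are hypothetical and the function name is ours.

```python
import math

def survival_from_baseline(baseline_survival, linear_predictor):
    """Covariate-adjusted survival estimates: each baseline value is
    raised to the power exp(linear predictor)."""
    power = math.exp(linear_predictor)
    return [s0 ** power for s0 in baseline_survival]

beta_hormone = -0.364  # coefficient for hormone quoted in the text

# Hypothetical baseline (hormone = 0) survival estimates at successive
# observed times; real values would be saved by the software.
baseline = [1.0, 0.95, 0.80, 0.60, 0.45]

# Linear predictor for hormone = 1 is beta * 1.
hormone_curve = survival_from_baseline(baseline, beta_hormone * 1)

for s0, s1 in zip(baseline, hormone_curve):
    print(f"S0 = {s0:.3f}  S(hormone = 1) = {s1:.3f}")
```

Because the exponent is less than one, every value on the hormone therapy curve is at least as large as the corresponding baseline value, which is why that curve lies above the baseline curve.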

A plot of the treatment-specific Kaplan-Meier estimators may look similar to Figure 4.5. However, there is an important distinction, as the Kaplan-Meier estimators use only the data in each hormone use group and do not assume the hazards are proportional.

We can also think of the curves plotted in Figure 4.5 as being like “fitted” or “predicted” regression lines. Here the “prediction” is on the survival probability scale. The upper curve predicts or estimates the survival experience among those having hormone therapy if: (1) the estimate of the baseline survival function correctly describes the survival experience in the no-hormone therapy group (the lower curve), and (2) the proportional hazards model is correct.

In the GBCS data, the observed range of times is nearly identical for the two hormone groups. This may not always be the case. For example, consider a study where the survival experience is much poorer in an “exposed” group than an unexposed group. In this case, we might expect to see few long follow-up times in the exposed group and it is possible that the range of times would be less than that of the unexposed group. Yet a plot of the estimated survival function for the exposed group, using the estimated baseline survival function from the fitted proportional hazards model, would extrapolate this group’s survival experience beyond their observed range. This extrapolation can be avoided by simply restricting the plot for each group to the observed times in the group. However, in reading someone else’s analyses in the literature, it may be quite difficult to determine if there has been an inappropriate extrapolation in plots of proportional hazards covariate adjusted survival curves. We leave as an exercise plotting the two estimated survival functions in Figure 4.5 over their specific times, i.e., a plot with a total of 686 points.

If the observed range of survival times is comparable for each group, as it is in our example, we recommend a plot like Figure 4.5 because it uses all the data and best presents the fitted model and its assumptions. However, if there is a clinically important difference in the observed range of survival times, then we recommend plotting over each group’s specific times. The best approach in practice is to provide results from both a thorough univariate analysis of survival experience in subgroups of special interest, as well as results from any regression analyses.

Figure 4.5 Graph of the estimated proportional hazards model survival functions for hormone use for the German Breast Cancer Study. Points are plotted at each of 686 observed values of time for both curves.

c04f005

As noted in the previous section, the principal reason for regression modeling is to adjust statistically for possible imbalances in the observed data. As an example of a more complicated model, suppose we fit the proportional hazards model containing hormone therapy and tumor size. If the goal is to present survival functions for the two hormone therapy groups, controlling for tumor size, then we must give some thought to what we mean by controlling for tumor size.

We would like the estimated survival function to use the covariates in the same way that covariates are controlled for in a linear regression. In linear regression, a point on the regression line (or plane for a multiple variable model) is the model-based estimate of the conditional mean of the dependent variable among subjects with values of the covariates defined by the point. The analogy to linear regression can help our thinking, but the situation is a bit more complicated in a proportional hazards regression analysis. Because the model does not contain an intercept, we do not have a fully parametric hazard function and thus the model cannot predict an individual point estimate of the conditional “mean” survival time. The estimated survival function in (4.23) is the proportional hazards model-based estimator of the conditional statistical distribution of survival time. In this case, the word “conditional” means restricting observation to a cohort with covariate values equal to the values specified. Pursuing this notion further, suppose we were able to follow an extremely large cohort of subjects for 5 years. Furthermore, suppose that the cohort is large enough that we can perform a fully stratified analysis and compute the Kaplan-Meier estimator of the survival function for each possible set of values of the covariates, such as use of hormone therapy and a tumor size of 25 mm. If the proportional hazards model is correct, then the model-based estimator and the Kaplan-Meier estimator should be similar, within statistical variation. We can use the model-based estimator to describe survival time graphically and to compute estimates of quantiles, such as the median, in the same way we used the Kaplan-Meier estimator in Chapter 2.

Estimated survivor functions are most frequently used in applied settings to provide curves, similar to those in Figure 4.5, to compare groups visually, controlling for other model covariates. If the model does not contain grouping variable by covariate interactions, then the resulting survival functions are in a sense “parallel” in a way similar to lines with the same slope in a linear regression. In practice, one would choose one set of “typical” values of the other covariates. For a continuous covariate like tumor size, we usually choose the mean, median or other central value. In the GBCS data, the median tumor size is 25 mm and the mean is 29.3 mm. Thus, 25 is a good value to use for tumor size in this example. If we fit the proportional hazards model containing hormone therapy and tumor size, then the baseline survival function estimates the survival experience for no hormone therapy and tumor size equal to 0 mm. To obtain the estimates of the two more realistically size-adjusted survival functions, we have to evaluate the estimated survival function using the coefficients from the fitted model with (hormone = 0, size = 25) and (hormone = 1, size = 25). This approach, while algebraically correct, could cause unwanted round-off and computational error in some situations due to exponentiating large positive or negative numbers, though we have rarely encountered this problem in our own work.

We can avoid this problem by centering continuous covariates. In the current example, we fit the model using size_c = size - 25, and the results are shown in Table 4.18. These results are identical to those for the adjusted model in Table 4.11 obtained with size uncentered; the only difference is in the baseline survival function. When we center tumor size, the estimate of the baseline survival function corresponds to no hormone therapy and tumor size 25 mm, the zero value for the two covariates in the model. To obtain the second estimated survival function, specific for women receiving hormone therapy and having a tumor size of 25 mm, we compute

c04en032
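The claim that centering leaves the coefficients unchanged and alters only the baseline can be checked with a short derivation. The sketch below writes β1 and β2 for the hormone and size coefficients; the superscript c on the baseline functions of the centered model is a label introduced here for illustration, not notation taken from the text.

```latex
\begin{aligned}
h(t,x,\beta) &= h_0(t)\exp\left(\beta_1\,\mathrm{hormone} + \beta_2\,\mathrm{size}\right) \\
             &= h_0(t)\,e^{25\beta_2}\exp\left(\beta_1\,\mathrm{hormone} + \beta_2\,(\mathrm{size}-25)\right) \\
             &= h_0^{c}(t)\exp\left(\beta_1\,\mathrm{hormone} + \beta_2\,\mathrm{size\_c}\right),
             \qquad h_0^{c}(t) = h_0(t)\,e^{25\beta_2}, \\
S_0^{c}(t)   &= \left[S_0(t)\right]^{\exp(25\beta_2)} .
\end{aligned}
```

The factor exp(25β2) is absorbed into the baseline, so the centered baseline describes a subject with hormone = 0 and size = 25 mm, and the survival function for hormone = 1 at the same size is the centered baseline raised to the power exp(β1).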

Table 4.18 Estimated Coefficients, Standard Errors, z-Scores, Two-Tailed p-values, and 95% Confidence Interval Estimates for the Model Containing Hormone Therapy and Tumor Size Centered at 25 mm in the German Breast Cancer Study

c04t015

A graph of the two estimated survival functions, plotted at each of the 686 observed values of time, is shown in Figure 4.6. The curves in this graph provide proportional hazards estimates of the survival experience of two cohorts, each with a 25 mm tumor but differing in their hormone therapy use.

The shapes of the curves in Figure 4.6 are determined by the proportional hazards assumption, hormone therapy status, and size equal to 25 mm. If we centered size at a larger number, say 40 mm, then the resulting curves would look similar to those in Figure 4.6 but with poorer survival experience because the rate of recurrence increases with increasing tumor size. The reverse would be true if we centered at 15 mm. In all three cases, the two plotted curves would be based on a hazard ratio of 0.69 = exp(–0.373) at all time points.
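A small numerical sketch can make the last point concrete: whatever tumor size the curves are evaluated at, the two curves differ by the same hazard ratio at every time point. Only the hormone coefficient (–0.373) comes from the text. The size coefficient and the baseline survival values below are made-up numbers chosen for illustration, and the function names are hypothetical.

```python
import numpy as np

# Hormone coefficient reported in the text: hazard ratio 0.69 = exp(-0.373).
beta_hormone = -0.373
# ASSUMPTION: made-up size coefficient per mm (not the Table 4.18 value).
beta_size_hypothetical = 0.015
# ASSUMPTION: made-up baseline survival for hormone = 0 and size = 25 mm.
s0_centered_25 = np.array([0.97, 0.90, 0.80, 0.68, 0.55])

def survival_pair(s0, shift=0.0):
    """Return the (no hormone, hormone) curves after shifting the linear
    predictor by `shift`, i.e. after re-centering tumor size."""
    no_hormone = s0 ** np.exp(shift)
    hormone = s0 ** np.exp(shift + beta_hormone)
    return no_hormone, hormone

def cumulative_hazard_ratio(no_hormone, hormone):
    """Ratio of cumulative hazards, -log S(t), at each time point."""
    return np.log(hormone) / np.log(no_hormone)

# Evaluate the pair of curves at 15, 25 and 40 mm.
pairs = {
    size: survival_pair(s0_centered_25, beta_size_hypothetical * (size - 25))
    for size in (15, 25, 40)
}
ratios = {size: cumulative_hazard_ratio(*pair) for size, pair in pairs.items()}
```

Each entry of `ratios` is constant over time and equal to exp(–0.373), while the curves themselves shift down at 40 mm and up at 15 mm because the assumed size coefficient is positive.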

Figure 4.6 Graph of the tumor size adjusted estimated proportional hazards survival functions for hormone therapy in the German Breast Cancer Study. Functions are plotted at 686 values of time for each group.

c04f006

When the fitted model is even moderately complex, it may be difficult to decide what combination of covariate values best represents the middle of the data. In these situations, plots based on values of the risk score are frequently used in practice. The risk score is the value of the linear portion of the proportional hazards model and its estimator for the ith subject from a model containing p covariates is

(4.26)c04e026

Most software packages will calculate and save the values of (4.26). The baseline survival function corresponds to a risk score equal to zero, which may or may not be clinically relevant. We can provide a graphical description of survival experience by plotting the estimated survival function for various quantiles of the risk score, which we can easily get from the descriptive statistics. Typically, one might do this plot for the quartiles of the risk score or other quantiles. The estimator of the survival function for the qth quantile of the risk score,c04ie024, is

(4.27)c04e027
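The risk score calculation and the quantile-specific survival curves can be sketched as follows. This is a minimal illustration under stated assumptions: the covariate matrix, coefficients, and baseline survival values are simulated or made up, the function names are hypothetical, and the survival function at a given risk score is taken to be the baseline survival function raised to the power exp(risk score), the usual proportional hazards relationship.

```python
import numpy as np

def risk_scores(X, beta):
    """Linear portion of the model for each subject: sum_k x_ik * beta_k."""
    return np.asarray(X, dtype=float) @ np.asarray(beta, dtype=float)

def survival_at_risk(s0, r):
    """Survival curve at risk score r, given baseline survival values s0
    (the baseline corresponds to a risk score of zero)."""
    return np.asarray(s0, dtype=float) ** np.exp(r)

# ASSUMPTION: simulated covariates and made-up coefficients for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
beta_hat = np.array([-0.3, 0.5, 0.2])

r_hat = risk_scores(X, beta_hat)
quartiles = np.quantile(r_hat, [0.25, 0.50, 0.75])

# ASSUMPTION: made-up baseline survival values at five time points.
s0 = np.array([0.99, 0.95, 0.90, 0.82, 0.75])
curves = {q: survival_at_risk(s0, rq) for q, rq in zip((25, 50, 75), quartiles)}
```

Because the baseline values lie between 0 and 1, a larger risk score always gives a lower curve, which is why the plotted curves are ordered by quantile.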

As an example of a typical multivariable survival model, we fit the proportional hazards model to the German Breast Cancer Study data containing hormone therapy, tumor grade with grade one as the referent value, tumor size, and the natural logarithm of the number of progesterone receptors (see footnote 3), denoted as ln_prg. The results for the fitted model are given in Table 4.19. As in the previously fit models, use of hormone therapy significantly reduces the rate of tumor recurrence, as does an increase in the number of progesterone receptors, while the rate of tumor recurrence increases significantly for grade two and three tumors and with increasing tumor size.

Table 4.19 Estimated Coefficients, Standard Errors, z-Scores, Two-Tailed p-values, and 95% Confidence Interval Estimates for the Model Containing Hormone Therapy, Tumor Grade, Tumor Size, and the Natural Logarithm of the Number of Progesterone Receptors from the German Breast Cancer Study

c04t016

To get a general feeling for time-to-tumor recurrence as a function of model covariates, we plot in Figure 4.7 the estimated survival function for the 10th, 25th, 50th, 75th, and 90th percentiles of the estimated risk score. These values are –0.487, –0.118, 0.239, 0.593, and 0.899, respectively. The basic “parallelism” seen in Figure 4.7 is a consequence of the proportional hazards assumption and the fact that each estimated survival function is plotted over all 686 values of recurrence time. As expected, as the risk score increases, the survival experience worsens. We can quantify the differences empirically in the estimated curves by reporting their respective estimated median time to cancer recurrence. Because the minimum value of the estimated survival function for the 10th percentile curve is 0.602, which is above 0.5, the estimator of the median does not have a finite value. The estimates for the remaining curves, 81, 65, 46, and 32 months, respectively, are obtained by applying (2.11) to each estimated survival function. Methods have not been developed to provide confidence interval estimates for these risk quantile survival curves, to test for their equality, or to provide confidence intervals for their respective median time to response. As a result, the type of presentation shown in Figure 4.7, while providing a useful description of survival experience as a function of risk, is not likely to be helpful for inferential purposes when the study has a key exposure or risk factor.
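The way a median is read off an estimated survival curve, and why it can fail to exist, can be sketched as below. The sketch assumes the median is defined as the first observed time at which the estimated survival function is at or below 0.5, which is our reading of (2.11). The five risk score percentiles are those quoted in the text, but the time grid and baseline survival values are made up, so the resulting medians are illustrative and do not reproduce the 81, 65, 46, and 32 months reported for the GBCS data.

```python
import numpy as np

def median_time(times, surv):
    """First time at which the survival curve is <= 0.5, or None if the
    curve never drops that far (the median is then not finite)."""
    times = np.asarray(times, dtype=float)
    surv = np.asarray(surv, dtype=float)
    below = np.nonzero(surv <= 0.5)[0]
    return None if below.size == 0 else float(times[below[0]])

# ASSUMPTION: made-up time grid (months) and baseline survival values.
times = np.array([12, 24, 36, 48, 60, 72, 84])
s0 = np.array([0.95, 0.85, 0.75, 0.65, 0.55, 0.48, 0.44])

# Risk score percentiles quoted in the text for the model in Table 4.19.
risk_percentiles = {10: -0.487, 25: -0.118, 50: 0.239, 75: 0.593, 90: 0.899}

medians = {
    pct: median_time(times, s0 ** np.exp(r))
    for pct, r in risk_percentiles.items()
}
```

With these made-up inputs the lowest-risk curve never reaches 0.5, so its median is returned as `None`, mirroring the situation described for the 10th percentile curve.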

Figure 4.7 Graph of the estimated proportional hazards survival functions at the 10th, 25th, 50th, 75th, and 90th percentiles of the estimated risk score for fitted model in Table 4.19.

c04f007

The risk score procedure may be modified when we wish to graph the estimated survival functions for any discrete grouping variable, controlling for a risk score based on the remaining covariates. Most often the grouping variable is the key treatment or exposure variable. To accomplish this, we subtract the contribution of the grouping variable to the risk score and calculate the median value that remains. We calculate covariate adjusted survival functions at the median risk with the effect of the grouping variable absent and then with its effect added to the median risk. Suppose the grouping variable is dichotomous and is the first of the p covariates in the model. The modified risk scores, obtained by removing the effect of the grouping variable, are

c04en033

If we denote the median of the modified risk scores as c04ie025, then the estimates of the survival functions for the two groups at this median are

(4.28)c04e028

and

(4.29)c04e029

for each of the i = 1, 2, ..., n subjects.

As an example, we use the fitted model in Table 4.19 with hormone therapy as the dichotomous covariate of interest. As noted above, this model has been chosen for demonstration purposes only. The estimated hazard ratio for hormone use is c04ie026 = exp(–0.326) = 0.72 with an associated 95 percent confidence interval of (0.564, 0.924). The estimate indicates that use of hormone therapy reduces the rate of cancer recurrence by about 28 percent.
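The arithmetic behind these statements can be verified directly. The sketch below uses only the coefficient and interval endpoints quoted above; the back-calculated standard error assumes the interval is a Wald interval built with the multiplier 1.96, which is an assumption made for this check rather than something stated here.

```python
import math

# Values quoted in the text for hormone therapy (Table 4.19 model).
beta_hormone = -0.326
ci_lower, ci_upper = 0.564, 0.924

# Hazard ratio and the implied percent reduction in the recurrence rate.
hazard_ratio = math.exp(beta_hormone)
percent_reduction = 100.0 * (1.0 - hazard_ratio)

# On the log scale the interval should be centered at the coefficient.
log_midpoint = 0.5 * (math.log(ci_lower) + math.log(ci_upper))

# ASSUMPTION: Wald interval with multiplier 1.96, used to back out the
# standard error of the coefficient from the reported endpoints.
implied_se = (math.log(ci_upper) - math.log(ci_lower)) / (2 * 1.96)

print(f"HR = {hazard_ratio:.3f}, reduction = {percent_reduction:.1f}%")
print(f"log-scale midpoint = {log_midpoint:.3f}, implied SE = {implied_se:.3f}")
```

The interval is symmetric about the coefficient on the log-hazard scale but not about 0.72 on the hazard ratio scale, which is what we expect when the endpoints are obtained by exponentiating.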

In this example, the equation for the estimated risk score from Table 4.19 for the ith subject is

c04en034

and the modified estimated risk score is

c04en035

The median value of the modified risk score is 0.35 and the equations for the estimators of the modified risk score-adjusted survival functions obtained from (4.28) and (4.29) are

(4.30)c04e030

and

(4.31)c04e031

Because the observed range of survival times in the two treatment groups is comparable, we chose to use all 686 observed survival times to plot (4.30) and (4.31). These are shown in Figure 4.8.
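The modified risk score method can be sketched in two steps: remove the grouping variable's contribution from each subject's risk score, then evaluate the survival function at the median of what remains, with and without the group effect. The hormone coefficient (–0.326) and the median modified risk score (0.35) are the values quoted in the text; the toy covariate matrix, toy coefficients, baseline survival values, and function names are made up for illustration.

```python
import numpy as np

def modified_risk_scores(X, beta, group_col=0):
    """Risk score with the grouping variable's contribution removed."""
    X = np.asarray(X, dtype=float)
    beta = np.asarray(beta, dtype=float)
    return X @ beta - beta[group_col] * X[:, group_col]

def adjusted_group_curves(s0, beta_group, rm_median):
    """Survival curves at the median modified risk score for group = 0
    and group = 1, given baseline survival values s0."""
    s0 = np.asarray(s0, dtype=float)
    without_group = s0 ** np.exp(rm_median)
    with_group = s0 ** np.exp(beta_group + rm_median)
    return without_group, with_group

# ASSUMPTION: toy data, first column is the dichotomous grouping variable.
X_toy = np.array([[1, 2.0], [0, 4.0], [1, 6.0]])
beta_toy = np.array([-0.5, 0.1])
rm_toy = modified_risk_scores(X_toy, beta_toy)
rm_toy_median = float(np.median(rm_toy))

# Values quoted in the text; the baseline survival values are made up.
s0_hypothetical = np.array([0.98, 0.92, 0.83, 0.72, 0.60])
s_no_hormone, s_hormone = adjusted_group_curves(s0_hypothetical, -0.326, 0.35)
```

The two curves differ only through the factor exp(–0.326) in the exponent, so they are separated by the same hazard ratio at every time point regardless of the baseline values used.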

The two curves in Figure 4.8 reflect the effect of the use of hormone therapy, adjustment of model covariates via the median modified risk score and the assumption of proportional hazards. We can use the adjusted estimated survival functions, as we did with those shown in Figure 4.7, to estimate the adjusted median time to recurrence. Rounded to whole months, the estimates are 55 months for no hormone therapy and 69 months for users of hormone therapy. Hence, the conclusion, based on this typical model, is that use of hormone therapy is estimated to delay the median time to recurrence by a little over one year.

As noted earlier in this section, confidence interval estimators for the risk score covariate-adjusted estimator of the survival function and its median survival time have not been developed. However, if one is able to specify the values of all model covariates, then one may use an estimator presented by Andersen, Borgan, Gill and Keiding (1993), equation (7.2.33), page 5061 of the variance of the log of the covariate-adjusted survival function to derive a confidence interval estimator. Their estimator is identical to one presented by Marubini and Valsecchi (1995, in the Appendix to Chapter 6). As noted in Chapter 2, better coverage properties are obtained if a confidence interval for the survival function is based on the log-log transformation of the function. An expression for a variance estimator for this further transformation is also given by Andersen, Borgan, Gill and Keiding (1993) following (7.2.33). We do not consider this method further because the complicated models encountered in practice make it almost impossible to specify a set of covariates that describes the “middle” of the data.

Figure 4.8 Graph of the modified risk score-adjusted estimated survival functions for hormone use based on the fitted model in Table 4.19.

c04f008

The estimated survival function and its variations discussed in this section are effective tools for describing the results of a regression analysis of survival time. We wish to reemphasize the importance of giving careful thought to the plotted range of the curve and estimates of survival probabilities. It is all too easy, with current statistical software, to present graphs and predictions that may inappropriately extrapolate the fitted model.

EXERCISES

1. For this problem, use the WHAS500 data with non-missing values for both gender and bmi. Use length of follow up as the survival time variable, status at last follow up as the censoring variable, and 10 percent as the level of significance.

(a) Fit the proportional hazards model containing gender and estimate the hazard ratio, pointwise and confidence interval. Interpret, in words, the point and interval estimates.

(b) Add bmi to the model fit in 1(a). Is bmi a confounder of the effect of gender? Explain the reasons for your answer.

(c) Estimate the bmi-adjusted hazard ratio, pointwise and confidence interval. Interpret, in words, the point and interval estimates.

(d) Estimate the gender adjusted hazard ratio, pointwise and confidence interval for a 5 kg/m2 increase in bmi. Interpret, in words, the point and interval estimates.

(e) Is there a significant interaction between bmi and gender?

(f) Explain why the main effect coefficients for gender in the models fit in 1(b) and 1(d) are different.

(g) Using the interaction model fit in 1(e), estimate and interpret the hazard ratio and associated confidence interval for gender, for bmi equal to 15, 20, 25, 30, and 35.

(h) Using the interaction model fit in 1(e), graph the estimate of the hazard ratio for gender with confidence bands as a function of bmi.

(i) Using the interaction model fit in 1(e), estimate the hazard ratio, pointwise and confidence interval, for a 5 kg/m2 increase in bmi for each gender.

(j) Using the interaction model fit in 1(e), compute and then graph the estimated survival functions for males and females with a bmi of 25 kg/m2. Interpret the survival experience presented in this graph.

(k) Using the covariate adjusted survival functions from 1(j), estimate the median survival time for males and females with bmi = 25 kg/m2.

2. Repeat problem 1 parts (a)-(c) and (e), using MI order as the risk factor of interest and MI type as the controlling covariate.

(f) Using the interactions model fit in 2(e), estimate the hazard ratio and associated confidence interval for MI type: recurrent, for Q-wave and non Q-wave MIs.

(g) Using the interaction model fit in 2(e), compute and then graph the four possible covariate adjusted-survival functions.

3. Using the data from the WHAS500 data with length of follow-up as the survival time variable and status at last follow up as the censoring variable, do the following:

(a) Fit the proportional hazards model containing gender, age (centered at 65 years), the gender by age interaction, MI type, MI order, the MI type by MI order interaction, heart rate (centered at 85 bpm), and congestive heart failure. Obtain the estimated baseline survival function and the risk score.

(b) Graph the baseline survival function and explain the cohort of subjects whose survival experience is being estimated.

(c) Graph the four risk-score adjusted survival functions at each of the quartiles of the risk score. How does increasing risk score influence time to death following hospital admission for an MI in this cohort?

(d) Using the modified risk score method graph the covariate adjusted survival functions at the two levels of congestive heart failure. Using the estimated survival functions, estimate the median survival time for each group as well as the one-year survival probability. Which of these two measures, median survival time or one-year probability, do you think might be most clinically useful?

1 See Hosmer and Lemeshow (2000) Chapter 3 for a detailed discussion of the interpretation of the coefficients in a logistic regression model.

2 For both plots, we ignore the contribution of the baseline hazard function to the log-hazard function because under the proportional hazards assumption, it does not depend on the covariates.

3 We discuss the rationale for using the natural logarithm of the number of progesterone receptors in Chapter 5.
