Chapter 9

Generalized estimating equations (GEEs) models

Abstract

Chapter 9 is devoted to the descriptions of generalized estimating equations (GEEs). First, I summarize the basic specifications and inferences of the original GEE model, including the construction of the working correlation matrix and the development of the quasi-likelihood information criteria. Some GEE advances are then introduced, consisting of Prentice’s GEE approach, Zhao and Prentice’s GEE method (GEE2), and the GEE models on odds ratios. Next, I compare the conditional and the marginal regression models with the argument that the application of GLMMs is a more suitable perspective than GEEs to predict marginal means in the analysis of non-normal longitudinal data. An empirical illustration is provided to display how to use GEEs in longitudinal data analysis. The analytic results from the illustration provide strong empirical evidence that modeling a complex covariance structure does not necessarily improve the quality of parameter estimates and the goodness-of-fit statistic in GEEs.

Keywords

Generalized estimating equations (GEEs)

marginal model

“naïve” regression model

odds ratio

quasi-likelihood estimating equations

working correlation matrix

In regression modeling on longitudinal data, dependence of repeated measurements for the same subject needs to be taken into account. By specification of the random effects in the regression, a statistical model with mixed effects provides an efficient approach to handling intraindividual correlation. Some statisticians have attempted to analyze correlated data from a different direction, developing a simplified statistical approach referred to as generalized estimating equations, or GEEs. GEEs are designed to estimate the average response given the population-averaged effects of covariates, rather than to generate the parameter estimates that allow predictions of subject-specific trajectories of the outcome. In this approach, dependence of repeated measurements is accounted for by specifying a working correlation, with possibly unknown parameters, between observations and by correcting the variance of the parameter estimates with a robust estimator, referred to as the “sandwich variance” estimator. From a statistical standpoint, GEEs are considered to have a solid theoretical base because the likelihood-based GLMMs are often sensitive to the specification of complex covariance structures.

In this chapter, I describe the basic specifications, inferences, and hypothesis tests of GEEs. Some GEE advances are also introduced. Given the emphasis of this book on empirical applications, an empirical illustration is provided to display the application of this approach in longitudinal data analysis. Finally, the merits and the limitations of GEEs are summarized and discussed.

9.1. Basic specifications and inferences of GEEs

As discussed in Chapter 8, one of the statistical approaches in the analysis of nonlinear longitudinal data is the marginal quasi-likelihood regression model. In this marginal model, the mean response is a parametric function of the covariates, with variance being a function of the marginal mean. The GEEs are an extension of the MQL model. Specifically, the GEE models do not specify the pattern of intraindividual correlation; rather, they construct a robust variance–covariance estimator externally to account for dependence of repeated measurements of the response.

This section is focused on the description of the basic specifications and inferences of the original GEE model developed by Liang and Zeger (1986) and Zeger and Liang (1986). First, an overview of generalized regression modeling is provided given the independence assumption on repeated measurements of the response for the same subject. Next, the classical GEEs are introduced, with the specification of a working correlation matrix. Finally, I delineate the methods for constructing the quasi-likelihood information criteria in GEEs.

9.1.1. Specifications of “naïve” model with independence hypothesis

Prior to the introduction of the classical GEE model, I begin with the specification of a “naïve” model that assumes repeated measurements of the response for the same subject to be conditionally independent given the specified fixed effects θ. Let

$y_{i} = (y_{i 1}, \ldots, y_{i n_{i}})'$

be the n_i × 1 vector of the longitudinal data for subject i and

$X_{i} = (X_{i 1}, \ldots, X_{i n_{i}})'$

be the n_i × M matrix of covariate values associated with subject i. From Equation (8.16), the marginal density of the outcome for subject i at time point j, denoted by y_ij, can be written as

$f (y_{i j} |θ_{i j}, φ) = \exp [\frac{y_{i j} θ_{i j} - \overset{⌢}{a} (θ_{i j})}{φ} + b (y_{i j}; φ)],$

(9.1)

where, in the longitudinal setting, θ_ij is an element in vector θ_i, and functions

$\overset{⌢}{a} (\cdot)$

, b(·), and the scale factor φ are defined previously. According to Equations (8.4) and (8.5), the first two moments of y_ij are

$E (y_{i j}) = μ_{i j} = \frac{\partial \overset{⌢}{a} (θ_{i j})}{\partial θ_{i j}},$

(9.2)

$var (y_{i j}) = φ \frac{\partial^{2} \overset{⌢}{a} (θ_{i j})}{\partial θ_{i j}^{2}} .$

(9.3)
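As a quick check of Equations (9.2) and (9.3), consider a binary response under the canonical logit parameterization. This case is chosen here only for illustration, with φ = 1 and π_ij introduced as shorthand for the success probability:

$\overset{⌢}{a} (θ_{i j}) = \log (1 + e^{θ_{i j}}), \quad \frac{\partial \overset{⌢}{a} (θ_{i j})}{\partial θ_{i j}} = \frac{e^{θ_{i j}}}{1 + e^{θ_{i j}}} = π_{i j}, \quad \frac{\partial^{2} \overset{⌢}{a} (θ_{i j})}{\partial θ_{i j}^{2}} = π_{i j} (1 - π_{i j}) .$

The first derivative returns the familiar Bernoulli mean and the second derivative the Bernoulli variance, so the variance is a function of the mean alone.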

From the relevant description in Chapter 8, the score equation for subject i, denoted by

${\tilde{U}}_{i} (β)$

and mathematically defined as the first partial derivative of the log-likelihood function with respect to β, is written as

${\tilde{U}}_{i} (β) = \frac{\partial}{\partial β} l_{i} (β) .$

The previous expression for the score equation can be expanded by applying the chain rule, given by

${\tilde{U}}_{i} (β) = {(\frac{\partial μ_{i}}{\partial β})}^{'} \frac{\partial θ_{i}}{\partial μ_{i}} \frac{\partial l_{i} (β)}{\partial θ_{i}} .$

Given the properties of the exponential family distributions, described in Chapter 8, the third term on the right of the previous equation, the derivative of the log-likelihood with respect to the canonical parameters θ_i, is

$\frac{\partial l_{i} (β)}{\partial θ_{i}} = y_{i} - E (Y_{i}) = y_{i} - μ_{i},$

and also,

$\frac{\partial μ_{i}}{\partial θ_{i}} = cov (Y_{i}) .$

Given the previous functions, the score equation for subject i can be expanded to be

${\tilde{U}}_{i} (β) = \frac{\partial l_{i} (β)}{\partial β} = {(\frac{\partial μ_{i}}{\partial β})}^{'} V_{i}^{- 1} (Y_{i}) [y_{i} - μ_{i}] .$

In GLMs,

$θ_{i} = X_{i}^{'} β$

, so that we have

${(\frac{\partial μ_{i}}{\partial β})}^{'} = X_{i}^{'} A_{i},$

where A_i is the n_i × n_i diagonal variance matrix evaluated at X_i, defined by Equation (8.32). Consequently, the maximum likelihood estimate

$\hat{β}$

is the solution to

$\begin{array}{rl} \tilde{U} (β) & = \sum_{i = 1}^{N} {(\frac{\partial μ_{i}}{\partial β})}^{'} V_{i}^{- 1} (Y_{i}) (y_{i} - μ_{i}) \\ & = \sum_{i = 1}^{N} X_{i}^{'} A_{i} (y_{i} - μ_{i}) = 0, \end{array}$

(9.4)

where

$μ_{i} = g^{- 1} (X_{i}^{'} β)$

is specified as the conditional mean for subject i. The previous specifications do not differ significantly from a standard generalized linear model given the assumption that repeated measurements of the response for the same subject are conditionally independent with the specified model parameters of the fixed effects.

9.1.2. Basic specifications of GEEs

In the ordinary maximum likelihood estimator, the parameter estimates

$\hat{β}$

are considered to be consistent and asymptotically normal. In view of the large-sample asymptotic property, the variance of

$\hat{β}$

can be consistently estimated by the inverse of the observed information matrix given the hypothesis of conditional independence. In longitudinal data analysis, such an independence hypothesis can result in the loss of efficiency and robustness in the estimation of regression coefficients, particularly when intraindividual correlation is strong and some of the covariates are time-varying (Fitzmaurice, 1995). More seriously, the inverse of the observed information matrix

${\hat{I}}^{- 1} (\hat{β})$

does not provide an adequate variance–covariance matrix for

$\hat{β}$

, thereby indicating an inefficient, biased variance estimator. Therefore, this GLM approach based on the independence hypothesis is referred to as the “naïve” variance estimator in longitudinal data analysis. For correcting the “naïve” approach, Liang and Zeger (1986) and Zeger and Liang (1986) proposed the GEE model.

According to Liang and Zeger (1986) and Zeger and Liang (1986), given the desirable large-sample property, the asymptotic distribution of

$\sqrt{N} (\hat{β} - β)$

is centered at 0 for large samples, where β is the true regression coefficient; that is, the estimator remains consistent. This proposition is considered valid even in the presence of considerable dependence of longitudinal data. The authors contend, however, that the inverse of the observed information matrix

${\hat{I}}^{- 1} (\hat{β})$

can result in inconsistent estimates of the asymptotic variance of

$\hat{β}$

. Given the independence hypothesis, the variance estimates are less efficient because they are more widely scattered around the true population value than they will be when intraindividual correlation is considered (Diggle et al., 2002; Fitzmaurice, 1995). Therefore, a robust covariance matrix for

$\hat{β}$

needs to be developed to account for covariance in the clustered data. Accordingly, Liang and Zeger (1986) and Zeger and Liang (1986) create a robust variance–covariance estimator externally to take intraindividual correlation into account. Statistically, the classical GEE model does not specify the joint distribution of the subject’s observations; rather, it reduces to the score equations for multivariate normal outcomes (Liang and Zeger, 1986; Zeger and Liang, 1986).

In many ways, the classical GEE model is analogous to the quasi-likelihood estimating equations proposed by Wedderburn (1974), in which one only needs to specify the relationships both between the response outcome mean and the covariates and between the marginal mean and the variance. As Wedderburn’s theory (1974) on quasi-likelihood plays a significant role in the development of the GEE models, a brief introduction of its basic inference is provided in Appendix C.

Let

$V_{i} (μ_{i}, \tilde{α})$

be an n_i × n_i variance–covariance matrix of y_i, defined as

$V_{i} (μ_{i}, \tilde{α}) = φ A_{i}^{1 / 2} R_{i} (\tilde{α}) A_{i}^{1 / 2},$

(9.5)

where A_i is an n_i × n_i diagonal variance matrix containing elements ν, and R_i(α̃) is an n_i × n_i matrix, indexed by the parameter vector

$\tilde{α}$

, that reflects the pattern of correlation among observations for subject i, as also defined in Chapter 8. In GEEs, R_i(α̃) is referred to as the working correlation matrix to address dependence of repeated measurements. The classical GEE model for the estimation of β starts with the form

${\tilde{U}}_{I} (β) = \sum_{i = 1}^{N} {(\frac{\partial μ_{i}}{\partial β})}^{'} V_{i} {(μ_{i}, \tilde{α})}^{- 1} [y_{i} - μ_{i} (β)] = 0,$

(9.6)

where

${\tilde{U}}_{I} (β)$

is the total score statistic specified for the classical GEE model (GEE1).

Let

${\hat{β}}_{GEE}$

be a vector of the estimated regression coefficients from the classical GEE model. It follows then that

$\sqrt{N} ({\hat{β}}_{GEE} - β)$

is asymptotically multivariate normal with mean 0 and variance–covariance matrix

$V_{GEE} ({\hat{β}}_{GEE}) = I_{0}^{- 1} I_{1} I_{0}^{- 1},$

(9.7)

where

$\begin{array}{l} I_{0} = \sqrt{N} \sum_{i = 1}^{N} {(\frac{\partial μ_{i}}{\partial β})}^{'} V_{i} {(μ_{i}, \tilde{α})}^{- 1} \frac{\partial μ_{i}}{\partial β}, \\ I_{1} = \sqrt{N} \sum_{i = 1}^{N} {(\frac{\partial μ_{i}}{\partial β})}^{'} V_{i} {(μ_{i}, \tilde{α})}^{- 1} cov (y_{i}) V_{i} {(μ_{i}, \tilde{α})}^{- 1} (\frac{\partial μ_{i}}{\partial β}) . \end{array}$

Equation (9.7) specifies the classical GEE, also referred to as the “sandwich” variance estimator, or GEE1, with

$I_{0}^{- 1}$

being the “bread” and I₁ the “meat.” In the equation, I₀ is the conventional, model-based Fisher information matrix, and in the context of longitudinal data analysis, its inverse,

$I_{0}^{- 1}$

, is referred to as the “naïve” variance estimator given its reliance on the “naïve” assumption. The term I₁ is the covariance matrix of the score statistic, and its inclusion in Equation (9.7) accounts for intraindividual correlation in longitudinal data. Therefore, the sandwich variance estimator results in a robust variance–covariance matrix for longitudinal data analysis.
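The bread-and-meat construction can be sketched numerically. The following is a minimal sketch, not the author's implementation: the function name and argument layout are hypothetical, cov(y_i) is replaced by the empirical outer product of the residuals (an assumption), and the normalizing constants in I₀ and I₁ are dropped, so the result estimates the covariance of the coefficient estimates themselves rather than of their √N-scaled version.

```python
import numpy as np

def sandwich_covariance(D_list, V_list, resid_list):
    """Return (naive, robust) covariance estimates from per-subject pieces.

    D_list[i]     : (n_i x M) derivative matrix d mu_i / d beta
    V_list[i]     : (n_i x n_i) working covariance matrix V_i
    resid_list[i] : (n_i,) residual vector y_i - mu_i
    Assumption: cov(y_i) is estimated by the outer product r_i r_i'.
    """
    M = D_list[0].shape[1]
    info = np.zeros((M, M))   # sum_i D_i' V_i^{-1} D_i (model-based information)
    meat = np.zeros((M, M))   # sum_i D_i' V_i^{-1} r_i r_i' V_i^{-1} D_i
    for D, V, r in zip(D_list, V_list, resid_list):
        Vinv_D = np.linalg.solve(V, D)   # V_i^{-1} D_i
        info += D.T @ Vinv_D
        u = Vinv_D.T @ r                 # D_i' V_i^{-1} r_i (V_i is symmetric)
        meat += np.outer(u, u)
    naive = np.linalg.inv(info)          # the "bread", i.e. the naive estimator
    robust = naive @ meat @ naive        # bread * meat * bread
    return naive, robust
```

When the working covariance is close to the true covariance, the meat is close to the information matrix and the robust estimate collapses toward the naïve one.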

Given the empirical adjustment on dependence of repeated measurements for the same subject, longitudinal data can be adequately assumed to be conditionally independent given the specified fixed effects. As a result, the random vector

$\sqrt{N} (\hat{β} - β)$

is asymptotically normal with mean 0 and a covariance matrix that can be estimated by

${\hat{V}}_{GEE} (\hat{β})$

(Liang et al., 1992). With the specification of a robust variance–covariance matrix, the valid Wald score can be derived to perform hypothesis tests on parameter estimates.
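As an illustration of such a test, the sketch below computes a two-sided Wald z statistic for a single coefficient from its robust variance. The function name and the single-coefficient setting are assumptions made for the example; multi-parameter hypotheses would require the full robust matrix.

```python
import math

def wald_test(beta_hat, robust_var, beta_null=0.0):
    """Two-sided Wald z test for one coefficient using its robust variance."""
    z = (beta_hat - beta_null) / math.sqrt(robust_var)
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for the standard normal
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value
```

For example, an estimate of 0.98 with robust variance 0.25 gives z = 1.96 and a p-value very close to 0.05.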

There are some advantages of the GEE approach. First, the asymptotic variance–covariance matrix of

$\hat{β}$

from the “sandwich” approach does not depend on the choice of an estimator for

$\tilde{α}$

and φ (Liang and Zeger, 1986; Zeger and Liang, 1986). Second, when cov(y_i) is well approximated,

$\hat{β}$

is reasonably efficient (Liang et al., 1992). Third, when cov(y_i) is correctly specified, Equation (9.6) is the score equation for β that corresponds to a log linear model (Fitzmaurice and Laird, 1993).

In GEE1,

$\hat{β}$

can be solved by applying the Gauss–Newton method, in which one calculates a regression of residuals on the quantities of the scores with linear least squares (Wedderburn, 1974). Operationally, GEE1 uses an iterative process between a modified Fisher scoring for β and the moment estimation of

$\tilde{α}$

and φ (Liang and Zeger, 1986; Zeger and Liang, 1986). The series of

$\hat{β}$

in the iterative scheme is generally denoted by

${\hat{β}}^{\ddot{j}}$

(

$\ddot{j} = 1, 2, \dots .$

). The iterative scheme terminates when

${\hat{β}}^{\ddot{j} + 1}$

is sufficiently close to

${\hat{β}}^{\ddot{j}}$

. Therefore, the GEE1 iterative procedure for

$\hat{β}$

can be written as

${\hat{β}}^{\ddot{j} + 1} = {\hat{β}}^{\ddot{j}} + {[\sum_{i = 1}^{N} {(\frac{\partial μ_{i}}{\partial {\hat{β}}^{\ddot{j}}})}^{'} V_{i}^{- 1} ({\hat{β}}^{\ddot{j}}) \frac{\partial μ_{i}}{\partial {\hat{β}}^{\ddot{j}}}]}^{-1} \{\sum_{i = 1}^{N} {(\frac{\partial μ_{i}}{\partial {\hat{β}}^{\ddot{j}}})}^{'} V_{i}^{- 1} ({\hat{β}}^{\ddot{j}}) [y_{i} - μ_{i} ({\hat{β}}^{\ddot{j}})]\} .$

(9.8)

Given the iterative process, the maximum likelihood or weighted least squares estimate of β can be operationally defined as

$\hat{β} = {\hat{β}}^{\ddot{j} + 1}$

. Lipsitz et al. (1994) contend that in terms of bias, mean squared error, and power, a one-step estimator, obtained by performing a single step of the GEE model with the use of

${\hat{β}}_{GEE}$

as the starting value, is qualitatively similar to that of the fully iterated estimator.

At a given iteration, the correlation parameters

$\tilde{α}$

and the scale parameter φ can be estimated from the standardized residuals, given by

${\hat{ɛ}}_{i j} = \frac{y_{i j} - \partial \overset{⌢}{a} ({\hat{θ}}_{i j}) / \partial θ_{i j}}{\sqrt{\partial^{2} \overset{⌢}{a} ({\hat{θ}}_{i j}) / \partial θ_{i j}^{2}}},$

where

${\hat{θ}}_{i j}$

depends on the current value for β. Given the specification of

${\hat{ɛ}}_{i j}$

, the scale φ can be estimated as

$\hat{φ} = \frac{\sum_{i = 1}^{N} \sum_{j = 1}^{n_{i}} {\hat{ɛ}}_{i j}^{2}}{N - M},$

(9.9)

where, in longitudinal data analysis,

$N = \sum n_{i}$

. Given the choice of R(α̃), α̃ can be estimated by a simple function of the equation

${\hat{R}}_{j' j''} = \frac{\sum_{i = 1}^{N} {\hat{ɛ}}_{i j'} {\hat{ɛ}}_{i j''}}{N - M} .$

(9.10)

To summarize, the classical GEE is a marginal regression model, in which the regression coefficients of covariates and the intraindividual correlation parameters are modeled separately (Liang and Zeger, 1986; Zeger and Liang, 1986). The estimation of each of the parameters β,

$\tilde{α}$

, and φ depends on the other two, obtained operationally by an iterative procedure. Practically, the operational process of GEE1 consists of the following steps. First, fit a standard, univariate generalized linear regression model based on the independence hypothesis on repeated measurements of the response for the same subject (the so-called naïve model). Second, use the residuals from the naïve model to estimate the correlation parameters for repeated measurements within the subject. Third, fit the regression model again using a modified algorithm that consists of a correlation matrix estimated from the second step. Finally, iterate between the second and the third steps until the estimates stabilize given a specified statistical criterion.
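The alternation between the regression step and the moment step can be sketched as follows. This is a minimal sketch under stated assumptions rather than a reference implementation: it assumes count responses with a log link and variance function v(μ) = μ, an exchangeable working correlation, a design matrix whose first column is the intercept, and φ entering multiplicatively as in Equation (9.5), so that φ is estimated by the mean squared Pearson residual. The pairwise moment estimator for ρ is one common choice, and the function names are hypothetical. For brevity, the sketch interleaves a single Fisher scoring update with each moment update instead of fitting the naïve model to convergence first; starting from ρ = 0 makes the first pass a naïve-model step.

```python
import numpy as np

def exchangeable(n, rho):
    """n x n working correlation with a common off-diagonal value rho."""
    return (1.0 - rho) * np.eye(n) + rho * np.ones((n, n))

def fit_gee_poisson(X_list, y_list, max_iter=50, tol=1e-8):
    """GEE1 sketch: log link, v(mu) = mu, exchangeable working correlation."""
    M = X_list[0].shape[1]
    n_obs = sum(len(y) for y in y_list)
    n_pairs = sum(len(y) * (len(y) - 1) // 2 for y in y_list)
    beta = np.zeros(M)
    beta[0] = np.log(np.mean(np.concatenate(y_list)))  # crude start (intercept column assumed)
    rho, phi = 0.0, 1.0   # rho = 0: the first pass uses working independence
    for _ in range(max_iter):
        # Regression step: one Fisher scoring update given the current rho and phi
        info = np.zeros((M, M))
        score = np.zeros(M)
        for X, y in zip(X_list, y_list):
            mu = np.exp(X @ beta)
            D = mu[:, None] * X                    # d mu / d beta under the log link
            s = np.sqrt(mu)
            V = phi * (s[:, None] * exchangeable(len(y), rho) * s[None, :])
            Vinv_D = np.linalg.solve(V, D)
            info += D.T @ Vinv_D
            score += Vinv_D.T @ (y - mu)
        step = np.linalg.solve(info, score)
        beta = beta + step                         # Fisher scoring adds the step
        # Moment step: update phi and rho from Pearson residuals
        ss, cross = 0.0, 0.0
        for X, y in zip(X_list, y_list):
            mu = np.exp(X @ beta)
            e = (y - mu) / np.sqrt(mu)
            ss += e @ e
            cross += (e.sum() ** 2 - e @ e) / 2.0  # sum of e_j * e_k over pairs j < k
        phi = ss / (n_obs - M)
        rho = cross / ((n_pairs - M) * phi)
        if np.max(np.abs(step)) < tol:             # estimates have stabilized
            break
    return beta, rho, phi
```

Calling `fit_gee_poisson(X_list, y_list)` with one design matrix and one response vector per subject returns the coefficient estimates together with the estimated common correlation and scale. Robust standard errors would still need to be computed from the sandwich formula in Equation (9.7).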

9.1.3. Specifications of working correlation matrix

To account for dependence of repeated measurements in longitudinal data, Liang and Zeger (1986) and Zeger and Liang (1986) specify an n_i × n_i matrix R_i(α̃) of the working correlations across a number of time points to describe the pattern of intraindividual correlation in the response data y_i. The matrix R_i(α̃) is generally assumed to be fully specified by the unknown parameter α̃, with the structure of α̃ being determined empirically or by following a relevant theory concerning dependence. For analytic convenience and simplicity, the parameter α̃ is routinely treated as constant across all subjects.

There is a variety of approaches for the specification of intraindividual correlation structures, as presented extensively in Chapter 5. In the classical GEE model, Fitzmaurice et al. (1993) summarize four common scenarios of the specification for the working correlation matrix R_i(α̃), as described.

The first specification is R_i(α̃) = I, an n_i × n_i identity matrix. The matrix corresponds to the working independence hypothesis assuming no intraindividual dependence in longitudinal data. Therefore, this specification yields identical estimates to those from the naïve regression model using the pooled data without further adjustments. As intraindividual correlation is assumed to be zero, no estimate of α̃ is required.

The second scenario is to specify the working correlation matrix as

${[R_{i} (\tilde{α})]}_{j s} = ρ, \; j \neq s$

, where j and s are time points. This specification is referred to as the “exchangeable” correlation structure because every pair of observations for the same subject is assumed to have the same correlation. In this specification, α̃ is a scalar and is estimated in the fitting process. In continuous longitudinal normal data, this scenario is analogous to the specification of CS in the random intercept linear model.

The third scenario is given by

${[R_{i} (\tilde{α})]}_{j s} = ρ^{|j - s|}, \; j \neq s$

. This correlation pattern is analogous to the AR(1) correlation matrix if time intervals are equally spaced or to the SP(POW) covariance pattern model if time intervals are unequally spaced. In this specification, intraindividual correlation is patterned as an exponential function of the lag length. By an extension of this scenario, higher-order autoregressive structures can also be specified.

Finally, the fourth scenario specifies an unstructured working correlation structure, defined as

${[R_{i} (\tilde{α})]}_{j s} = {\tilde{α}}_{j s}, \; j \neq s$

. In this specification, no constraints are assumed on the pattern of intraindividual correlation. As described in Chapter 5, such a pattern model of correlation can be estimated without restriction on the correlation structure. Correspondingly, α̃ is specified as an n_i × n_i matrix containing [n_i × (n_i − 1)]/2 unique pair-wise correlation coefficients for all combinations of time points.
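The four scenarios can be made concrete with small constructors. This is an illustrative sketch only; the function names are hypothetical, time points are indexed from 0, and the AR(1) form assumes equally spaced occasions.

```python
def independence(n):
    """Scenario 1: identity matrix, no intraindividual correlation."""
    return [[1.0 if j == s else 0.0 for s in range(n)] for j in range(n)]

def exchangeable(n, rho):
    """Scenario 2: the same correlation rho for every pair j != s."""
    return [[1.0 if j == s else rho for s in range(n)] for j in range(n)]

def ar1(n, rho):
    """Scenario 3: correlation rho ** |j - s|, decaying with the lag."""
    return [[rho ** abs(j - s) for s in range(n)] for j in range(n)]

def unstructured(n, pair_corr):
    """Scenario 4: pair_corr maps (j, s) with j < s to its own correlation."""
    R = independence(n)
    for (j, s), r in pair_corr.items():
        R[j][s] = R[s][j] = r   # keep the matrix symmetric
    return R
```

For three occasions, `ar1(3, 0.5)` has 0.5 at lag one and 0.25 at lag two, whereas `exchangeable(3, 0.5)` has 0.5 at every off-diagonal position; the unstructured form needs 3 separate correlations.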

There are many other possible correlation structures, and in some special cases, α̃ can even be specified as depending on subject-specific covariates. For example, one can specify the working correlation matrix as

$g^{- 1} [R_{i} (\tilde{α})] = Z_{i}^{'} \tilde{α}$

, where g(·) is a given link function and Z_i is a subset of subject-specific covariates. As they are rarely used in empirical analyses, those complex working correlation matrices are not further described in this text. For details concerning other working correlation matrices, the interested reader is referred to Fitzmaurice et al. (1993) and Liang and Zeger (1986).

Theoretically, the selection of a working correlation matrix closer to the true correlation structure can yield increased statistical efficiency and consistency (Fitzmaurice, 1995). Liang and Zeger (1986) and Zeger and Liang (1986) prove, however, that one can obtain a consistent and asymptotically Gaussian estimate

$\hat{β}$

and a consistent estimate of its robust variance–covariance matrix for large samples, even if the working correlation matrix R(α̃) is incorrectly specified. According to those authors, in longitudinal data analysis inferences about the regression coefficients are generally robust to misspecification of the model on dependence of observations due to the desirable large-sample properties.

9.1.4. Quasi-likelihood information criteria for GEEs

As no likelihood is specified in statistical inferences, GEEs are basically not a likelihood-based regression. Consequently, the variety of the model fit criteria routinely applied in regression models cannot be directly used to assess the information for model selection in GEEs. For example, the popular Akaike Information Criterion (AIC) is derived from the maximum likelihood estimation with asymptotic properties of MLE, and therefore, its penalty term needs to be modified to fit into the construct of the quasi-likelihood perspective. Accordingly, Pan (2001) develops the quasi-likelihood information criteria to apply for the GEE model fit.

Let the mean μ_ij of each observation be some known function of parameters

$β = (β_{1}, ...., β_{M})'$

. It follows then that the quasi-likelihood function for each observation is denoted by

$Q (Y_{i j}, μ_{i j})$

. With the independence working correlation (scenario 1), the quasi-likelihood evaluated with the parameter estimates is defined as

$Q [β (R), φ] = \sum_{i = 1}^{N} \sum_{j = 1}^{n_{i}} Q [\hat{β} (R), φ; (Y_{i j}, X_{i j})] .$

(9.11)

Let

${\hat{V}}_{R}$

be the robust covariance estimate of the regression coefficients under the working correlation R and

$\hat{Ω}$

be the model-based covariance estimate under the independence working correlation evaluated at

$\hat{β} (R)$

. The quasi-likelihood information criterion, or QIC(R), is then given by

$QIC (R) = - 2 Q [β (R), φ] + 2 trace ({\hat{Ω}}^{- 1} {\hat{V}}_{R}) .$

(9.12)

Equation (9.12) is the basic quasi-likelihood information criterion for GEEs. Simulation results have shown that ignoring the second term on the right of this equation somewhat affects the performance of the criterion. For large samples, however, removal of this second term usually results in only a small difference in the QIC(R) value.

When the GEE specifications are correct,

${\hat{V}}_{R}$

and

$\hat{Ω}$

are asymptotically equivalent, thereby leading to the approximation that

$trace ({\hat{Ω}}^{- 1} {\hat{V}}_{R}) \approx \tilde{M}$

, where

$\tilde{M}$

is the number of regression parameters. Given this property, Pan (2001) also develops an approximation to QIC(R), written as

$QI C_{u} (R) = - 2 Q [β (R), φ] + 2 \tilde{M},$

(9.13)

where QIC_u(R) is actually the AIC version given the quasi-likelihood function. In the assessment of the GEE model fit information, Pan (2001) recommends that QIC(R) should be used as the appropriate criterion to select both the regression model and the working correlation structure, and QIC_u(R) as suitable only to select the model itself. Empirically, these two criteria generally yield very close values, thereby resulting in the same conclusion about the model fit, particularly for large samples.
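The two criteria can be computed directly once the quasi-likelihood and the two covariance estimates are available. The sketch below is illustrative only: the function names are hypothetical, and the quasi-likelihood shown is the standard form for the variance function v(μ) = μ with φ = 1, chosen here as an example.

```python
import numpy as np

def poisson_quasi_likelihood(y, mu):
    """Independence quasi-likelihood for v(mu) = mu, phi = 1: sum(y log mu - mu)."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    return float(np.sum(y * np.log(mu) - mu))

def qic(quasi_lik, model_cov_indep, robust_cov):
    """Equation (9.12): -2 Q + 2 trace(Omega^{-1} V_R)."""
    # solve(Omega, V_R) computes Omega^{-1} V_R without forming the inverse
    penalty = np.trace(np.linalg.solve(model_cov_indep, robust_cov))
    return -2.0 * quasi_lik + 2.0 * penalty

def qic_u(quasi_lik, n_params):
    """Equation (9.13): -2 Q + 2 M, with M regression parameters."""
    return -2.0 * quasi_lik + 2.0 * n_params
```

When the robust and model-based covariance estimates coincide, the trace equals the number of regression parameters and the two functions return the same value, which mirrors the approximation behind QIC_u(R).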

9.2. Other GEE approaches

As summarized, the classical GEE model is a statistical method for providing consistent, asymptotically normal point estimates for the marginal regression parameters. In the GEE estimating process, the specification of the working correlation matrix serves as a nuisance function to adjust for the parameter estimates from the “naïve” model. In most situations, the estimates of the correlation parameters are not of direct concern, and therefore, they are usually not involved in the interpretation of the fixed effects and the model fit statistic. Sometimes, however, the structure of between-clusters association on the response is of direct interest or the researcher wants to know the details of the conditional specifications for a subject at a given time point. In such cases, the classical GEE model needs to be extended to address these additional theoretical concerns. Correspondingly, some scientists have proposed a variety of GEE extensions based on the classical model.

In this section, several GEE extensions are described and discussed. These extended GEE models include Prentice’s GEE approach, Zhao and Prentice’s GEE method (GEE2), and GEE models on odds ratios (ORs).

9.2.1. Prentice’s GEE approach

Prentice (1988) expanded the classical GEE model based on the argument that the original GEE1 is not designed to estimate the marginal response probability or the pair-wise correlations. Therefore, a second set of estimating equations should be added to the GEE modeling to specify parameters of the correlation matrix with a binary data structure. Correspondingly, a sample correlation vector for subject i is proposed, denoted by

${\tilde{Z}}_{i} = ({\tilde{Z}}_{i 12}, {\tilde{Z}}_{i 13}, \ldots, {\tilde{Z}}_{i, n_{i} - 1, n_{i}})'$

, with elements indexed by pairs of time occasions (j₁, j₂). A typical element in vector

${\tilde{Z}}_{i}$

is defined as

${\tilde{Z}}_{i j_{1} j_{2}} = \frac{(y_{i j_{1}} - μ_{i j_{1}}) (y_{i j_{2}} - μ_{i j_{2}})}{\sqrt{μ_{i j_{1}} (1 - μ_{i j_{1}}) μ_{i j_{2}} (1 - μ_{i j_{2}})}} = \frac{(y_{i j_{1}} - μ_{i j_{1}}) (y_{i j_{2}} - μ_{i j_{2}})}{\sqrt{{\tilde{π}}_{i j_{1}} {\tilde{q}}_{i j_{1}} {\tilde{π}}_{i j_{2}} {\tilde{q}}_{i j_{2}}}},$

where, in the case of the binary data taking value 1 or 0,

${\tilde{π}}_{i j} = pr (y_{i j} = 1 |X_{i}, β) = E (y_{i j})$

and

${\tilde{q}}_{i j} = 1 - {\tilde{π}}_{i j}$

. By definition,

${\tilde{Z}}_{i j_{1} j_{2}}$

has mean ρ_ij₁j₂, which is the pair-wise correlation, and variance

${\tilde{w}}_{i j_{1} j_{2}} = 1 + \frac{(1 - 2 {\tilde{π}}_{i j_{1}}) (1 - 2 {\tilde{π}}_{i j_{2}})}{\sqrt{{\tilde{π}}_{i j_{1}} {\tilde{q}}_{i j_{1}} {\tilde{π}}_{i j_{2}} {\tilde{q}}_{i j_{2}}}} ρ_{i j_{1} j_{2}} - ρ_{i j_{1} j_{2}}^{2} .$
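The mean and variance of a sample correlation element can be checked by enumerating the four possible outcomes of a pair of binary responses. The sketch below is a verification aid under stated assumptions: the function names are hypothetical, and the variance is coded with the square-root term dividing the middle term, which is the form that the direct enumeration reproduces.

```python
import math

def prentice_z(y1, y2, pi1, pi2):
    """Sample correlation element Z for one pair of binary responses."""
    return (y1 - pi1) * (y2 - pi2) / math.sqrt(pi1 * (1 - pi1) * pi2 * (1 - pi2))

def prentice_w(pi1, pi2, rho):
    """Variance of Z given the marginal probabilities and pairwise correlation."""
    scale = math.sqrt(pi1 * (1 - pi1) * pi2 * (1 - pi2))
    return 1.0 + (1 - 2 * pi1) * (1 - 2 * pi2) * rho / scale - rho ** 2

def exact_moments(pi1, pi2, rho):
    """Mean and variance of Z by enumerating the outcomes of (y1, y2)."""
    # Joint probability of (1, 1) implied by the margins and the correlation
    p11 = pi1 * pi2 + rho * math.sqrt(pi1 * (1 - pi1) * pi2 * (1 - pi2))
    probs = {(1, 1): p11, (1, 0): pi1 - p11,
             (0, 1): pi2 - p11, (0, 0): 1 - pi1 - pi2 + p11}
    mean = sum(p * prentice_z(a, b, pi1, pi2) for (a, b), p in probs.items())
    second = sum(p * prentice_z(a, b, pi1, pi2) ** 2 for (a, b), p in probs.items())
    return mean, second - mean ** 2
```

For instance, with marginal probabilities 0.3 and 0.6 and a pairwise correlation of 0.2, `exact_moments` returns a mean equal to the correlation and a variance that agrees with `prentice_w`.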

Prentice (1988) presents that given the specification of

${\tilde{Z}}_{i}$

, a GEE estimator for β and α̃ can be defined as a solution to

$\sum_{i = 1}^{N} {(\frac{\partial μ_{i}}{\partial β})}^{'} {V_{i}}^{- 1} [y_{i} - p_{i} (β)] = 0,$

(9.14)

$\sum_{i = 1}^{N} {\tilde{E}}_{i}^{'} {\tilde{W}}_{i}^{- 1} [{\tilde{Z}}_{i} - ρ_{i}] = 0,$

(9.15)

where

$ρ_{i} = (ρ_{i 12}, \ldots, ρ_{i 1 n_{i}}, ρ_{i 23}, \ldots)',$

${\tilde{E}}_{i} = \partial ρ_{i} / \partial \tilde{α}$

, and

${\tilde{W}}_{i} = diag ({\tilde{w}}_{i 12}, ...., {\tilde{w}}_{i 1 n_{i}}, {\tilde{w}}_{i 23}, ...)$

. In this GEE approach,

$\tilde{W}$

is specified as an [n_i × (n_i − 1)/2]-dimensional square diagonal matrix used as the working covariance matrix for

${\tilde{Z}}_{i} = ({\tilde{Z}}_{i 12}, ...., {\tilde{Z}}_{i 1 n_{i}}, {\tilde{Z}}_{i 23}, \dots)'$

. In this expansion, the third- and fourth-order correlations are set to be zero. As a result, the variance–covariance matrix of y_i, denoted by V_i, does not serve as the working correlation matrix in this GEE model. With the specification of the second-order correlations, Prentice’s expansion is also referred to as the higher-order independence working assumption (Molenberghs and Verbeke, 2010).

Given the expansions, the joint asymptotic distribution of

$\sqrt{N} (\hat{β} - β)$

and

$\sqrt{N} (\hat{\tilde{α}} - \tilde{α})$

is multivariate normal with mean 0 and a variance–covariance matrix that can be estimated consistently by

$N \times \left( \begin{array}{cc} A & 0 \\ B & C \end{array} \right) \left( \begin{array}{cc} Λ_{11} & Λ_{12} \\ Λ_{21} & Λ_{22} \end{array} \right) \left( \begin{array}{cc} A & B^{'} \\ 0 & C \end{array} \right),$

(9.16)

where

$A = {[\sum_{i = 1}^{N} {(\frac{\partial μ_{i}}{\partial β})}^{'} V_{i}^{- 1} \frac{\partial μ_{i}}{\partial β}]}^{- 1},$

$\begin{array}{l} B = {(\sum_{i = 1}^{N} {\tilde{E}}_{i}^{'} {\tilde{W}}_{i}^{- 1} {\tilde{E}}_{i})}^{- 1} (\sum_{i = 1}^{N} {\tilde{E}}_{i}^{'} {\tilde{W}}_{i}^{- 1} \frac{\partial {\tilde{Z}}_{i}}{\partial β}) \\ \quad \times {[\sum_{i = 1}^{N} {(\frac{\partial μ_{i}}{\partial β})}^{'} V_{i}^{- 1} \frac{\partial μ_{i}}{\partial β}]}^{- 1}, \end{array}$

$C = {(\sum_{i = 1}^{N} {\tilde{E}}_{i}^{'} {\tilde{W}}_{i}^{- 1} {\tilde{E}}_{i})}^{- 1},$

$Λ_{11} = \sum_{i = 1}^{N} {(\frac{\partial μ_{i}}{\partial β})}^{'} V_{i}^{- 1} cov (y_{i}) V_{i}^{- 1} \frac{\partial μ_{i}}{\partial β},$

$Λ_{12} = \sum_{i = 1}^{N} {(\frac{\partial μ_{i}}{\partial β})}^{'} V_{i}^{- 1} cov (y_{i}, {\tilde{Z}}_{i}) {\tilde{W}}_{i}^{- 1} {\tilde{E}}_{i},$

$Λ_{21} = Λ_{12}^{'},$

$Λ_{22} = \sum_{i = 1}^{N} {\tilde{E}}_{i}^{'} {\tilde{W}}_{i}^{- 1} cov ({\tilde{Z}}_{i}) {\tilde{W}}_{i}^{- 1} {\tilde{E}}_{i} .$

The corresponding covariance matrices, cov(y_i),

$cov (y_{i}, {\tilde{Z}}_{i})$

, and

$cov ({\tilde{Z}}_{i})$

, can be obtained by following the standard GEE procedure, given by

$\begin{array}{l} cov (y_{i}) = (y_{i} - p_{i}) {(y_{i} - p_{i})}^{'}, \\ cov (y_{i}, {\tilde{Z}}_{i}) = (y_{i} - p_{i}) {({\tilde{Z}}_{i} - ρ_{i})}^{'}, \\ cov ({\tilde{Z}}_{i}) = ({\tilde{Z}}_{i} - ρ_{i}) {({\tilde{Z}}_{i} - ρ_{i})}^{'} . \end{array}$

By using the GLM estimates for β and α̃ as the starting values, a simple iterative procedure can be applied to produce statistically efficient, consistent, and robust results from GEEs. Prentice (1988) indicates that by formalizing the second-moment parameters in GEEs, the model can improve the quality of parameter estimates by drawing simultaneous inferences about the regression and the random components. Liang et al. (1992) show that the solutions to Equations (9.14) and (9.15) yield highly efficient estimates for β and α̃. When the cluster size is large, as is often the case in multilevel analytic studies, the solution of Prentice’s approach is computationally difficult (Carey et al., 1993). For example, if n = 50, there will be 1275 × 1275 elements in the covariance matrix of y and

$\tilde{W}$

.
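The figure of 1275 can be recovered by simple counting, on the assumption that it refers to the stacked vector of the n responses together with the n(n − 1)/2 pairwise products for a cluster. The function name below is hypothetical.

```python
def joint_dimension(n):
    """Length of the stacked vector of responses and pairwise products."""
    n_pairs = n * (n - 1) // 2   # number of distinct pairs of time points
    return n + n_pairs

print(joint_dimension(50))  # 1275, i.e. 50 responses plus 1225 pairs
```

Because the number of pairs grows quadratically in the cluster size, the joint covariance matrix grows with the fourth power of n, which is the source of the computational burden.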

9.2.2. Zhao and Prentice’s GEE method (GEE2)

Zhao and Prentice (1990) further broaden the GEE modeling by permitting joint estimation of β and α̃. Specifically, by combining β and α̃ from Section 9.1.2, the parameter β can be estimated by solving the second-order generalized estimating equation, given by

$\tilde{U} (β, \tilde{α}) = \sum_{i = 1}^{N} {\tilde{D}}_{i}^{'} {\tilde{V}}_{i}^{- 1} {\tilde{f}}_{i} = 0,$

(9.17)

where

${\tilde{D}}_{i}^{'} = \left( \begin{array}{cc} \partial μ_{i} / \partial β & 0 \\ \partial ρ_{i} / \partial β & \partial ρ_{i} / \partial \tilde{α} \end{array} \right),$

${\tilde{V}}_{i} = \left[ \begin{array}{cc} var (y_{i}) & cov (y_{i}, {\tilde{Z}}_{i}) \\ cov ({\tilde{Z}}_{i}, y_{i}) & var ({\tilde{Z}}_{i}) \end{array} \right],$

${\tilde{f}}_{i} = \left( \begin{array}{c} y_{i} - μ_{i} \\ {\tilde{Z}}_{i} - ρ_{i} \end{array} \right) .$

The previous GEE specification uses information about intracluster correlations for the improvement of efficiency to estimate β, α̃, and the corresponding standard errors. If the first- and second-order models are correctly specified, β and α̃ can be computed by using a Fisher scoring algorithm. It follows then that the random vector

$\sqrt{N} (\hat{β} - β)$

in this expanded GEE approach is asymptotically multivariate normal with mean 0 and variance–covariance matrix

$\hat{V} (\hat{β}) = {(\sum_{i = 1}^{N} {\hat{\tilde{D}}}_{i}^{'} {\hat{\tilde{V}}}_{i}^{- 1} {\hat{\tilde{D}}}_{i})}^{- 1} (\sum_{i = 1}^{N} {\hat{\tilde{D}}}_{i}^{'} {\hat{\tilde{V}}}_{i}^{- 1} {\hat{\tilde{f}}}_{i} {\hat{\tilde{f}}}_{i}^{'} {\hat{\tilde{V}}}_{i}^{- 1} {\hat{\tilde{D}}}_{i}) {(\sum_{i = 1}^{N} {\hat{\tilde{D}}}_{i}^{'} {\hat{\tilde{V}}}_{i}^{- 1} {\hat{\tilde{D}}}_{i})}^{- 1} .$

(9.18)

Given such extensions, Liang et al. (1992) referred to the previous GEE model as GEE2. Provided that the specifications of both the mean and the correlation structure are correct, GEE2 permits more efficient estimation of the parameters and also allows intracluster (intraindividual in longitudinal data analysis) correlations to be modeled. Because of the simultaneous estimating process, however, consistent estimates of β in GEE2 rely on the adequate specification of the correlation matrix. As a result, if intracluster correlations are incorrectly specified, neither β nor ã is estimated consistently (Liang et al., 1992).

Given these statistical concerns, caution must be exercised in applying GEE2 to obtain more efficient estimates of β (Prentice and Zhao, 1991). Without extensive knowledge or strong empirical evidence about the structure of intracluster correlations, the estimates for both β and ã may be biased, thereby possibly resulting in misleading conclusions on the fixed effects. Furthermore, as is the case in Prentice’s approach, when the cluster size is large, the solution of GEE2 is computationally difficult, which limits the applicability of the GEE2 model.

9.2.3. GEE models on odds ratios

Both GEE1 and GEE2 are moment techniques based on correlation parameters. In the analysis of categorical data, problems might arise in the specification of correlations. For example, the correlation between binary responses depends on their means and is therefore constrained in a complex fashion. Some statisticians propose an alternative approach to this conventional perspective by using conditional OR parameterization to model dependence of repeated measurements for the same subject. This approach is considered to be associated with desirable properties and interpretative convenience (Liang et al., 1992; Lipsitz et al., 1991). Statistically, the OR parameterization allows nonzero high-order association parameters to be included in a natural, conventional way, and consequently, the correlation matrix can be expressed in terms of a function of the OR for binary responses at pairs of time points (Fitzmaurice and Laird, 1993).

There is a variety of ways to specify conditional ORs and then to include the OR parameters in the specification of a GEE model. For example, Lipsitz et al. (1991) extend GEE1 to the context of the binary data with ORs. In model specifications, they first define the joint probability of a “success” (Y = 1) at time points j and j′, given by

${\tilde{π}}_{i j j^{'}} = pr (Y_{i j} = 1, Y_{i j^{'}} = 1) = E (Y_{i j j^{'}}),$

where

$Y_{i j j^{'}} = I (Y_{i j} = 1, Y_{i j^{'}} = 1) = Y_{i j} Y_{i j^{'}}$

and I(·) is an indicator function.

Given the previous specification, Lipsitz et al. (1991) express the joint probability ${\tilde{π}}_{i j j^{'}}$ as a function of ${\tilde{π}}_{i j}$, ${\tilde{π}}_{i j^{'}}$, and the OR between the responses at time points j and j′. By defining a vector of pair-wise marginal ORs for subject i, written as $Γ_{i} = (γ_{i 12}, γ_{i 13}, \ldots, γ_{i (J - 1) J})$, the OR for the pair of responses at time points j and j′ can be expressed in terms of ${\tilde{π}}_{i j}$, ${\tilde{π}}_{i j^{'}}$, and ${\tilde{π}}_{i j j^{'}}$, given by

$γ_{i j j^{'}} = \frac{{\tilde{π}}_{i j j^{'}} (1 - {\tilde{π}}_{i j} - {\tilde{π}}_{i j^{'}} + {\tilde{π}}_{i j j^{'}})}{({\tilde{π}}_{i j} - {\tilde{π}}_{i j j^{'}}) ({\tilde{π}}_{i j^{'}} - {\tilde{π}}_{i j j^{'}})}, j \neq j^{'} .$

(9.19)

The specified OR is not constrained by the means, and therefore, in many situations, the OR is considered preferable to express the pair-wise correlations for binary data. The natural logarithm of the OR, denoted by

$\log (γ_{i j j^{'}})$

and referred to as the log OR, is often used as a more convenient measurement of association. The log OR has desirable properties as it can take any value in the range (−∞, ∞), with $\log (γ_{i j j^{'}}) = 0$ indicating no association between $Y_{i j}$ and $Y_{i j^{'}}$. Given the log transformation on the OR, the association between the measurements at two time points can be interpreted in a conventional fashion, while not constrained by the means.

Given the quadratic formula, ${\tilde{π}}_{i j j^{'}}$ can be solved in terms of the OR $γ_{i j j^{'}}$ and the two marginal probabilities, ${\tilde{π}}_{i j}$ and ${\tilde{π}}_{i j^{'}}$, given by

${\tilde{π}}_{i j j^{'}} = \left\{\begin{array}{ll} \frac{ξ_{i j j^{'}} - \sqrt{[ξ_{i j j^{'}}^{2} - 4 γ_{i j j^{'}} (γ_{i j j^{'}} - 1) {\tilde{π}}_{i j} {\tilde{π}}_{i j^{'}}]}}{2 (γ_{i j j^{'}} - 1)} & for (γ_{i j j^{'}} \neq 1), \\ γ_{i j j^{'}} {\tilde{π}}_{i j} {\tilde{π}}_{i j^{'}} & for (γ_{i j j^{'}} = 1), \end{array}\right.$

(9.20)

where $ξ_{i j j^{'}} = [1 - (1 - γ_{i j j^{'}}) ({\tilde{π}}_{i j} + {\tilde{π}}_{i j^{'}})]$. In Equation (9.20), ${\tilde{π}}_{i j j^{'}} = {\tilde{π}}_{i j j^{'}} (β, \tilde{α})$ is a function of β and $\tilde{α}$ in the GEE formulation, where $\tilde{α}$, in this specific case, is the correlation parameter associated with the ORs.
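The relationship between Equations (9.19) and (9.20) can be illustrated with a minimal sketch. The marginal probabilities and ORs used here are hypothetical values chosen for illustration, and the function names are not taken from any package. The sketch solves for the joint probability from a given OR and then recovers that OR from the joint probability.

```python
import math


def odds_ratio(p_j, p_k, p_jk):
    """Pairwise marginal odds ratio as in Equation (9.19)."""
    return (p_jk * (1.0 - p_j - p_k + p_jk)) / ((p_j - p_jk) * (p_k - p_jk))


def joint_probability(p_j, p_k, gamma):
    """Joint success probability as in Equation (9.20)."""
    if math.isclose(gamma, 1.0):
        # No association: the joint probability is the product of the margins.
        return p_j * p_k
    xi = 1.0 - (1.0 - gamma) * (p_j + p_k)
    disc = xi * xi - 4.0 * gamma * (gamma - 1.0) * p_j * p_k
    return (xi - math.sqrt(disc)) / (2.0 * (gamma - 1.0))


# Hypothetical marginal probabilities at two time points.
p_j, p_k = 0.30, 0.45
for gamma in (0.5, 1.0, 4.0):
    p_jk = joint_probability(p_j, p_k, gamma)
    if math.isclose(gamma, 1.0):
        recovered = 1.0
    else:
        recovered = odds_ratio(p_j, p_k, p_jk)
    print(f"OR = {gamma}: joint = {p_jk:.4f}, recovered OR = {recovered:.4f}, "
          f"log OR = {math.log(gamma):.4f}")
```

Taking the root with the minus sign keeps the joint probability within the bounds implied by the two margins, which is why the OR, unlike the correlation, can be chosen freely on (0, ∞).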

In this GEE model, the [n_i × (n_i − 1)/2]-dimensional square diagonal matrix

$\tilde{W}$

, used as the working correlation matrix, is specified as

${\tilde{W}}_{i} = diag [var (Y_{i j j^{'}})] = diag [{\tilde{π}}_{i j j^{'}} (1 - {\tilde{π}}_{i j j^{'}})] .$

Given the specification of

$\tilde{W}$

, the second set of estimating equations in terms of Prentice’s expansion is

$\tilde{U} (\tilde{a}) = \sum_{i = 1}^{N} {\tilde{E}}_{i}^{'} {\tilde{W}}_{i}^{- 1} [{\tilde{Z}}_{i} - ρ_{i} (β, \tilde{a})] = 0 .$

(9.21)

Assuming Y_i to be correctly specified, the joint process

$\sqrt{N} (\hat{β} - β)$

and

$\sqrt{N} (\hat{\tilde{α}} - \tilde{α})$

has an asymptotic distribution that is multivariate normal with mean vector 0 and variance–covariance matrix

${\tilde{V}}_{i} (β, \tilde{a}) = \lim_{N \to \infty} \left(\begin{array}{cc} B_{11}^{- 1} & 0 \\ B_{21} & B_{22}^{- 1} \end{array}\right) \left(\begin{array}{cc} Σ_{11} & Σ_{12} \\ Σ_{21} & Σ_{22} \end{array}\right) \left(\begin{array}{cc} B_{11}^{- 1} & B_{21}^{'} \\ 0 & B_{22}^{- 1} \end{array}\right),$

(9.22)

where

$B_{11} = N^{- 1} \sum_{i = 1}^{N} {(\frac{\partial μ_{i}}{\partial β})}^{'} V_{i}^{- 1} \frac{\partial μ_{i}}{\partial β},$

$B_{22} = N^{- 1} \sum_{i = 1}^{N} {\tilde{E}}_{i}^{'} {\tilde{W}}_{i}^{- 1} {\tilde{E}}_{i},$

$B_{21} = B_{22}^{- 1} [N^{- 1} \sum_{i = 1}^{N} {\tilde{E}}_{i}^{'} {\tilde{W}}_{i}^{- 1} (- \partial ρ_{i} / \partial \tilde{α})] B_{11}^{- 1},$

$Σ_{11} = N^{- 1} \sum_{i = 1}^{N} {(\frac{\partial μ_{i}}{\partial β})}^{'} V_{i}^{- 1} [cov (y_{i})] V_{i}^{- 1} (\frac{\partial μ_{i}}{\partial β}),$

$Σ_{22} = N^{- 1} \sum_{i = 1}^{N} {\tilde{E}}_{i}^{'} {\tilde{W}}_{i}^{- 1} [cov ({\tilde{Z}}_{i})] {\tilde{W}}_{i}^{- 1} {\tilde{E}}_{i},$

$Σ_{12} = N^{- 1} \sum_{i = 1}^{N} {(\frac{\partial μ_{i}}{\partial β})}^{'} V_{i}^{- 1} [cov (y_{i}, {\tilde{Z}}_{i})] {\tilde{W}}_{i}^{- 1} {\tilde{E}}_{i} .$

As in a typical GEE model, the asymptotic variance–covariance matrix of

$\sqrt{N} (\hat{β} - β)$

in this approach is a sandwich estimator, given by

$V (β) = \lim_{N \to \infty} (B_{11}^{- 1} Σ_{11} B_{11}^{- 1}) .$

(9.23)

By replacing β, ã, and other parameters with their estimates, V(β) can be consistently estimated.

Lipsitz et al. (1991) consider it slightly more efficient to estimate β by modeling the pair-wise ORs than by using the pair-wise correlations. They contend that, given the binary responses, the marginal ORs are a natural measurement of association with desirable statistical properties, and log(Γ_i) can be modeled as a linear function of the covariates (Fitzmaurice et al., 1993). Furthermore, given the means and the pair-wise marginal ORs, one can always create R(ã) because the pair-wise correlations are a one-to-one function of the pair-wise marginal ORs.

There are a number of more refined algorithms of GEEs with the specification of conditional pair-wise ORs (Carey et al., 1993; Fitzmaurice and Laird, 1993; Liang et al., 1992). For example, Carey et al. (1993) and Liang et al. (1992) define the OR between the jth and the j′th responses, given by

$\begin{array}{ll} γ_{i j j^{'}} & = \frac{pr (Y_{i j} = 1, Y_{i j^{'}} = 1) / pr (Y_{i j} = 1, Y_{i j^{'}} = 0)}{pr (Y_{i j} = 0, Y_{i j^{'}} = 1) / pr (Y_{i j} = 0, Y_{i j^{'}} = 0)} \\ & = \frac{pr (Y_{i j} = 1, Y_{i j^{'}} = 1) pr (Y_{i j} = 0, Y_{i j^{'}} = 0)}{pr (Y_{i j} = 1, Y_{i j^{'}} = 0) pr (Y_{i j} = 0, Y_{i j^{'}} = 1)} . \end{array}$

(9.24)

Equation (9.24) can be applied to measure the degree of association between two measurements. For example, if the OR defined in the equation is one, there is no association between Y_ij and

$Y_{i j^{'}}$

. Based on this algorithm, some more complex combinations of the correlated binary data can be modeled by an extension of the two-moment perspective. The reader interested in those GEE algorithms is referred to Carey et al. (1993), Fitzmaurice and Laird (1993), and Liang et al. (1992).

9.3. Relationship between marginal and random-effects models

In Chapter 8, I described the basic specifications, various distributional functions, and a number of estimating methods for nonlinear mixed-effects regression models. These GLMMs are the conditional models as they parameterize the conditional distribution of the nonlinear response given the specified random effects and the observed covariates. Some marginal models are also delineated in that chapter. In Sections 9.1 and 9.2, I further introduce a unique school of marginal regression models – GEEs – which can be applied to analyze both linear and nonlinear response data. In longitudinal data analysis, the conditional and the marginal perspectives often derive different sets of parameter estimates, yielding different interpretations of analytic results.

In this section, I compare the conditional and the marginal regression models, two major statistical perspectives for modeling nonlinear longitudinal data. The applicability of the two approaches in nonlinear predictions is also discussed. Finally, as associated with the comparison between the conditional and the marginal perspectives, I introduce a statistical model that is intended to use the GEE formulation to specify a nonlinear conditional model.

9.3.1. Comparison between the two approaches

While most statistical techniques described in Chapter 8 are the likelihood-based, conditional models, various GEE models are the marginal approaches that do not require the complete specification of the joint distribution on repeated measurements. The underlying rationale of the marginal models is that the mean response is only linked to the covariates, not relying on dependence among observations. Such a marginal approach has tremendous appeal to many researchers of various disciplines who are concerned with the covariates’ effects on the nonnormal response variable and to whom the impact of the random effects is not of direct interest. In contrast, the conditional approach specifies the probability distribution of the response variable as a function of the covariates and a parameter specific for each subject. The application of the conditional models is often based on the assumption that longitudinal data follow some specific stochastic distribution that reflects intraindividual correlation. Without the inclusion of this distribution in regression modeling, the quality of parameter estimates, both point and variance, will be affected by dependence of repeated measurements for the same subject. Therefore, the marginal and the conditional perspectives have their respective focuses and strengths.

In the longitudinal setting, the marginal models are sometimes referred to as population-averaged regression approaches, with GEEs being a typical paradigm in this regard. When the response data are categorical, the marginal models are thought to be useful, with the main focus being on the difference in the transformed normal response between two population subgroups. In this application, population subgroups are identified by different values of covariates. With regard to dependence of longitudinal data, the GEE models account for intraindividual correlation by adjusting the covariance matrix of repeated measurements for the same subject, thereby achieving conditional independence of observations. As a result, the regression coefficients

$\hat{β}$

estimated for GEEs are robust, providing reliable statistical information about the effects of covariates on the transformed normal response variable. Nevertheless, only in some special situations can the estimated regression coefficients from GEEs translate into interpretable results on the response of a nonlinear scale.

Here, consider the binary outcome data. Suppose that two population subgroups are subject to the same distribution of heterogeneity within a common population. It follows then that the inherent random components will cancel out in computation of the ratio of two estimated response means. As a result, the retransformed, multiplicative effect of a covariate indicates the difference in the response variable between the two population subgroups. This interpretation of the GEE estimates, however, does not hold when the structure of intraindividual correlation is not CS. When the structure of within-subject variability takes complex patterns, the effects of covariates on the binary response are time-dependent, and consequently, the linear predictor of GEEs cannot directly translate into a nonlinear prediction due to variations in the random components. Neither can the estimated regression coefficients be directly transformed to interpretable results on the binary response because the population subgroups implicitly do not have the same means of the random components. In Chapter 10, interpretation of the estimated regression coefficients on the binary longitudinal data will be further discussed.

Relative to the population-averaged approach, the conditional models on categorical data are often referred to as subject-specific regression models. The conditional models provide a powerful approach for using fully specified probability functions to fit nonnormal longitudinal data. These models are preferable when the trajectory of nonlinear response outcomes is of primary interest (Zeger et al., 1988). As indicated in Chapter 8, in the analysis of nonlinear longitudinal data, the estimated regression coefficients of covariates are often not directly interpretable, and therefore, nonlinear predictions are required to aid in the interpretation of analytic results. For longitudinal data, a nonlinear prediction must combine the information of the estimated regression coefficients, the values of covariates, and the approximated random effects (Fitzmaurice et al., 2004). As each subject is assumed to have a unique random parameter, the approximated random effects are an integral component in nonlinear predictions. In contrast, the GEE models, where the variance/covariance matrix is specified as a nuisance parameter, cannot be used for nonlinear predictions unless all subjects have exactly the same value of the random effect parameter. Therefore, in longitudinal data analysis, the marginal means of the nonlinear response cannot be derived from a marginal model; rather, they can only be obtained from a conditional model specifying the subject-specific random effects.

Given the different analytic focuses, the interpretation of the regression coefficients differs markedly between the conditional and the marginal perspectives in longitudinal data analysis. For the conditional models, the regression coefficient of a covariate indicates the change in the transformed response variable (e.g., log odds) with a one-unit increase in the covariate within a subject. In the population-averaged approach, the regression coefficient of the covariate represents an average effect on the linear predictor, which cannot directly be retransformed to the population-averaged effect on the untransformed scale. For example, in logistic regression models, the mean of the difference in the log-odds of the response probability is not equal to the difference between two means due to variability in the inherent random effects. These two sets of coefficients are equal if and only if intraindividual correlation is zero. In the general case, where the random effects follow a normal or a multivariate normal distribution, the absolute value of a given regression coefficient in the marginal model, termed

$|β_{M}|$

, is smaller than the corresponding coefficient in the conditional model, termed

$|β_{C}|$

. In the analysis of nonlinear longitudinal data, the coefficient β_C can be retransformed to an interpretable effect on the nonlinear response by a complex retransformation process, whereas its marginal counterpart, β_M, cannot. In the succeeding chapters, the conditional longitudinal models on different data types will be described, in which the issue regarding the interpretation of regression coefficients will be further discussed.
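The difference in magnitude between the marginal and the conditional coefficients can be illustrated numerically. The following is a minimal sketch, assuming a logistic model with a single normally distributed random intercept and one binary covariate; the parameter values and function names are hypothetical. The marginal probabilities are obtained by integrating the conditional probabilities over the random-intercept distribution with a simple trapezoid rule.

```python
import math


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    return math.log(p / (1.0 - p))


def marginal_prob(eta, sigma, half_width=10.0, steps=4000):
    """E[expit(eta + sigma * z)] for z ~ N(0, 1), by the trapezoid rule."""
    h = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps + 1):
        z = -half_width + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * expit(eta + sigma * z) * math.exp(-0.5 * z * z)
    return total * h / math.sqrt(2.0 * math.pi)


def marginal_slope(beta0, beta_c, sigma):
    """Marginal log-odds difference for a one-unit change in the covariate."""
    p0 = marginal_prob(beta0, sigma)
    p1 = marginal_prob(beta0 + beta_c, sigma)
    return logit(p1) - logit(p0)


# Hypothetical conditional parameters: intercept -1, slope 1.
for sigma in (0.0, 1.0, 2.0):
    print(f"sigma = {sigma}: conditional slope = 1.0, "
          f"marginal slope = {marginal_slope(-1.0, 1.0, sigma):.4f}")
```

Under these assumptions the two slopes coincide when the random-intercept standard deviation is zero, and the marginal slope shrinks toward zero as that standard deviation grows.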

9.3.2. Use of GEEs to fit a conditional model

Zeger et al. (1988) contend that GEEs can be applied to compute the marginal moments, μ_i and var(y_i), from the conditional moments and the random effects distribution, denoted by F. Consequently, GLMMs can be expressed in terms of the GEE perspective. By contextually modifying Equations (8.15) and (8.16), the marginal expectation and the variance–covariance matrix for subject i can be written as

$E (y_{i}) = \int g^{- 1} (X_{i}^{'} β + Z_{i}^{'} b_{i}) d F (b_{i}),$

(9.25)

$var (y_{i}) = var [E (y_{i} |b_{i})] + E [var (y_{i} |b_{i})] .$

(9.26)

Let F be the normal distribution with mean 0 and variance–covariance matrix G and the link function be the logit link. The Taylor series expansion about b_i = 0 then gives rise to the approximation

$\begin{array}{ll} var (y_{i}) & \approx var [g^{- 1} (X_{i}^{'} β) + \frac{\partial g^{- 1} (X_{i}^{'} β) b_{i}}{\partial b_{i}}] + φ E \{υ [g^{- 1} (X_{i}^{'} β) + \frac{\partial g^{- 1} (X_{i}^{'} β) b_{i}}{\partial b_{i}}]\} \\ & \approx L_{i} Z_{i} G Z_{i}^{'} L_{i} + φ A_{i} (μ_{i}) \\ & \approx {\tilde{V}}_{i} (y_{i}), \end{array}$

(9.27)

where, as defined in Chapter 8, ν is a specific variance function, A_i(μ_i) is an n_i × n_i diagonal within-subject variance matrix containing elements ν, and

$L_{i} = diag [\frac{\partial g^{- 1} (X_{i}^{'} β)}{\partial X_{i}^{'} β}] .$

From Equation (8.25), the variance–covariance matrix of the random effects G can be approximated as

$G \approx {(Z_{i}^{'} Z_{i})}^{- 1} Z_{i}^{'} L_{i}^{- 1} [var (y_{i}) - φ A_{i} (μ_{i})] L_{i}^{- 1} Z_{i} {(Z_{i}^{'} Z_{i})}^{- 1} .$
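The inversion from the approximate marginal covariance back to G can be checked with a small sketch. It assumes the simplest case of a single random intercept, so that Z_i is a column of ones and G is a scalar, together with a logit link and a binomial variance function; all numeric values are hypothetical.

```python
import math


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def marginal_cov(L, G, phi, A):
    """V = L Z G Z' L + phi * A with Z a column of ones (random intercept only)."""
    n = len(L)
    return [[L[j] * G * L[k] + (phi * A[j] if j == k else 0.0)
             for k in range(n)] for j in range(n)]


def recover_G(V, L, phi, A):
    """(Z'Z)^-1 Z' L^-1 [V - phi * A] L^-1 Z (Z'Z)^-1 with Z a column of ones."""
    n = len(L)
    total = 0.0
    for j in range(n):
        for k in range(n):
            resid = V[j][k] - (phi * A[j] if j == k else 0.0)
            total += resid / (L[j] * L[k])
    return total / (n * n)


# Hypothetical subject with six time points and a linear predictor in time.
mu = [expit(-1.0 + 0.3 * t) for t in range(6)]
L = [m * (1.0 - m) for m in mu]   # derivative of the inverse logit link
A = [m * (1.0 - m) for m in mu]   # binomial variance function
G_true, phi = 0.8, 1.0

V = marginal_cov(L, G_true, phi, A)
print(f"recovered G = {recover_G(V, L, phi, A):.6f}")
```

Because the inverse of L_i appears on both sides of the bracketed term, the factors of L_i in the first-order approximation cancel and the assumed value of G is returned.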

Using the moment estimator, the previous approximation of G can be further expanded:

$\hat{G} = \frac{1}{N} \sum_{i = 1}^{N} {(Z_{i}^{'} Z_{i})}^{- 1} Z_{i}^{'} {\hat{L}}_{i}^{- 1} [(y_{i} - {\hat{μ}}_{i}) (y_{i} - {\hat{μ}}_{i})' - \hat{φ} {\hat{A}}_{i} ({\hat{μ}}_{i})] {\hat{L}}_{i}^{- 1} Z_{i} {(Z_{i}^{'} Z_{i})}^{- 1},$

(9.28)

where

$\hat{φ} = \sum_{i = 1}^{N} \sum_{j = 1}^{n_{i}} \frac{{(y_{i j} - {\hat{μ}}_{i j})}^{2} - {({\hat{L}}_{i})}^{2} Z_{i j}^{'} \hat{G} Z_{i j}}{υ ({\hat{μ}}_{i j})} .$

(9.29)

In Equation (9.28), the information about b is not directly specified. According to Zeger et al. (1988), the GEE iterative procedure can be performed to compute

$\hat{β}$

, G, and φ simultaneously. Consequently, the parameters in GLMMs, including the between-subjects random components, can be estimated from the specification of a marginal model without completely specifying the joint distribution of the multivariate response. Therefore, the application of this approach is based on the proposition that the score statistic specified in the marginal model accounts for the entire magnitude and dimension of the random components, including both the between-subjects random effects and within-subject random errors.

The proposition underlying the approach given earlier, however, cannot hold for modeling nonlinear longitudinal data. Between-subjects variability is generally unobservable, usually approximated by the use of the empirical Bayes or some other complex simulating procedures, and therefore, the score function in the marginal model only accounts for a limited portion of the entire variability in the response data. As will be discussed in the succeeding chapters, in the presence of the subject-specific random effects, marginal means can only be predicted from a corresponding conditional model, not from a marginal model.

9.4. Empirical illustration: effect of marital status on disability severity in older Americans

In this empirical illustration, I aim to analyze the effect of an explanatory variable on a binary outcome variable by applying the GEE methodology. Given the requirement of a large sample size to execute GEEs, the AHEAD longitudinal data are used for the analysis (six waves: 1998, 2000, 2002, 2004, 2006, and 2008). The study topic is the effect of current marriage on the presence of disability. Given this research focus, the binary outcome variable is the disability score, with 1 = functionally disabled and 0 = not functionally disabled. This dichotomous variable is created from the ADL count previously indicated. Specifically, a subject is defined as functionally disabled if he or she reports any degree of health-related difficulty in performing activities of daily living, given by ADL_COUNT > 0. As previously mentioned, the ADL count consists of five task items (dress, bathe/shower, eat, walk across a room, and get in/out of bed). The ADL binary score is measured at the six time points and named ADL_BIN in the analysis. The probability of disability is denoted by Pr(Y_ij = 1), where i and j indicate subject i at time point j. There are more appropriate approaches for measuring disability in the literature of aging and health studies; the measurement issue of this health indicator, however, is currently not of concern given the focus on an illustration of a statistical technique. The time factor is specified as a continuous variable, for which only the linear component is considered given the results of a preliminary data analysis. The main explanatory variable, marital status, measured at six waves and named Married in the analysis, is the dichotomous variable previously specified, with 1 = currently married and 0 = else. As specified in the linear mixed model on the ADL count, an interaction between time and Married is created. The three centered covariates, “Age_mean,” “Educ_mean,” and “Female_mean,” continue to be used as the control variables.

In the application of GEEs for this analysis, the logit link is specified to indicate the association between the covariates and the binary outcome. The logit model in this context is written as

$logit (μ_{i j}) = \log \frac{\Pr (Y_{i j} = 1)}{\Pr (Y_{i j} = 0)} = X_{i j}^{'} β,$

where μ_ij is the marginal expectation of the functional disability score for subject i at time point j, and the covariate vector X_ij contains six covariates: TIME, Married, TIME × Married, Age_mean, Educ_mean, and Female_mean. In the equation, the log odds is the transformed outcome variable, assumed to be linearly associated with the covariates. The logit regression model will be further described in Chapter 10, both generally and with specific regard to longitudinal data analysis.

Here, the issue of centering a dichotomous variable in the application of nonlinear regression models needs to be discussed. In the present illustration, the centered variable of FEMALE is viewed as the expected proportion of women or the propensity score to be a female in the population of interest. In the present regression analysis, it is used to adjust for the effect of a potential confounder on the binary response in a hypothetical population. In predicting the probability for an actual stratum, using a centered dichotomous variable can result in some bias due to functional transformation and retransformation in nonlinear predictions. In such situations, nonlinear predictions should be performed for each stratum separately and then the weighted average of the predicted probabilities should be computed for each observed covariate profile (Muller and MacLehose, 2014).

As indicated earlier, the GEE estimating function is the score solution for β when the data follow a log-linear probability distribution. Correlations among subject-specific observations are accounted for by modeling the working correlation matrix. The SAS PROC GENMOD procedure (SAS, 2012) is applied to estimate population-averaged parameters in nonlinear marginal models given its tremendous flexibility. First, a classical GEE model is created, which uses working correlations to model dependence of repeated measurements of the response for the same subject. The SAS program for this model is displayed.

SAS Program 9.1:

......

In SAS Program 9.1, two time variables, TIME and TIME1, are specified because the time factor is used as both a categorical and a continuous variable in this GEE model. TIME1, a copy of TIME, is included in the model as the continuous variable. The binary, time-dependent disability score, ADL_BIN, is constructed from the original set of the binary variables, ADL_98 through ADL_08. In the PROC GENMOD statement, the option DESCEND tells SAS to model the probability that Y_ij = 1. If this option is not specified, the PROC GENMOD procedure models the probability that Y_ij = 0 by default. In the CLASS statement, the subject’s ID (HHIDPN), MARRIED, and TIME are specified as classification variables, and the PARAM = REF option requests SAS to apply reference cell coding. The specification of a single time factor as two data types is also applied in some other situations. For example, in a linear regression model specifying a unique hybrid covariance matrix, the REPEATED and the RANDOM statements can be specified in the same SAS program. In such cases, two time variables, with exactly the same face values, must be defined simultaneously: one serving as a classification factor included in the CLASS and the REPEATED statements and one used as a continuous variable incorporated in the RANDOM statement.

In the MODEL statement, ADL_BIN is specified as the dependent variable and the six covariates as the independent variables. Not included in the CLASS statement, TIME1 is specified as a continuous variable in this GEE model. The DIST = BIN option specifies a binomial distribution for the variable ADL_BIN. In the REPEATED statement, the CORR = UNSTR option specifies the covariance structure of the multivariate responses to be unstructured, and the option CORRW requests SAS to display the estimated working correlation matrix. The WITHIN = TIME option specifies the order of repeated measurements within subjects, ensuring that the observations are properly ordered and arranged for computing the working correlation matrix. In this step, the time factor needs to be specified as a classification factor, thereby resulting in the specification of two time variables with exactly the same face values in the GEE model: one continuous and one categorical.

The analytic results derived from SAS Program 9.1 are displayed in the following output tables.

SAS Program Output 9.1:

In SAS Program Output 9.1, the working correlation matrix, given unstructured covariance, is displayed first. There are two distinctive features of this GEE correlation matrix. First, the correlation coefficients in the table suggest that intraindividual repeated measurements are very highly correlated. Therefore, without accounting for correlations among observations for the same subject, the parameter estimates of the regression can be biased and inconsistent. Second, the intraindividual correlation tends to decay consistently over time lag, and the value of correlation is negatively associated with the absolute distance between two time points. As indicated in Chapter 5, such an autoregressive pattern of correlations in repeated measurements is frequently observed, resulting in the widespread application of the AR(1) or the spatial covariance pattern models in longitudinal data analysis.

Next, the model goodness-of-fit information is reported, which is derived from the two GEE fit criteria described in Section 9.1.4, QIC(R) and QIC_u(R). The values of the two model fit statistics are very close, thereby leading to the same conclusion about the goodness-of-fit for this GEE model.

Finally, the parameter estimates, the corresponding standard errors, confidence intervals, z-scores, and p-values are displayed. Time is shown to have a positive, statistically significant effect on disability, as expected. With the currently not married treated as the reference group, current marriage among older persons is negatively associated with the probability of disability, other variables being equal, with a regression coefficient of −0.36 (p < 0.0001). This effect on the logit scale translates into an OR of 0.70 (exp(−0.36) ≈ 0.70), suggesting a 30% reduction in the odds of being disabled for currently married older persons as compared to their currently not married counterparts. The interaction between TIME and MARRIED is not statistically significant (β_{time × married} = 0.03, p = 0.45), and therefore, the effect of current marriage on disability is consistent throughout the entire period of time. The effects of the three control variables, all statistically significant (p < 0.0001), are given as: β_age = 0.10, OR = 1.11; β_edu = −0.08, OR = 0.92; and β_female = 0.27, OR = 1.31. Interestingly, the effects of the three controls on the logit of ADL_BIN are very close to those on ADL_COUNT.
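The conversion from a coefficient on the logit scale to an OR is a simple exponentiation, sketched below for the coefficients quoted in the text. Only the rounded estimates reported above are used, so the resulting ORs are accurate only to that rounding; the dictionary layout is an illustrative choice.

```python
import math

# Rounded logit-scale estimates as quoted in the text.
coefficients = {
    "Married": -0.36,
    "Age_mean": 0.10,
    "Educ_mean": -0.08,
    "Female_mean": 0.27,
}

# OR = exp(beta); the percent change in the odds is 100 * (OR - 1).
odds_ratios = {name: math.exp(beta) for name, beta in coefficients.items()}

for name, ratio in odds_ratios.items():
    change = 100.0 * (ratio - 1.0)
    print(f"{name}: beta = {coefficients[name]:+.2f}, OR = {ratio:.2f}, "
          f"change in odds = {change:+.0f}%")
```

An OR below one is read as a proportional reduction in the odds, not as a reduction in percentage points of the probability.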

As indicated earlier, in the classical GEE models, the working correlation matrix of binary data is considered to depend on the means of a categorical data type, thus being constrained in a complicated way. It is perceived that by modeling dependence of repeated measurements with use of conditional OR parameterization, the constraints on the means are relaxed, thereby potentially yielding statistically more efficient, consistent parameter estimates. To examine the validity of this argument, I illustrate a GEE model specifying the log OR for each pair of responses within the same subject for analyzing the association between current marriage and disability in older Americans. For analytic convenience, all subjects are parameterized identically. SAS Program 9.2 specifies a fully parameterized log OR model.

SAS Program 9.2:

In SAS Program 9.2, all specifications are the same as those in SAS Program 9.1 except for the inclusion of the LOGOR = FULLCLUST option in the REPEATED statement. This option specifies fully parameterized clusters or, in this context, subjects. There is a parameter for each pair of observations, and therefore, there are n × (n − 1)/2 parameters in the vector

$\tilde{α}$

. In the present analysis, therefore, there are altogether fifteen parameters for the log OR pairs given six time points.

SAS Program 9.2 yields a complete set of results about the estimation. The following output is the parameter information for the GEE model.

SAS Program Output 9.2a:

In SAS Program Output 9.2a, the fifteen elements of

$\tilde{α}$

correspond to the log OR pairs, with the first number in the parentheses being the order of the row and the second the order of the column. For example, “Alpha5” indicates the log OR between the first and the sixth time points.
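The correspondence between the fifteen parameters and the pairs of time points can be sketched as follows. The sketch assumes that the pairs are numbered row by row over the upper triangle, an ordering that agrees with the example of “Alpha5” given in the text; the labels are generated for illustration and are not copied from the SAS output.

```python
from itertools import combinations


def log_or_pairs(n_times):
    """All (row, column) pairs of time points with row < column, row by row."""
    return list(combinations(range(1, n_times + 1), 2))


pairs = log_or_pairs(6)
labels = {f"Alpha{k}": pair for k, pair in enumerate(pairs, start=1)}

print(f"number of log OR parameters: {len(pairs)}")
for label, (row, col) in labels.items():
    print(f"{label}: time points ({row}, {col})")
```

With six time points this yields 6 × 5/2 = 15 pairs, the first five of which involve the first time point.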

Next, the model goodness-of-fit information for the present log OR model is displayed.

SAS Program Output 9.2b:

This GEE model with log OR parameters yields two model goodness-of-fit information criteria, QIC(R) and QIC_u(R). Again, the values of the two model fit statistics are very close, pointing to the same conclusion about goodness-of-fit for the analysis. Surprisingly, both the QIC(R) and QIC_u(R) scores in SAS Program Output 9.2b are moderately higher than those reported in SAS Program Output 9.1, thereby suggesting that the GEE model with log OR parameters fits the AHEAD data slightly worse. The following output table displays the analytic results.

SAS Program Output 9.2c:

The output table displays the parameter estimates, the corresponding standard errors, confidence intervals, z-scores, and p-values. The results for the intercept and the regression coefficients of the four covariates are very close to those estimated from the unstructured working correlation model, with only minor variations. Clearly, for the present analysis, using either the classical or the log OR GEE model generates the same conclusions about the effect of current marriage on disability. Furthermore, the log OR parameters, all statistically significant, suggest that repeated measurements for the same subjects are very highly correlated given the substantial deviations of all the OR pairs from unity. Analogous to the pattern from the classical GEE model, such intraindividual correlations tend to decay steadily as the time lag increases. The pattern of the log OR pairs thus leads to the same conclusion as the classical GEE model.

Given the similarity of analytic results between the two GEE models, some theoretical questions may be advanced promptly. If the estimated regression coefficients and the goodness-of-fit indices are insensitive to the specification of a more refined covariance matrix, is it necessary to attach a working correlation or an OR matrix in the application of a generalized linear model? According to large-sample theory, the asymptotic limit of $(\hat{β} - β_{0})$, where β₀ is the true coefficient vector, tends to be 0 as the sample size increases. Therefore, the point estimates of the regression coefficients in the standard GLMs are asymptotically unbiased, even with the presence of dependence in repeated measurements of the response for the same subject. Given correlated data, however, the inverse of the observed Fisher information matrix given $\hat{β}$ does not provide an adequate variance estimator of $\hat{β}$ (Liang and Zeger, 1986; Zeger and Liang, 1986).
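The contrast between a model-based variance and a cluster-robust “sandwich” variance can be sketched in the simplest possible setting. The Python below is a toy illustration under stated assumptions, not the estimator computed by PROC GENMOD: it assumes an intercept-only model with identity link and an independence working correlation, so that the estimate is the overall mean, and it uses a small made-up data set of four subjects with three binary responses each.

```python
def naive_and_robust_variance(clusters):
    """Return (estimate, naive variance, sandwich variance) for an
    intercept-only, identity-link model with independence working correlation.

    naive    = sigma^2 / N                       (treats all N responses as independent)
    sandwich = sum_i (sum_j r_ij)^2 / N^2        (keeps within-subject residual products)
    """
    n_obs = sum(len(cluster) for cluster in clusters)
    beta_hat = sum(sum(cluster) for cluster in clusters) / n_obs
    residuals = [[y - beta_hat for y in cluster] for cluster in clusters]
    sigma2 = sum(r * r for cluster in residuals for r in cluster) / n_obs
    naive = sigma2 / n_obs
    robust = sum(sum(cluster) ** 2 for cluster in residuals) / n_obs ** 2
    return beta_hat, naive, robust


# Hypothetical data: responses within a subject tend to agree.
clusters = [[1, 1, 1], [0, 0, 0], [1, 1, 0], [0, 0, 1]]

beta_hat, naive, robust = naive_and_robust_variance(clusters)
print(f"estimate = {beta_hat:.3f}")
print(f"naive variance = {naive:.4f}, sandwich variance = {robust:.4f}")
```

In this made-up example the point estimate is the same under either variance formula, while the sandwich variance is larger because positively correlated responses within a subject carry less information than the same number of independent responses.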

With these concerns, next I create a simple logistic regression model on the relationship between marital status and disability. By doing so, the estimated regression coefficients and the corresponding standard errors from the “naïve” logistic regression model can be compared to those from the two GEE models. The following is the SAS program for this logistic model.

SAS Program 9.3:

In SAS Program 9.3, the PROC LOGISTIC statement calls for the application of the logistic regression procedure. Other statements and options have the same interpretations as those in the PROC GENMOD procedure. Without any repeated or random components, this logistic regression model yields the regression coefficient estimates without adjusting for the dependence of repeated measurements for the same subject. The following output presents the MLE solution for the regression coefficients and other statistics.

SAS Program Output 9.3:

SAS Program Output 9.3 displays the estimated regression coefficients, standard errors, and p-values for the intercept and the six covariates. Compared to the results from the two GEE models, there are some changes in the point estimates, particularly the estimated regression coefficients of TIME1 and TIME1 × MARRIED. Surprisingly, the estimated standard errors from the “naïve” model are fairly close to those of the GEEs, more so than the two sets of regression coefficients are to each other. As a result, the p-values of the regression coefficient estimates lead to the same test conclusions as those from the GEEs.

9.5. Summary

In longitudinal data analysis, if a researcher’s main interest resides in the marginal mean parameters, it is contended that a quasi-likelihood method can be applied without the necessity of performing the full likelihood procedure. Wedderburn (1974) proves that in various exponential families of probability distributions, the quasi-likelihood estimating equations generate consistent estimates of the regression parameters $\hat{β}$ in any generalized linear model. Based on the quasi-likelihood approach, the classical GEEs, proposed by Liang and Zeger (1986) and Zeger and Liang (1986), yield asymptotically unbiased estimates of the regression coefficients. In longitudinal data analysis, such marginal estimates are considered robust to misspecification of dependence among repeated measurements for the same subject. Since the advent of the classical GEE model, a variety of GEE extensions have been advanced (Prentice, 1988; Zhao and Prentice, 1990). These GEE extensions allow for joint estimation of $(β, \tilde{α})$ but require the correct model specification for both the mean and the pair-wise correlations to yield consistent estimates of β. More recently, Liang et al. (1992) and Lipsitz et al. (1991) model the dependence in longitudinal data with conditional OR parameterization given some desirable statistical properties and interpretative convenience. Some scientists (McDonald, 1993) contend that given the desirable large-sample property, parameter estimates from various GEE models do not seem to change much as compared to those from the estimating equation assuming independence. The empirical illustration displayed in this chapter supports the argument that modeling a complex covariance structure does not necessarily improve the quality of parameter estimates and the goodness-of-fit statistic.

Because it focuses on the marginal associations between covariates and the clustered responses, the classical GEE approach specifies intraindividual correlation as a nuisance parameter by the creation of a so-called “working correlation” covariance structure. The original GEE is robust against misspecification of the covariance structure, and therefore, the assumption on the distribution of the response data is not required in the application of this method. This modeling approach, however, does not provide sufficient information for computing nonlinear predictions of the response outcomes in longitudinal data analysis. When the link function in the response is nonlinear, the marginal mean from the equation $g^{- 1} (X_{i}^{'} \hat{β})$ does not yield the population-averaged predictions because retransformation of the random components is not taken into account. Such retransformation bias in predicting the probability of the categorical outcome variable has been discussed in Chapter 8 and will be further emphasized in the succeeding chapters. Indeed, in the analysis of nonlinear longitudinal data, the application of GLMMs is a more suitable perspective than GEEs for computing marginal means of the nonnormal response outcomes.
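The retransformation issue under a nonlinear link can be illustrated numerically. The sketch below is a toy calculation under stated assumptions, not a result from the AHEAD analysis: it assumes a logit model with a normally distributed random intercept, and the linear predictor (1.0) and random-effect standard deviation (2.0) are hypothetical values chosen for illustration. It compares the inverse link evaluated at the linear predictor with the average of the inverse link over the random-effect distribution.

```python
import math


def expit(x):
    """Inverse logit link."""
    return 1.0 / (1.0 + math.exp(-x))


def marginal_probability(eta, sigma, n_grid=2001, width=8.0):
    """Average expit(eta + b) over b ~ N(0, sigma^2) with the trapezoidal rule."""
    if sigma == 0.0:
        return expit(eta)
    lower = -width * sigma
    step = 2.0 * width * sigma / (n_grid - 1)
    total = 0.0
    for k in range(n_grid):
        b = lower + k * step
        density = math.exp(-0.5 * (b / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
        weight = 0.5 if k in (0, n_grid - 1) else 1.0
        total += weight * expit(eta + b) * density * step
    return total


eta = 1.0    # hypothetical linear predictor on the logit scale
sigma = 2.0  # hypothetical standard deviation of the random intercept

plug_in = expit(eta)
averaged = marginal_probability(eta, sigma)
print(f"inverse link at the linear predictor: {plug_in:.3f}")
print(f"average over the random intercepts:   {averaged:.3f}")
```

With these hypothetical values the averaged probability is pulled toward 0.5 relative to the plug-in value, which is the kind of gap that retransformation is meant to account for.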

Furthermore, GEEs are associated with a lack of efficiency due to incomplete, occasionally incorrect model specifications when the sample size is small or the regression model includes time-varying covariates (Fitzmaurice, 1995; Fitzmaurice et al., 1993; Lipsitz et al., 1994). Because they rest on the assumption of missing completely at random, the GEE models cannot be applied efficiently if missing data mechanisms are complex (Fitzmaurice, 1995; Fitzmaurice et al., 1993; Liang and Zeger, 1986). From the statistical standpoint, the conditional GLMMs provide a more flexible, statistically powerful perspective for modeling nonlinear longitudinal data, including the estimation of the fixed effects, the approximation of the random parameters, and the prediction of marginal means for a population of interest. Given these limitations, GEEs have gradually become a much less applied methodology than GLMMs in the analysis of nonnormal longitudinal data.
