Chapter 8

Cluster Correlated Data

8.1 Introduction

Often in practice, data are collected in clusters. Examples include block designs, repeated measures designs, and designs with random effects. Generally, the observations within a cluster are dependent, so the independence assumption of fixed effects linear models breaks down. These models generally also include fixed effects, and inference (estimation, confidence intervals, and tests of linear hypotheses) for the fixed effects is often of primary importance.

Several rank-based approaches have been considered for analyzing cluster-correlated data. Kloke, McKean, and Rashid (2009) extended the rank-based analysis for linear models discussed in Chapters 4–5 to many cluster models which occur in practice. In their work, the authors, in addition to developing general theory for cluster-correlated data, develop the application of a simple mixed model with one random effect and an arbitrary number of fixed effects and covariates. Kloke and McKean (2011) discuss a rank-based analysis when the blocks have a compound symmetric variance-covariance structure.

In this chapter we illustrate extensions of the rank-based methods discussed in earlier chapters to data which have cluster-correlated responses. For our purpose we consider an experiment done over a number of blocks (clusters) where the observations within a block are correlated. We begin (Section 8.2) by discussing Friedman’s nonparametric test for a randomized block design. In Section 8.3, we present the rank-based analysis of Kloke, McKean, and Rashid (2009). Besides tests of general linear hypotheses, this analysis includes estimation with standard errors of fixed effects as well as diagnostic procedures to check the quality of fit. Section 8.4 offers a discussion of robust estimation of variance components. These estimates are also used in the estimation of standard errors and in the Studentized residuals. We end the chapter with a discussion of rank-based procedures for general estimation equation (GEE) models which in terms of assumptions are the most general. Computation by R and R packages of these analyses is highlighted throughout the chapter.

For this chapter we use a common notation which we provide now. Suppose we have $m$ blocks or clusters. Within the $k$th cluster there are $n_k$ measurements. We may model the $i$th measurement within the $k$th cluster as

$$Y_{ki} = \alpha + x_{ki}^T \beta + e_{ki}, \quad k = 1,\ldots,m, \; i = 1,\ldots,n_k,$$       (8.1)

where xki is a vector of covariates. The errors between clusters are assumed to be independent and the errors within a block are assumed to be correlated.

At times, dependent data fit into a multivariate framework; i.e., a multivariate multiple regression model. In this edition, we have not covered rank-based procedures for multivariate analysis. We refer the reader to Chapter 6 of Hettmansperger and McKean (2011) and Oja (2010) for discussions of these procedures.

8.2 Friedman’s Test

The first nonparametric test for cluster-correlated data was developed by Friedman (1937). The goal is to compare the effect of n treatments. Each treatment is applied to each of m experimental units or clusters. In this test a separate ranking is calculated for each of the clusters. The rankings are then averaged for each of the treatments and then compared. If there is a large difference between the average rankings the null hypothesis of no treatment effect is rejected.

Suppose we have $n$ treatments and $m$ clusters, each of size $n$. Suppose all the treatments are randomly assigned once within a cluster. Let $Y_{kj}$ denote the measurement (response) for the $j$th treatment within cluster (experimental unit) $k$. Assume the model is

$$Y_{kj} = \alpha + \beta_j + b_k + \epsilon_{kj}, \quad k = 1,\ldots,m, \; j = 1,\ldots,n,$$            (8.2)

where $\alpha$ is an intercept parameter, $\beta_j$ is the $j$th treatment effect, $b_k$ is the random effect due to cluster $k$, and $\epsilon_{kj}$ is the $kj$th random error. Assume that the random errors are iid and are independent of the random effects.

Let $R_{kj}$ denote the rank of $Y_{kj}$ among $Y_{k1},\ldots,Y_{kn}$. Let

$$\bar{R}_{\cdot j} = \frac{\sum_{k=1}^m R_{kj}}{m}.$$

The test statistic is given by

$$T = \frac{12m}{n(n+1)} \sum_{j=1}^n \left( \bar{R}_{\cdot j} - \frac{n+1}{2} \right)^2.$$

Under $H_0$, the test statistic $T$ has an asymptotic $\chi^2_{n-1}$ distribution. We illustrate the R computation of Friedman's test with the following example.
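Before turning to the example, note that the statistic is simple enough to assemble by hand from its definition. The following sketch does so on a simulated response matrix (rows are clusters, columns are treatments; the data are illustrative only) and checks the result against base R's friedman.test.

```r
set.seed(1)
m <- 22; n <- 3                       # clusters and treatments
Y <- matrix(rnorm(m * n), nrow = m)   # simulated responses, one row per cluster
R <- t(apply(Y, 1, rank))             # separate ranking within each cluster
Rbar <- colMeans(R)                   # average rank of each treatment
Tstat <- (12 * m / (n * (n + 1))) * sum((Rbar - (n + 1) / 2)^2)
pval <- pchisq(Tstat, df = n - 1, lower.tail = FALSE)
# with continuous data (no ties) this agrees with base R's implementation
all.equal(unname(friedman.test(Y)$statistic), Tstat)
```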

Example 8.2.1 (Rounding First Base).

This example is discussed in Hollander and Wolfe (1999). In the game of baseball, three methods were evaluated for rounding first base (for an illustration, see Figure 7.1 of Hollander and Wolfe 1999). Label these methods round out, narrow angle, and wide angle. Each method was evaluated twice for each of m = 22 baseball players. The average times of the two runs are in the dataset firstbase. Hence, there are 22 blocks (clusters) and one fixed effect (method of base rounding) at three levels. The R function friedman.test can take a numeric data matrix, separate arguments for the response, group, and block vectors, or a formula.

Figure 8.1

Plot of fitted values vs. Studentized residuals for crabgrass data.

> friedman.test(as.matrix(firstbase))
   Friedman rank sum test

data: as.matrix(firstbase)
Friedman chi-squared = 11.1429, df = 2, p-value = 0.003805

Hence, the difference between methods of rounding first base is significant. Friedman’s test is for an overall difference in the methods. Note that it offers no estimate of the effect size between the different methods. Using the rank-based analysis discussed in the next section, we can both test the overall hypothesis and estimate the effect sizes, with standard errors.

8.3 Joint Rankings Estimator

Kloke et al. (2009) showed that rank-based analysis can be extended to cluster-correlated data. In this section we summarize these methods and present examples which illustrate the computation. As we demonstrate, the function which determines the fit is jrfit and, in addition, there are several of the standard linear model helper functions.

Assume an experiment is done over $m$ blocks or clusters. Note that we use the terms block and cluster interchangeably. Let $n_k$ denote the number of measurements taken within the $k$th block. Let $Y_{ki}$ denote the response variable for the $i$th experimental unit within the $k$th block, and let $x_{ki}$ denote the corresponding vector of covariates. Note that the design is general in that $x_{ki}$ may contain, for example, covariates, baseline values, or treatment indicators.

The response variable is then modeled as

$$Y_{ki} = \alpha + x_{ki}^T \beta + e_{ki}, \quad k = 1,\ldots,m, \; i = 1,\ldots,n_k,$$      (8.3)

where $\alpha$ is the intercept parameter, $\beta$ is a $p \times 1$ vector of unknown parameters, and $e_{ki}$ is an error term. We assume that the errors within a block are correlated (i.e., $e_{ki}$ and $e_{ki'}$ for $i \neq i'$), but that the errors between blocks are independent (i.e., $e_{ki}$ and $e_{k'j}$ for $k \neq k'$). Further, we assume that $e_{ki}$ has pdf $f(x)$ and cdf $F(x)$. Now write Model (8.3) in block vector notation as

$$Y_k = \alpha 1_{n_k} + X_k \beta + e_k,$$     (8.4)

where $1_{n_k}$ is an $n_k \times 1$ vector of ones, $X_k = [x_{k1} \cdots x_{kn_k}]^T$ is an $n_k \times p$ design matrix, and $e_k = [e_{k1},\ldots,e_{kn_k}]^T$ is an $n_k \times 1$ vector of error terms. Let $N = \sum_{k=1}^m n_k$ denote the total sample size. Let $Y = (Y_1^T,\ldots,Y_m^T)^T$ be the $N \times 1$ vector of all measurements (responses) and consider the matrix formulation of the model as

$$Y = \alpha 1_N + X\beta + e,$$     (8.5)

where $1_N$ is an $N \times 1$ vector of ones, $X = [X_1^T \cdots X_m^T]^T$ is an $N \times p$ design matrix, and $e = [e_1^T,\ldots,e_m^T]^T$ is an $N \times 1$ vector of error terms. Since there is an intercept in the model, we may assume (WLOG) that $X$ is centered.

Select a set of rank scores $a(i) = \varphi[i/(N+1)]$ for a nondecreasing score function $\varphi$ which is standardized as usual ($\int \varphi(u)\,du = 0$ and $\int \varphi^2(u)\,du = 1$). As with Rfit, the default score function for jrfit is the Wilcoxon, i.e., $\varphi(u) = \sqrt{12}\,[u - (1/2)]$. Then the rank-based estimator of $\beta$ is given by

$$\hat{\beta}_\varphi = \operatorname{Argmin} \|y - X\beta\|_\varphi, \quad \text{where} \quad \|v\|_\varphi = \sum_{t=1}^N a(R(v_t))\,v_t, \; v \in R^N,$$      (8.6)

is Jaeckel's dispersion function.

For formal inference, Kloke et al. (2009) develop the asymptotic distribution of $\hat{\beta}_\varphi$ under the assumption that the marginal distribution functions of the random vectors $e_k$ are the same. This includes two commonly assumed error structures: exchangeable within-block errors, as well as the components of $e_k$ following a stationary time series, such as an autoregressive model of general order.

This asymptotic distribution of $\hat{\beta}_\varphi$ is given by

$$\hat{\beta}_\varphi \,\dot{\sim}\, N_p\left( \beta, \; \tau_\varphi^2 (X^TX)^{-1} \left( \sum_{k=1}^m X_k^T \Sigma_{\varphi k} X_k \right) (X^TX)^{-1} \right),$$

where $\Sigma_{\varphi k} = \operatorname{var}(\varphi(F(e_k)))$ and $F(e_k) = [F(e_{k1}),\ldots,F(e_{kn_k})]^T$. To estimate $\tau_\varphi$, jrfit uses the estimator proposed by Koul et al. (1987).

8.3.1 Estimates of Standard Error

In this section we discuss several approaches to estimating the standard error of the R estimator defined in (8.6). Kloke et al. (2009) develop the inference under the assumption of exchangeable within-block errors; Kloke and McKean (2013) considered two additional estimates and examined the small sample properties of each.

Let $V = \sum_{k=1}^m X_k^T \Sigma_{\varphi k} X_k$ and let $\sigma_{ij}$ denote the $(i,j)$th element of $\Sigma_{\varphi k}$; that is, $\sigma_{ij} = \operatorname{cov}(\varphi(F(e_{1i})), \varphi(F(e_{1j})))$.

Compound Symmetric

Kloke et al. (2009) discuss estimates of $\Sigma_{\varphi k}$ when the within-block errors are exchangeable. Under the assumption of exchangeable errors, $\Sigma_{\varphi k}$ reduces to a compound symmetric matrix; i.e., $\Sigma_{\varphi k} = [\sigma_{ij}]$ where

$$\sigma_{ij} = \begin{cases} 1 & \text{if } i = j \\ \rho_\varphi & \text{if } i \neq j \end{cases}$$

and $\rho_\varphi = \operatorname{cov}(\varphi(F(e_{11})), \varphi(F(e_{12})))$. An estimate of $\rho_\varphi$ is

$$\hat{\rho}_\varphi = \frac{1}{M - p} \sum_{k=1}^m \sum_{i > j} a(R(\hat{e}_{ki}))\, a(R(\hat{e}_{kj})),$$

where $M = \sum_{k=1}^m \binom{n_k}{2}$.

One advantage of this estimate is that it requires estimation of only one additional parameter. A main disadvantage is that it requires the somewhat strong assumption of exchangeability.
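The estimate $\hat{\rho}_\varphi$ is easy to compute once the scored residuals are in hand. The sketch below is not jrfit's internal code: the residuals and cluster labels are simulated stand-ins, and Wilcoxon scores are assumed.

```r
set.seed(2)
m <- 20; nk <- 5; p <- 2                # clusters, cluster size, number of fixed effects
block <- rep(1:m, each = nk)
ehat <- rep(rnorm(m), each = nk) + rnorm(m * nk)  # stand-in exchangeable residuals
N <- length(ehat)
a <- sqrt(12) * (rank(ehat) / (N + 1) - 0.5)      # Wilcoxon scores a(R(ehat))
M <- sum(choose(table(block), 2))                 # M = sum over clusters of choose(nk, 2)
# within each cluster, the sum of a_ki * a_kj over pairs i > j
cross <- tapply(a, block, function(ak) (sum(ak)^2 - sum(ak^2)) / 2)
rhohat <- sum(cross) / (M - p)
```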

Empirical

A natural estimate of $\Sigma_\varphi$ is the unstructured variance-covariance matrix based on the sample covariances of the scores. To simplify notation, let $a_{ki} = a(R(\hat{e}_{ki}))$. Estimate $\sigma_{ij}$ with

$$\hat{\sigma}_{ij} = \frac{1}{m} \sum_{k=1}^m (a_{ki} - \bar{a}_{\cdot i})(a_{kj} - \bar{a}_{\cdot j}),$$

where $\bar{a}_{\cdot i} = \frac{1}{m} \sum_{k=1}^m a_{ki}$.

The advantage of this estimator is that it is general and makes no additional simplifying assumptions. In simulation studies, Kloke and McKean (2013) demonstrate that the sandwich estimator discussed next works at least as well for large samples as this empirical estimate.

Sandwich Estimator

Another natural estimator of V is the sandwich estimator, which for the problem at hand is defined as

$$\frac{m}{m - p} \sum_{k=1}^m X_k^T\, a(R(\hat{e}_k))\, a(R(\hat{e}_k))^T X_k.$$

Kloke and McKean (2013) demonstrate that this estimate works well for large samples and should be used when possible. The advantage of this estimator is that it does not require additional assumptions. For very small sample sizes, though, it may lead to biased, often conservative, inference. Simulation studies, however, suggest that when m ≥ 50 the level is close to α. See Kloke and McKean (2013) for more details. The sandwich estimator is the default in jrfit.
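As a concrete sketch, the sandwich estimate can be assembled directly from its definition; the design, cluster labels, and residuals below are simulated stand-ins for a jrfit fit, with Wilcoxon scores and the joint ranking of all residuals assumed.

```r
set.seed(3)
m <- 60; nk <- 4
block <- rep(1:m, each = nk)
X <- cbind(rnorm(m * nk), rep(rnorm(m), each = nk))  # stand-in design matrix
p <- ncol(X)
ehat <- rep(rnorm(m), each = nk) + rnorm(m * nk)     # stand-in residuals
N <- length(ehat)
a <- sqrt(12) * (rank(ehat) / (N + 1) - 0.5)         # scores from the joint ranking
Vhat <- matrix(0, p, p)
for (k in unique(block)) {
  idx <- block == k
  s <- crossprod(X[idx, , drop = FALSE], a[idx])     # X_k^T a(R(ehat_k))
  Vhat <- Vhat + s %*% t(s)                          # X_k^T a a^T X_k
}
Vhat <- (m / (m - p)) * Vhat                         # sandwich estimate of V
```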

8.3.2 Inference

Simulation studies suggest using a $t$ distribution for tests of hypotheses of the form

$$H_0\colon \beta_j = 0 \; \text{versus} \; H_A\colon \beta_j \neq 0.$$

Specifically, when the standard error (SE) is based on the estimate of the compound symmetric structure, we may reject the null hypothesis at level $\alpha$ provided

$$\left| \frac{\hat{\beta}_j}{\mathrm{SE}(\hat{\beta}_j)} \right| > t_{\alpha/2,\, N-p-1}.$$

On the other hand, if the sandwich estimator is used (Section 8.3.1), we test the hypothesis using df = $m$. That is, we reject the null hypothesis at level $\alpha$ if

$$\left| \frac{\hat{\beta}_j}{\mathrm{SE}(\hat{\beta}_j)} \right| > t_{\alpha/2,\, m}.$$

These inferences are the default when utilizing the summary functions of jrfit. We illustrate this discussion with the following examples.
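To make the default inference concrete, the $t$-test with df = $m$ can be reproduced directly from an estimate and its sandwich standard error. The numbers below are taken from the sandwich-based summary output of the simulated example in Section 8.3.3, where m = 160 blocks.

```r
m <- 160               # number of blocks in that example
betahat <- 0.256514    # estimate for w from the summary output
se <- 0.247040         # its sandwich standard error
tstat <- betahat / se  # about 1.0383, as reported in the output
pval <- 2 * pt(abs(tstat), df = m, lower.tail = FALSE)  # about 0.30, as reported
```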

8.3.3 Examples

In this section we present several examples. The first is a simulated example for which we illustrate the package jrfit. Following that, we present several real examples.

Simulated Dataset

To fix ideas, we present an analysis of a simulated dataset utilizing both the compound symmetry and sandwich estimators discussed in the previous section.

The setup is as follows:

> m<-160 # blocks
> n<-4   # observations per block
> p<-1   # baseline covariate
> k<-2   # treatment groups

First, we set up the design and simulate a baseline covariate which is normally distributed.

> trt<-as.factor(rep(sample(1:k,m,replace=TRUE),each=n))
> block<-rep(1:m,each=n)
> x<-rep(rnorm(m),each=n)

Next, we set the overall treatment effect to be Δ = 0.5, so that we can form the response as follows. We simulate the block effects from a t-distribution with 3 degrees of freedom and the random errors from a t-distribution with 5 degrees of freedom. Note that the assumption for exchangeable errors is met.

> delta<-0.5
> w<-trt==2
> Z<-model.matrix(~as.factor(block)-1)
> e<-rt(m*n,df=5)
> b<-rt(m,df=3)
> y<-delta*w+Z%*%b+e

Note the regression coefficient for the covariate was set to 0.

First we analyze the data with the compound symmetry assumption. The three required arguments to jrfit are the design matrix, the response vector, and the vector denoting block membership. In future releases we plan to incorporate a model statement, as in Rfit, similar to the one in friedman.test.

> library(jrfit)
> X<-cbind(w,x)
> fit<-jrfit(X,y,block,var.type='cs')
> summary(fit)

Coefficients:
 Estimate Std. Error t-value p.value
 1.395707 0.165898 8.4130 2.636e-16 ***
w 0.256514 0.252201 1.0171 0.3095
x 0.083595 0.130023 0.6429 0.5205
–––
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Notice that, by default, the intercept is displayed in the output. If inference on the intercept is of interest, then set the option int to TRUE in the jrfit summary function. The cell medians model can also be fit as follows.

> library(jrfit)
> W<-model.matrix(~trt-1)
> X<-cbind(W,x)
> fit<-jrfit(X,y,block,var.type='cs')
> summary(fit)

Coefficients:
 Estimate Std. Error t-value  p.value
trt1 1.395707  0.165898 8.4130 2.636e-16 ***
trt2 1.652221  0.168449 9.8085 < 2.2e-16 ***
x 0.083595  0.130023 0.6429 0.5205
–––
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Next we present the same analysis utilizing the sandwich estimator.

> X<-cbind(w,x)
> fit<-jrfit(X,y,block,var.type='sandwich')
> summary(fit)

Coefficients:
 Estimate Std. Error t-value  p.value
 1.395707  0.164288 8.4955 1.294e-14 ***
w 0.256514  0.247040 1.0383 0.3007
x 0.083595  0.113422 0.7370 0.4622
–––
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

> X<-cbind(W,x)
> fit<-jrfit(X,y,block,var.type='sandwich')
> summary(fit)

Coefficients:
 Estimate Std. Error t-value  p.value
trt1 1.395707  0.164288 8.4955 1.294e-14 ***
trt2 1.652221  0.166176 9.9426 < 2.2e-16 ***
x 0.083595  0.113422 0.7370 0.4622
–––
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

For this example, the results of the analysis based on the compound symmetry method and the analysis based on the sandwich method are quite similar.

Crabgrass Data

Cobb (1998) presented an example of a complete block design concerning the weight of crabgrass. The fixed factors in the experiment were the density of the crabgrass (four levels) and the levels (two) of the three nutrients nitrogen, phosphorus, and potassium; so p = 6. Two complete blocks of the experiment were carried out, so altogether there are N = 64 observations. In this experiment, block is a random factor. Under each set of experimental conditions, crabgrass was grown in a cup. The response is the dry weight of a unit (cup) of crabgrass, in milligrams. The R analysis of these data was first discussed in Kloke et al. (2009).

The model is a mixed model with one random effect

$$Y_{ki} = \alpha + x_{ki}^T \beta + b_k + \epsilon_{ki} \quad \text{for} \; k = 1,2 \; \text{and} \; i = 1,\ldots,32.$$

The example below illustrates the rank-based analysis of these data using jrfit.

> library(jrfit)
> data(crabgrass)
> x<-crabgrass[,1:6]; y<-crabgrass[,7]; block<-crabgrass[,8]
> fit<-jrfit(x,y,block,var.type='cs')
> rm(x,y,block)
> summary(fit)
Coefficients:

  Estimate Std. Error t-value p.value
  28.31823 2.77923 10.1892 0.009495 **
N 39.87865 3.45475 11.5431 0.007422 **
P 10.96732 4.25586 2.5770 0.123335
K  1.59380 3.91480 0.4071 0.723357
D1 24.08362 1.17606 20.4782 0.002376 **
D2 7.95646 0.50716 15.6882 0.004038 **
D3 3.26657 7.46598 0.4375 0.704443
–––
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Based on the summary of the fit, the factors nitrogen and density are significant. The Studentized residual plot based on the Wilcoxon fit is given in Figure 8.1. Note the one large outlier in this plot. As discussed in Cobb (1998) and Kloke et al. (2009) this outlier occurred in the data. It impairs the traditional analysis of the data but has little effect on the robust analysis.

Electric Resistance Data

Presented in Stokes et al. (1995), these data are from an experiment to determine if five electrode types performed similarly. Each electrode type (etype) was applied to the arm of 16 subjects. Hence there are 16 blocks and one fixed factor at 5 levels.

The classical nonparametric approach to addressing the question of a difference between the electrode types is to use Friedman’s test (Friedman 1937), which is the analysis that Stokes et al. (1995) used. As discussed in Section 8.2, Friedman’s test is available in base R via the function friedman.test. We illustrate its use with the electrode dataset available in jrfit.

> library(jrfit)

> friedman.test(resistance~etype|subject, data=eResistance)
  Friedman rank sum test

data:  resistance and etype and subject
Friedman chi-squared = 5.4522, df = 4, p-value = 0.244

From the comparison boxplots presented in Figure 8.2 we see there are several outliers in the data.

Figure 8.2

Comparison boxplots of resistance for five different electrode types.

First we consider a cell medians model where we estimate the median resistance for each type of electrode. Here the model is $y_{ki} = \mu_i + b_k + e_{ki}$, where $\mu_i$ represents the median resistance for the $i$th type of electrode, $b_k$ is the $k$th subject (random) effect, and $e_{ki}$ is the error term encompassing other variability. The variable etype is a factor from which we create the design matrix.

> x<-model.matrix(~eResistance$etype-1)
> fit<-jrfit(x,eResistance$resistance,eResistance$subject)
> summary(fit)

Coefficients:
      Estimate Std. Error  t-value p.value
eResistance$etype1  123.998 55.733 2.2248  0.040827 *
eResistance$etype2  211.002 53.894 3.9151  0.001234 **
eResistance$etype3  158.870 56.964 2.7890  0.013137 *
eResistance$etype4  106.526 53.817 1.9794  0.065241 .
eResistance$etype5  109.004 51.219 2.1282  0.049213 *
–––
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> muhat<-coef(fit)
> muhat
eResistance$etype1 eResistance$etype2 eResistance$etype3
  123.9983   211.0017   158.8703
eResistance$etype4 eResistance$etype5
  106.5256   109.0038

As the sample size is small, we will perform our inference assuming compound symmetry. Note, in practice, this can be a strong assumption and may lead to incorrect inference. However, given the nature of this experiment there is likely to be little carry-over effect; hence, we feel comfortable with this assumption.

> fit<-jrfit(x,eResistance$resistance,eResistance$subject,var.type='cs')
> summary(fit)

Coefficients:
    Estimate Std. Error t-value  p.value
eResistance$etype1 123.998 52.674 2.3541 0.0212262 *
eResistance$etype2 211.002 52.674 4.0058 0.0001457 ***
eResistance$etype3 158.870 52.674 3.0161 0.0035080 **
eResistance$etype4 106.526 52.674 2.0223 0.0467555 *
eResistance$etype5 109.004 52.674 2.0694 0.0420014 *
–––
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Generally, however, we are interested in the effect sizes of the various electrodes. Here the model is $y_{ki} = \alpha + \Delta_i + b_k + e_{ki}$, where $\Delta_i$ denotes the effect size. We set $\Delta_1 = 0$, so that the first electrode type is the reference and the others represent the median change from the first.

We may estimate the effect sizes directly or calculate them from the estimated cell medians as illustrated in the following code segment.

> x<-x[,2:ncol(x)]
> fit<-jrfit(x,eResistance$resistance,eResistance$subject,var.type='cs')
> summary(fit)

Coefficients:
    Estimate Std. Error t-value p.value
    123.994 52.473 2.3630 0.02076 *
eResistance$etype2  87.013 33.567 2.5922 0.01148 *
eResistance$etype3  34.218 33.567 1.0194 0.31134
eResistance$etype4 -17.977 33.567 -0.5356 0.59387
eResistance$etype5 -14.988 33.567 -0.4465 0.65654
–––
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> muhat[2:5]-muhat[1]
eResistance$etype2 eResistance$etype3 eResistance$etype4 eResistance$etype5
  87.00340   34.87196  -17.47268  -14.99450

Next we illustrate a Wald test of the hypothesis that there is no effect due to the different electrodes; i.e.,

$$H_0\colon \Delta_i = 0 \; \text{for all} \; i = 2,\ldots,5 \; \text{versus} \; H_A\colon \Delta_i \neq 0 \; \text{for some} \; i = 2,\ldots,5.$$

> est<-fit$coef[2:5]
> vest<-fit$varhat[2:5,2:5]
> tstat<-t(est)%*%chol2inv(chol(vest))%*%est/4
> df2<-length(eResistance$resistance)-16-4-1
> pval<-pf(tstat,4,df2,lower.tail=FALSE)
> pval
           [,1]
[1,] 0.01374127

Note that the overall test for effects is highly significant (p = 0.0137). This is a much stronger result than that of Friedman’s test which was nonsignificant with p-value 0.2440.

8.4 Robust Variance Component Estimators

Consider a cluster-correlated model with a compound symmetry (cs) variance-covariance structure; i.e., a simple mixed model. In many applications, we are interested in estimating the variance components and/or the random effects. For several of the fitting procedures discussed in this chapter, under cs structure, the iterative fitting of the fixed effects depends on estimates of the variance components. Even in the case of the JR fit of Section 8.3, variance components estimates are needed for standard errors and for the Studentization of the residuals. In this section, we discuss a general procedure for the estimation of the variance components and then focus on two procedures, one robust and the other highly efficient.

Consider then the cluster-correlated model (8.3) of the last section. Under cs structure we can write the model as

$$Y_{ki} = \alpha + x_{ki}^T \beta + b_k + \epsilon_{ki}, \quad k = 1,\ldots,m, \; i = 1,\ldots,n_k,$$       (8.7)

where the $\epsilon_{ki}$'s are iid with pdf $f(t)$, the $b_k$'s are iid with pdf $g(t)$, and the $\epsilon_{ki}$'s and the $b_k$'s are jointly independent of each other. Hence, the random error for the fixed effects portion of Model (8.7) satisfies $e_{ki} = b_k + \epsilon_{ki}$. Although we could write the following discussion in terms of a general scale parameter (functional) and avoid the assumption of finite variances, for easier interpretation we simply use the variances. The variance components are:

$$\sigma_b^2 = \operatorname{Var}(b_k), \quad \sigma_\epsilon^2 = \operatorname{Var}(\epsilon_{ki}), \quad \sigma_t^2 = \sigma_b^2 + \sigma_\epsilon^2, \quad \rho = \frac{\sigma_b^2}{\sigma_b^2 + \sigma_\epsilon^2}.$$                   (8.8)

The parameter $\rho$ is often called the intraclass correlation coefficient, while the parameter $\sigma_t^2$ is often denoted as the total variance.

We discuss a general procedure for the estimation of the variance components based on the residuals from a fit of the fixed effects. Let $\hat{\theta}$ and $\hat{\eta}$ be, respectively, given location and scale estimators. Denote the residuals of the fixed effects fit by $\hat{e}_{ki} = Y_{ki} - \hat{\alpha} - x_{ki}^T\hat{\beta}$. Then for each cluster $k = 1,\ldots,m$, consider the pseudo model

$$\hat{e}_{ki} = b_k + \epsilon_{ki}, \quad i = 1,\ldots,n_k.$$      (8.9)

Predict bk by

$$\hat{b}_k = \hat{\theta}(\hat{e}_{k1},\ldots,\hat{e}_{kn_k}).$$        (8.10)

This estimate is the prediction of the random effect. Then estimate the variance of bk by the variation of the random effects, i.e.,

$$\hat{\sigma}_b^2 = \hat{\eta}^2(\hat{b}_1,\ldots,\hat{b}_m).$$         (8.11)

For estimation of the variance of ϵki, consider Model (8.9), but now move the prediction of the random effect to the left-side; that is, consider the model

$$\hat{e}_{ki} - \hat{b}_k = \hat{\epsilon}_{ki}, \quad i = 1,\ldots,n_k.$$           (8.12)

Then our estimate of the variance of ϵki is given by

$$\hat{\sigma}_\epsilon^2 = \hat{\eta}^2(\hat{e}_{11} - \hat{b}_1, \ldots, \hat{e}_{mn_m} - \hat{b}_m).$$           (8.13)

Expressions (8.11) and (8.13) lead to our estimate of the total variance, $\hat{\sigma}_t^2 = \hat{\sigma}_b^2 + \hat{\sigma}_\epsilon^2$, and, hence, to our estimate of the intraclass correlation coefficient, $\hat{\rho} = \hat{\sigma}_b^2 / \hat{\sigma}_t^2$.

Groggel (1983) and Groggel et al. (1988) proposed these estimates of the variance components, except that the mean is used as the location functional and the sample variance as the scale functional. Assuming that the estimators are consistent for the variance components in the case of iid errors, Groggel showed that they are also consistent when based on residuals, under certain conditions. Dubnicka (2004) and Kloke et al. (2009) obtained estimates of the variance components using the median and MAD as the respective estimators of location and scale. Since these estimators are consistent for iid errors, they are consistent for our model based on the JR fit. The median and MAD comprise our first procedure for variance component estimation, and we label it the MM procedure. As discussed below, we have written R functions which compute these estimates.
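The recipe in (8.9) through (8.13) is short enough to sketch directly with the median and MAD (the MM procedure). The following is not the packaged function itself; ehat and block are simulated stand-ins for JR fit residuals.

```r
set.seed(4)
m <- 10; nk <- 10
block <- rep(1:m, each = nk)
ehat <- rep(rnorm(m, sd = 3), each = nk) + rnorm(m * nk)  # stand-in residuals
bhat <- tapply(ehat, block, median)     # predicted random effects, (8.10)
sigb2 <- mad(bhat)^2                    # variance of the random effects, (8.11)
sige2 <- mad(ehat - bhat[block])^2      # variance of the epsilons, (8.13)
sigt2 <- sigb2 + sige2                  # total variance
rhohat <- sigb2 / sigt2                 # intraclass correlation coefficient
```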

Although robust, the median and MAD have low efficiency, as simulation studies have, not surprisingly, shown; see Bilgic (2012). A pair of estimators which have shown high efficiency in these studies are the Hodges-Lehmann location estimator and the rank-based dispersion estimator based on Wilcoxon scores. Recall from Chapter 1 that the Hodges-Lehmann location estimator of a sample $X_1,\ldots,X_n$ is the median of the pairwise averages

$$\hat{\theta}_{HL} = \operatorname{med}_{i \le j} \left\{ \frac{X_i + X_j}{2} \right\}.$$           (8.14)

This is the estimator associated with the signed-rank Wilcoxon scores. It is a consistent robust estimator of its functional for asymmetric as well as symmetric error distributions. The associated scale estimator is the dispersion statistic given by

$$\hat{D}(X) = \frac{\sqrt{\pi/3}}{n} \sum_{i=1}^n \varphi\left[ \frac{R(X_i)}{n+1} \right] X_i,$$          (8.15)

where $\varphi(u) = \sqrt{12}\,[u - (1/2)]$ is the Wilcoxon score function. Note that $\hat{D}(X)$ is just a standardization of the norm, (4.8), of $X$. It is a consistent estimator of its functional for iid random errors as well as residuals; see Chapter 3 of Hettmansperger and McKean (2011). With the multiplicative factor $\sqrt{\pi/3}$ in expression (8.15), $\hat{D}(X)$ is a consistent estimator of $\sigma$ provided that $X_i$ is normally distributed with standard deviation $\sigma$. Although more efficient than MAD at the normal model, the statistic $\hat{D}(X)$ has an unbounded influence function in the $Y$-space; see Chapter 3 of Hettmansperger and McKean (2011) for discussion. Hence, it is not robust. We label the procedure based on the Hodges-Lehmann estimator of location and the dispersion estimator of scale the DHL method.
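For a single sample, both estimators are one-liners; here is a sketch on simulated data with mean 1 and standard deviation 2, so thetaHL should land near 1 and Dhat near 2.

```r
set.seed(5)
x <- rnorm(50, mean = 1, sd = 2)
n <- length(x)
# Hodges-Lehmann estimate (8.14): median of the pairwise averages, i <= j
pairavg <- outer(x, x, "+") / 2
thetaHL <- median(pairavg[upper.tri(pairavg, diag = TRUE)])
# dispersion estimate (8.15) with the Wilcoxon score function
phi <- function(u) sqrt(12) * (u - 0.5)
Dhat <- sqrt(pi / 3) / n * sum(phi(rank(x) / (n + 1)) * x)
```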

The R function vee computes these variance component estimators for median-MAD (mm) and the HL-disp (dhl) procedures. The input to each consists of the residuals and the vector which identifies the center or cluster. The function returns the vector of variance component estimates and the estimates (predictions) of the random effects. We illustrate their use in the following example.

Example 8.4.1 (Variance Component Estimates).

For the example we generated a dataset for a mixed model with a treatment effect (2 levels) and a covariate. The data are over 10 clusters, each with a cluster size of 10; i.e., $m = 10$, $n_k = 10$, and $N = 100$. The errors $\epsilon_{ki}$ are iid $N(0,1)$ while the random effects are normal with standard deviation 3, i.e., $N(0,9)$. Hence, the intraclass correlation coefficient is $\rho = 9/(1 + 9) = 0.9$. All fixed effects were set at 0. The following code segment computes the JR fit of the mixed model and the median-MAD and HL-dispersion variance component estimators. For both variance component estimators, we show the estimates $\hat{\sigma}_\epsilon^2$ and $\hat{\sigma}_b^2$.

> m<-10 # number of blocks
> n<-10 # observations per block
> k<-2  # number of treatments
> N<-m*n # total sample size
> x<-rnorm(N)  # covariate
> w<-sample(c(0,1),N,replace=TRUE) # treatment indicator
> block<-rep(1:m,n)  # m blocks of size n
> X<-cbind(x,w)
> Z<-model.matrix(~as.factor(block)-1)
> b<-rnorm(m,sd=3)
> e<-rnorm(N)
> y<-Z%*%b+e
> fit<-jrfit(X,y,block)
> summary(fit)

Coefficients:
   Estimate Std. Error t-value p.value
  -1.959030   2.835237 -0.6910  0.5053
x -0.098897   0.370353 -0.2670  0.7949
w  0.172955   0.766447  0.2257  0.8260

> vee(fit$resid, fit$block, method='mm')

$sigb2
[1] 22.85784

$sige2
[1] 0.6683255

> vee(fit$resid, fit$block)
$sigb2
         [,1]
[1,] 16.42025

$sige2
   [,1]
[1,] 0.9971398

Exercises 8.7.8-8.7.10 discuss the results of this example and two simulation investigations of the methods median-MAD and HL-dispersion for variance component estimation.

Of the two variance component estimation methods, we recommend the median-MAD procedure due to its robustness.

8.5 Multiple Rankings Estimator

A rank-based alternative to the JR estimator of Section 8.3 is the MR estimator developed by Rashid et al. (2012). MR stands for multiple rankings, as it utilizes a separate ranking for each cluster, while the JR estimator uses the rankings of the entire dataset, i.e., the joint rankings. The model is the same as (8.3), which we repeat here for reference:

$$Y_{ki} = \alpha + x_{ki}^T \beta + e_{ki}, \quad k = 1,\ldots,m, \; i = 1,\ldots,n_k.$$        (8.16)

The objective function is the sum of m separate dispersion functions each having a separate ranking given by

$$D(\beta) = \sum_{k=1}^m D_k(\beta),$$    (8.17)

where $D_k(\beta) = \sum_{i=1}^{n_k} a(R_k(Y_{ki} - x_{ki}^T\beta))(Y_{ki} - x_{ki}^T\beta)$ and $R_k(Y_{ki} - x_{ki}^T\beta)$ is the rank of $Y_{ki} - x_{ki}^T\beta$ among $Y_{k1} - x_{k1}^T\beta, \ldots, Y_{kn_k} - x_{kn_k}^T\beta$.
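To see the multiple rankings in code, the following sketch minimizes (8.17) for a single slope with Wilcoxon scores, ranking each cluster separately; the data are simulated and this is not the mrfit implementation. Note that the intercept and the block effects never change the within-cluster rankings, so only the slope needs to be estimated.

```r
set.seed(6)
m <- 5; nk <- 20
block <- rep(1:m, each = nk)
x <- rnorm(m * nk)
y <- 1 + 0.5 * x + rep(rnorm(m, sd = 2), each = nk) + rnorm(m * nk)
phi <- function(u) sqrt(12) * (u - 0.5)   # Wilcoxon score function
Dk <- function(beta, yk, xk) {            # Jaeckel dispersion within one cluster
  r <- yk - xk * beta
  sum(phi(rank(r) / (length(r) + 1)) * r)
}
D <- function(beta) sum(sapply(unique(block), function(k)
  Dk(beta, y[block == k], x[block == k])))
betahat <- optimize(D, interval = c(-5, 5))$minimum   # should land near 0.5
```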

For the asymptotic theory of the rank-based fit, we need only assume that the distribution of the random errors has finite Fisher information and that its density is absolutely continuous; see Hettmansperger and McKean (2011), Section 3.4, for discussion. In particular, as with the JR fit, the errors may have an asymmetric or a symmetric distribution. Unlike the JR fit, though, for the MR fit each cluster can have its own score function.

The MR fit obtains the estimates of the fixed effects of the model while it is invariant to the random effects. The invariance of the MR estimate to the random effects is easy to see. Because the rankings are invariant to a constant shift, we have for center k that

$$R_k(Y_{ki} - \alpha - b_k - x_{ki}^T\beta) = R_k(Y_{ki} - \alpha - x_{ki}^T\beta).$$

Because the scores sum to 0 for each center, it follows that the objective function $D(\beta)$, and thus the MR estimator, are invariant to the random effects.

Rashid et al. (2012) show that the MR estimate is asymptotically normal with mean $\beta$ and variance

$$\tau^2 V_{MR} = \tau^2 \left( \sum_{k=1}^m X_k^T X_k \right)^{-1},$$   (8.18)

where the scale parameter $\tau$ is given by expression (3.19). If Wilcoxon scores are used, this is the usual parameter

$$\tau = \left[ \sqrt{12} \int f^2(x)\,dx \right]^{-1},$$      (8.19)

where $f(x)$ is the pdf of the random errors $e_{ki}$. In expression (8.19), we are assuming that the same score function is used for each cluster. If this is not the case, letting $\tau_k$ denote the scale parameter for cluster $k$, the asymptotic covariance matrix is $\left( \sum_{k=1}^m X_k^T X_k / \tau_k^2 \right)^{-1}$.

Model (8.16) assumes that there is no interaction between the center and the fixed effects. Rashid et al. (2012), though, developed a robust test for this interaction based on rank-based estimates which can be used in conjunction with the MR or JR analyses.

Estimation of Scale

A consistent estimator of the scale parameter $\tau$ can be obtained as follows. For the $k$th cluster, form the vector of residuals $\hat{r}_{MR,k}$ with components

$$\hat{r}_{MR,ki} = y_{ki} - x_{ki}^T \hat{\beta}_{MR}.$$      (8.20)

Denote by $\hat{\tau}_k$ the estimator of $\tau$ proposed by Koul et al. (1987), computed on the residuals of the $k$th cluster. Note that these estimates are invariant to the random effects and each is a consistent estimator of $\tau$. As our estimator of $\tau$, we take the average of these estimators, i.e.,

$$\hat{\tau}_{MR} = \frac{1}{m} \sum_{k=1}^m \hat{\tau}_k,$$          (8.21)

which is consistent for $\tau$. Here we assume that the same score function is used for each cluster. If this is not the case, then, as noted above, each $1/\hat{\tau}_k^2$ appears within the sum in expression (8.18).

Inference

Inference based on the MR estimate can be done in the same way as with other linear models discussed in this book. For example, Wald type tests and confidence intervals based on the MR estimates can be formulated in the same way as those based on the JR fit discussed in Section 8.3. Another test statistic, not readily available for the JR procedures, is based on the reduction of dispersion in passing from the reduced to the full model. Denote the reduction in dispersion by

$$RD_{MR} = D_{MR}(\hat{\beta}_{MR,R}) - D_{MR}(\hat{\beta}_{MR,F}).$$     (8.22)

Large values of $RD_{MR}$ are indicative of a lack of agreement between the collected data and the null hypothesis. As shown in Rashid et al. (2012), under $H_0$,

$$D^*_{MR} = \frac{RD_{MR}}{\hat{\tau}_{MR}/2} \; \text{converges in distribution to the} \; \chi^2(q) \; \text{distribution.}$$    (8.23)

A nominal $\alpha$ decision rule is to reject $H_0$ in favor of $H_A$ if $D^*_{MR} > \chi^2_\alpha(q)$, where $q$ is the number of constraints.

The drop-in-dispersion test was discussed in Section 4.4.3. It is analogous to the likelihood ratio test statistic $-2\log\Lambda$ of maximum likelihood procedures and has a similar interpretation. The use of a measure of dispersion to assess the effectiveness of a model fit to a set of data is common in regression analysis.

The code to compute the MR estimate is in the R package1 mrfit. The following code segment demonstrates the analysis based on this R function.

Example 8.5.1 (Triglyceride Levels).

The dataset gly4gen is a simulated dataset similar to an actual trial. Lipid levels for the patients were measured at specified times. The response variable of interest is the change in triglyceride level between the baseline and the week 4 visit. Five treatment groups were considered. The study was conducted at two centers. Centers form the random block effect. Group 1 is referenced.

> data(gly4gen)
> X<-with(gly4gen,model.matrix(~as.factor(group)-1))
> X<-X[,2:5]
> y<-gly4gen$diffgly4
> block<-gly4gen$center
> fit<-mrfit(X,y,block,rfit(y~X)$coef[2:5])
> summary(fit)
Coefficients:
     Estimate Std. Error t-ratio
Xas.factor(group)2 0.28523 0.29624 0.96283
Xas.factor(group)3 -2.41176 0.29624  -8.14118
Xas.factor(group)4 33.03831 0.29238 112.99851
Xas.factor(group)5 26.11310 0.29624  88.14797

Notice for this simulated data, the triglyceride levels of Groups 3 through 5 differ significantly from Group 1.

8.6 GEE-Type Estimator

As in the previous sections of this chapter, we consider cluster-correlated data. Using the same notation, let Yki denote the ith response in the kth cluster, i = 1,..., nk and k = 1,..., m, and let xki denote the corresponding p × 1 vector of covariates. For the kth cluster, stack the responses and covariates into the respective nk × 1 vector Yk=(Yk1,...,Yknk)T and nk × p matrix Xk=(xk1T,...,xknkT)T.

In the earlier sections of this chapter, we considered mixed (linear and random) models for Yk. For formal inference, these procedures require that the marginal distributions of the random error vectors for the clusters have the same distribution. In this section, we consider generalized linear models (glm) for cluster-correlated data. For our rank-based procedures, this assumption on the marginal distribution of the random error vector is not required.

Assume that the distribution of Yki is in the exponential family; that is, the pdf of Yki is of the form

$$f(y_{ki}) = \exp\left\{\left[y_{ki}\theta_{ki} - a(\theta_{ki}) + b(y_{ki})\right]\phi\right\}. \qquad (8.24)$$

It easily follows that $E[Y_{ki}] = a'(\theta_{ki})$ and $\mathrm{Var}[Y_{ki}] = a''(\theta_{ki})/\phi$. The covariates are included in the model using a specified function h in the following manner

$$\theta_{ki} = h(\mathbf{x}_{ki}^T\boldsymbol{\beta}).$$

The function h is called the link function. Often the canonical link is used where h is taken to be the identity function; i.e., the covariates are linked to the model via $\theta_{ki} = \mathbf{x}_{ki}^T\boldsymbol{\beta}$. The Hessian plays an important role in the fitting of the model. For the kth cluster it is the $n_k \times p$ matrix defined by

$$D_k = \frac{\partial a'(\boldsymbol{\theta}_k)}{\partial\boldsymbol{\beta}} = \left[\frac{\partial a'(\theta_{ki})}{\partial\beta_j}\right], \qquad (8.25)$$

where $i = 1, \ldots, n_k$, $j = 1, \ldots, p$, and $\boldsymbol{\theta}_k = (\theta_{k1}, \ldots, \theta_{kn_k})^T$.
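As a quick check of the moment identities above, consider the Poisson distribution, which is a member of the exponential family (8.24) with φ = 1, θ = log λ, a(θ) = e^θ, and b(y) = −log y!. The following sketch (in Python, for illustration only) verifies numerically that $E[Y] = a'(\theta) = e^\theta$ and $\mathrm{Var}[Y] = a''(\theta)/\phi = e^\theta$:

```python
import math

# Poisson as a member of the exponential family (8.24) with phi = 1:
# theta = log(lambda), a(theta) = exp(theta), b(y) = -log(y!)
def poisson_pdf(y, theta):
    return math.exp(y * theta - math.exp(theta) - math.lgamma(y + 1))

theta = math.log(2.0)                      # lambda = 2
mean = sum(y * poisson_pdf(y, theta) for y in range(60))
var = sum((y - mean) ** 2 * poisson_pdf(y, theta) for y in range(60))
# both sums are ~2, matching a'(theta) = a''(theta) = exp(theta) = 2
```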

If the responses within a cluster are independent then the above model is a generalized linear model. We are interested, though, in the cases where there is dependence within a cluster, which often occurs in practice. For the GEE estimates, we do not require the specific covariance of the responses, but, instead, we specify a dependence structure as follows. For cluster k, define the nk × nk matrix Vk by

$$V_k = A_k^{1/2} R_k(\alpha) A_k^{1/2}/\phi, \qquad (8.26)$$

where $A_k$ is a diagonal matrix with positive elements on the diagonal, $R_k(\alpha)$ is a correlation matrix, and α is a vector of parameters. The matrix $V_k$ is called the working covariance matrix of $\mathbf{Y}_k$, but it need not be the covariance matrix of $\mathbf{Y}_k$. For example, in practice, $A_k$ and $R_k(\alpha)$ are often taken to be identity matrices. In this case, we say the covariance structure is working independence.
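For illustration, the following sketch (in Python; the function names are ours, not part of any package discussed here) constructs a working covariance matrix of the form (8.26) with an exchangeable (compound symmetry) correlation structure:

```python
import math

def cs_correlation(n, alpha):
    """Compound-symmetry (exchangeable) correlation matrix R(alpha)."""
    return [[1.0 if i == j else alpha for j in range(n)] for i in range(n)]

def working_covariance(a_diag, alpha, phi=1.0):
    """V = A^{1/2} R(alpha) A^{1/2} / phi with A diagonal (sketch):
    entry (i, j) of V is sqrt(a_i) * R_ij * sqrt(a_j) / phi."""
    n = len(a_diag)
    r = cs_correlation(n, alpha)
    return [[math.sqrt(a_diag[i]) * r[i][j] * math.sqrt(a_diag[j]) / phi
             for j in range(n)] for i in range(n)]

# a cluster of size 3 with common variance 4 and exchangeable correlation 0.5
v = working_covariance([4.0, 4.0, 4.0], alpha=0.5)
```

With a common diagonal, the diagonal entries of V are the variance (here 4) and the off-diagonal entries are variance × α (here 2).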

Liang and Zeger (1986) develop an elegant fit of this model based on a set of generalized estimating equations (GEE) which lead to an iterated reweighted least squares (IRLS) solution. As shown by Abebe et al. (2014), each step of their solution minimizes the Euclidean norm for a nonlinear problem. Abebe et al. (2014) developed an analogous rank-based solution that leads to an IRLS robust solution.

Next, we briefly describe Abebe et al.’s solution. Assume that we have selected a score function φ(u) which is odd about 1/2; i.e.,

$$\varphi(1-u) = -\varphi(u). \qquad (8.27)$$

The Wilcoxon score function satisfies this property, as do all score functions which are appropriate for symmetric error distributions. As discussed in Remark 8.6.1, this can be easily modified for score functions which do not satisfy (8.27). Suppose further that we have specified the working covariance matrix V and that we also have a consistent estimate $\hat{V}$ of it. Suppose for cluster k that $\hat{V}_k$ is the current estimate of the matrix $V_k$. Let $\mathbf{Y}^*_k = \hat{V}_k^{-1/2}\mathbf{Y}_k$ and let $g_{ki}(\boldsymbol{\beta}) = \mathbf{c}_i^T a(\boldsymbol{\theta}_k)$, where $\mathbf{c}_i^T$ is the ith row of $\hat{V}_k^{-1/2}$. Then the rank-based estimate for the next step minimizes the norm

$$D(\boldsymbol{\beta}) = \sum_{k=1}^{m}\sum_{i=1}^{n_k} \varphi\left[R(Y^*_{ki} - g_{ki}(\boldsymbol{\beta}))/(n+1)\right]\left[Y^*_{ki} - g_{ki}(\boldsymbol{\beta})\right]. \qquad (8.28)$$

We next write the rank-based estimator as a weighted LS estimator. Let $e_{ki}(\boldsymbol{\beta}) = Y^*_{ki} - g_{ki}(\boldsymbol{\beta})$ denote the (k,i)th residual and let $m_r(\boldsymbol{\beta}) = \mathrm{med}_{(k,i)}\{e_{ki}(\boldsymbol{\beta})\}$ denote the median of all the residuals. Then, because the scores sum to 0, we have the identity

$$\begin{aligned}
D_R(\boldsymbol{\beta}) &= \sum_{k=1}^{m}\sum_{i=1}^{n_k} \varphi\left[R(e_{ki}(\boldsymbol{\beta}))/(n+1)\right]\left[e_{ki}(\boldsymbol{\beta}) - m_r(\boldsymbol{\beta})\right] \\
&= \sum_{k=1}^{m}\sum_{i=1}^{n_k} \frac{\varphi\left[R(e_{ki}(\boldsymbol{\beta}))/(n+1)\right]}{e_{ki}(\boldsymbol{\beta}) - m_r(\boldsymbol{\beta})}\left[e_{ki}(\boldsymbol{\beta}) - m_r(\boldsymbol{\beta})\right]^2 \\
&= \sum_{k=1}^{m}\sum_{i=1}^{n_k} w_{ki}(\boldsymbol{\beta})\left[e_{ki}(\boldsymbol{\beta}) - m_r(\boldsymbol{\beta})\right]^2, 
\end{aligned} \qquad (8.29)$$

where $w_{ki}(\boldsymbol{\beta}) = \varphi[R(e_{ki}(\boldsymbol{\beta}))/(n+1)]/[e_{ki}(\boldsymbol{\beta}) - m_r(\boldsymbol{\beta})]$ is a weight function. We set $w_{ki}(\boldsymbol{\beta})$ to the maximum of the weights if $e_{ki}(\boldsymbol{\beta}) - m_r(\boldsymbol{\beta}) = 0$. Note that using the median of the residuals in conjunction with property (8.27) ensures that the weights are positive.
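The weight computation can be sketched as follows (in Python, for illustration; rank ties are ignored and a small tolerance stands in for an exact zero denominator). Wilcoxon scores φ(u) = √12 (u − 1/2) are used, which satisfy property (8.27):

```python
import math
import statistics

def wilcoxon_score(u):
    # Wilcoxon score function; odd about 1/2: phi(1 - u) = -phi(u)
    return math.sqrt(12.0) * (u - 0.5)

def rank_weights(residuals, tol=1e-12):
    """Weights w = phi[R(e)/(n+1)] / (e - median(e)), as in (8.29).
    A residual at the median gets the maximum of the other weights,
    so that every weight is positive."""
    n = len(residuals)
    med = statistics.median(residuals)
    order = sorted(range(n), key=lambda i: residuals[i])
    ranks = [0] * n
    for r, i in enumerate(order, start=1):
        ranks[i] = r                        # rank R(e_i) among all residuals
    weights = [None] * n
    for i in range(n):
        d = residuals[i] - med
        if abs(d) > tol:
            weights[i] = wilcoxon_score(ranks[i] / (n + 1.0)) / d
    wmax = max(w for w in weights if w is not None)
    return [wmax if w is None else w for w in weights]

w = rank_weights([-1.2, -0.3, 0.1, 0.8, 2.5])
```

Because residuals below the median get negative scores and residuals above it get positive scores, every ratio is positive, as the text notes.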

Remark 8.6.1.

To accommodate score functions that do not satisfy (8.27), quantiles other than the median can easily be used. For example, all rank-based scores are nondecreasing and sum to 0; hence there are both negative and positive scores. So for a given situation with sample size n, replace the median m_r with the i′th quantile, where a(i′) ≤ 0 and a(j) > 0 for j > i′. Then the ensuing weights are nonnegative.

Expression (8.29) establishes a sequence of IRLS estimates { β^(j) },j=1,2,..., which satisfy the general estimating equations (GEE) given by

$$\sum_{k=1}^{m} D_k^T \hat{V}_k^{-1/2}\hat{W}_k\hat{V}_k^{-1/2}\left[\mathbf{Y}_k - a_k(\boldsymbol{\theta}) - m^*_r(\hat{\boldsymbol{\beta}}^{(j)})\right] = \mathbf{0}; \qquad (8.30)$$

see Abebe et al. (2014) for details. We refer to these estimates as rank-based GEE estimates (GEERB).

Also, a Gauss–Newton-type algorithm can be developed based on the estimating equations (8.30). Since $\partial a_k(\boldsymbol{\theta})/\partial\boldsymbol{\beta} = D_k$, a first-order expansion of $a_k(\boldsymbol{\theta})$ about the jth step estimate $\hat{\boldsymbol{\beta}}^{(j)}$ is

$$a_k(\boldsymbol{\theta}) \doteq a_k(\hat{\boldsymbol{\theta}}^{(j)}) + D_k(\boldsymbol{\beta} - \hat{\boldsymbol{\beta}}^{(j)}).$$

Substituting the right side of this expression for ak(θ) in expression (8.30) and solving for β^(j+1) yields

$$\hat{\boldsymbol{\beta}}^{(j+1)} = \hat{\boldsymbol{\beta}}^{(j)} + \left[\sum_{k=1}^{m} D_k^T\hat{V}_k^{-1/2}\hat{W}_k\hat{V}_k^{-1/2}D_k\right]^{-1} \times \sum_{k=1}^{m} D_k^T\hat{V}_k^{-1/2}\hat{W}_k\hat{V}_k^{-1/2}\left[\mathbf{Y}_k - a_k(\hat{\boldsymbol{\theta}}^{(j)}) - m^*_r(\hat{\boldsymbol{\beta}}^{(j)})\right].$$
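To make the update concrete, the following sketch (in Python, for illustration only) carries out one such step in the simplest possible setting: a single covariate, identity link, and working independence, so that $D_k$ reduces to the covariate vector and the matrix operations reduce to scalar arithmetic:

```python
import statistics

def irls_step(beta, x, y, weights):
    """One Gauss-Newton step, specialized (sketch): single covariate,
    identity link, working independence.  Here D_k is the covariate
    vector and a_k(theta) = x * beta, so the update is a weighted
    least squares step on the median-centered residuals."""
    resid = [yi - xi * beta for xi, yi in zip(x, y)]
    med = statistics.median(resid)          # the quantile m_r
    num = sum(xi * wi * (ri - med) for xi, wi, ri in zip(x, weights, resid))
    den = sum(xi * wi * xi for xi, wi in zip(x, weights))
    return beta + num / den

# one step from beta = 0 with unit weights on a small hypothetical dataset
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 8.0]
beta1 = irls_step(0.0, x, y, [1.0] * 4)
```

In the full algorithm the weights are recomputed from the current residuals at each step and the iteration continues until the change in the estimate is negligible.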

Abebe et al. (2014) developed the asymptotic theory for these rank-based GEERB estimates under the assumption of continuous responses. They showed that under regularity conditions, the estimates are asymptotically normal with mean β and with the variance-covariance matrix given by

$$\left\{\sum_{k=1}^{m} D_k^T V_k^{-1/2} W_k V_k^{-1/2} D_k\right\}^{-1}\left\{\sum_{k=1}^{m} D_k^T V_k^{-1/2}\,\mathrm{Var}(\boldsymbol{\varphi}_k)\,V_k^{-1/2} D_k\right\} \times \left\{\sum_{k=1}^{m} D_k^T V_k^{-1/2} W_k V_k^{-1/2} D_k\right\}^{-1}, \qquad (8.31)$$

where $\boldsymbol{\varphi}_k$ denotes the $n_k \times 1$ vector $(\varphi[R(e^*_{k1})/(n+1)], \ldots, \varphi[R(e^*_{kn_k})/(n+1)])^T$ and the residuals $e^*_{ki}$ are defined by the following expressions:

$$\mathbf{Y}^*_k = V_k^{-1/2}\mathbf{Y}_k = (Y^*_{k1}, \ldots, Y^*_{kn_k})^T, \quad G_k(\boldsymbol{\beta}) = V_k^{-1/2}a_k(\boldsymbol{\theta}) = [g_{ki}], \quad e^*_{ki} = Y^*_{ki} - g_{ki}(\boldsymbol{\beta}). \qquad (8.32)$$

A practical implementation of inference for the GEERB estimates is discussed in Section 8.6.4. As discussed in Abebe et al. (2014), the GEERB estimates are robust in the Y space (provided the score function is bounded) but not robust in the X space. We are currently developing an HBR GEE estimator which is robust in both spaces and, also, has high breakdown.

The GEE estimates are quite flexible. The exponential family is a large family of distributions that is often used in practice. The choices of link functions and working variance-covariance structures allow a large variety of models from which to choose. As shown in expression (8.31), the asymptotic covariance matrix of the estimate takes each of these choices into account; that is, the link function determines the Hessian matrices Dk, (8.25); the working covariance structure determines the matrices Vk, (8.26); and the pdf is reflected in the factor in the middle set of braces. We provide R code to compute GEERB estimates based on the Gauss–Newton step described above. The main driver is the R function geerfit, which is included in the package2 rbgee. It can be easily modified to compute different options. In the next three subsections, we discuss the weights, the link function, and the working covariance structure, along with the R functions associated with these items; these subsections are followed by illustrative examples.

We have only recently developed geerfit, so we caution the reader that it is an experimental version and updates are likely.

8.6.1 Weights

The R function wtmat(Dmat,eitb,med=TRUE,scores=wscores) computes the weights, where eitb is the vector of current residuals and Dmat is the current Hessian matrix. If the option med is set to TRUE, then it is assumed that the score function is odd about 1/2 and the median of the current residuals is used in the calculation of the weights, as given in expression (8.29). If med is FALSE, then the percentile discussed in Remark 8.6.1 is used. The weights are calculated at the current residuals, making use of the Hessian matrix D, (8.25).

8.6.2 Link Function

The link function connects the covariate space to the distribution of the responses. Note that it affects the fitting algorithm through its interaction with the vectors ak(θk), (8.24), and the Hessian matrices Dk, (8.25). We have set the default link to a linear model. Thus, in the routine getAp, for cluster k and with β(j) as the current estimate of β, the vector ak(θk) is set to Xkβ(j) and, in the routine getD, the matrix Dk is set to Xk. For other link functions these routines have to be changed.

8.6.3 Working Covariance Matrix

The working covariance matrix V, (8.26), is computed in the function veemat. Currently there are three options available: working independence, “WI”; compound symmetry (exchangeable), “CS”; and autoregressive order 1, “AR”. Default is set at compound symmetry. For the compound symmetry case, this function also sets the method for computing the variance components. Currently, there are two options: the MAD-median option, “MM”, and the dispersion and Hodges-Lehmann estimator, “DHL.” The default option is the MAD-median option. Recall that the MAD-median option results in robust estimates of the variance components.

8.6.4 Standard Errors

Abebe et al. (2014) performed several Monte Carlo studies of procedures to standardize the GEERB estimators in terms of validity. One procedure involved estimation of the asymptotic variance-covariance matrix, (8.31), of the GEERB estimators, using the final estimates of the matrices V, W, and D. For cluster k, a simple moment estimator of Var(φk) based on residuals is discussed in Abebe et al. (2014). The resulting estimator of the asymptotic variance-covariance matrix, though, appeared to lead to a liberal inference in the Monte Carlo studies. In the studies, a first-order approximation to the asymptotic variance-covariance matrix appeared to lead to a valid inference. The first-order approximation involves replacing the weight matrix Ŵ by τ̂−1I, where τ̂ is the estimator of τ, and the matrix Var(φk) by Ik. In our R function the indicator of the variance-covariance procedure is the variable varcovst. The default setting is varcovst="var2", which results in this approximation, while the setting "var1" results in the estimation of the asymptotic form. The third setting, "var3", is a hybrid of the two in which just W is approximated. This is similar to a sandwich-type estimator.

8.6.5 Examples

The driver of our GEERB fit R function at defaults settings is:

geerfit(xmat, y, center, scores=wscores, geemod="LM",
        structure="CS", substructure="MM",
        med=TRUE, varcovst="var2",
        maxstp=50, eps=0.00001, delta=0.8, hparm=2)

The function geerfit assumes that y and xmat are sorted by center. The HBR version is under development and, hence, currently not available.

The routine returns the estimates of the regression coefficients and their standard errors and t-ratios, along with the variance-covariance estimator and the history (in terms of estimates) of the Newton steps. We illustrate the routine with two examples.

Example 8.6.1.

For this example, we simulated data with 5 clusters, each of size 10. The covariance structure is compound symmetric with the variances set at σε2 = 1 and σb2 = 3, so that the intraclass correlation coefficient is ρ = 0.75. The random errors and random effects are normally distributed, with the true β set at (0.5, 0.35, 0.0)T. There are 3 covariates, which were generated from a standard normal distribution. The data can be found in the dataset eg1gee. The following R segment loads the data and computes the Wilcoxon GEE fit. We used the default settings for the GEERB fit. In particular, compound symmetry was the assumed covariance structure and the variance component estimates are returned in the code.

> xmat<- with(eg1gee,cbind(x1,x2,x3))
> gwfit <- geerfit(xmat,y,block)
> gwfit$tab
   Est   SE t-ratio
x1 0.54067723 0.1118348 4.83460610
x2 -0.01492029 0.1576557 -0.09463847
x3 -0.20872498 0.1300584 -1.60485548

> vc <- gwfit$vc
> vc

[1] 2.2674708 1.0457868 1.2216840 0.4612129

> rho <- vc[2] /vc[1]
> rho
[1] 0.4612129

The GEERB estimates of the three components of β are 0.541, −0.015, and −0.209. Based on the standard errors of the estimates, the true value of each component of β is trapped within the respective 95% confidence interval. The estimates of the variance components are σ̂t2 = 2.267, σ̂b2 = 1.046, σ̂ε2 = 1.222, and ρ̂ = 0.461.

We next change the data so that y11 = 53 instead of 1.53 and rerun the fits:

> y[1] <- 53
> gwfit <- geerfit(xmat,y,block)
> gwfit$tab
   Est   SE t-ratio
x1 0.6069074818 0.1491861 4.068123565
x2 -0.0006604922 0.2119288 -0.003116575
x3 -0.3447768897 0.1454904 -2.369756997

> gwfit$vc
[1] 2.1359250 0.6595324 1.4763927 0.3087807

There is little change in the estimate of β, verifying the robustness of the GEERB estimator.

Example 8.6.2 (Rounding First Base, Continued).

Recall that in Example 8.2.1 three methods (round out, narrow angle, and wide angle) for rounding first base were investigated. Twenty-two baseball players served as blocks. The responses are their average times for two replications of each method. Thus the design is a randomized block design. In Example 8.2.1, the data are analyzed using Friedman's test, which is significant for treatment effect. Friedman's analysis, though, consists of only a test. In contrast, we next discuss the rank-based analysis based on the GEERB fit. In addition to a test for overall treatment effect, it offers estimates (with standard errors) of the effect sizes, estimates of the variance components, and a residual analysis for checking the quality of fit and for determining outliers. We use a design matrix which references the first method. The following code provides the Wald test, based on the fit, which tests for differences among the three methods:

>  fit <- geerfit(xm,y,center)
>  beta <- fit$tab[,1]
>  tst <- t(beta)%*%solve(fit$varcov)%*%beta/2
> pv <- 1-pchisq(tst,2)
>  c(tst ,pv)
[1] 6.36264299 0.04153074

The Wald test is significant at the 5% level. With Friedman's method, this would be the end of the analysis. Let μi denote the mean time of method i. The effects of interest are the differences between these means. Because method 1 is referenced, the summary of the fit (fit$tab) provides the inference for μ3 − μ1 and μ2 − μ1. However, the next few lines of code yield the complete inference for comparison of the methods. The summary is displayed in Table 8.1.

Table 8.1

Summary Table of Effects for the Firstbase Data.

Effect          Est.    SE    t-ratio
mu2 minus mu1  -0.02   0.03   -0.69
mu3 minus mu1  -0.10   0.03   -2.92
mu3 minus mu2  -0.08   0.02   -3.44

> h <- matrix(c(-1,1),ncol=2); e32 <- h%*%beta
> se32 <- sqrt(h%*%fit$varcov%*%t(h))
> t32 <- e32/se32; c(e32,se32,t32)
[1] -0.07826494 0.02271888 -3.44492906
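The contrast computation in the R code above has a simple general form: for a contrast vector h, the estimate is h′β̂ and its standard error is √(h′V̂h), where V̂ is the estimated covariance matrix of β̂. The following sketch (in Python, with hypothetical numbers rounded from Table 8.1, not the actual firstbase fit) illustrates it:

```python
import math

def contrast_inference(beta, varcov, h):
    """Estimate, SE, and t-ratio for the contrast h'beta (sketch)."""
    est = sum(hi * bi for hi, bi in zip(h, beta))
    var = sum(h[i] * varcov[i][j] * h[j]
              for i in range(len(h)) for j in range(len(h)))
    se = math.sqrt(var)
    return est, se, est / se

# hypothetical beta = (mu2 - mu1, mu3 - mu1) and its covariance matrix;
# h = (-1, 1) picks off mu3 - mu2
beta = [-0.02, -0.10]
vc = [[0.0009, 0.0004], [0.0004, 0.0009]]
est, se, t = contrast_inference(beta, vc, h=[-1.0, 1.0])
```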

Based on Table 8.1, method 3 differs significantly from the other two methods, while methods 1 and 2 do not differ significantly. Hence, overall, method 3 (rounding first base using a wide angle) results in the quickest times. The top panel of Figure 8.3 displays comparison boxplots of the three methods. Outliers are prevalent. The bottom panel shows the q–q plot of the residuals based on the GEERB fit. Notice that three outliers clearly stand out. A simple inspection of the residuals shows that these outliers correspond to the times of baseball player #22. He appears to be the slowest runner. The rank-based estimates of the variance components are:

Figure 8.3


Comparison boxplots of the three methods and the q–q plot of the residuals based on the GEERB fit.

> fit$vc
[1] 0.013402740 0.012364328 0.001038412 0.922522403

The estimate of the intraclass correlation coefficient is 0.92, indicating a strong correlation of running times within players over the three methods.

8.7 Exercises

8.7.1. Transform the firstbase data frame into a vector and create categorical variables for treatment (rounding method) and subject. Obtain the results from friedman.test using these data objects.
8.7.2. Referring to Exercise 8.7.1, obtain estimates of the cell medians for each of the rounding methods. Also obtain estimates of the standard errors of these estimates of location using both a compound symmetry estimate as well as the sandwich estimate. Discuss.
8.7.3. It is straightforward to generate data from the simple mixed model. Write a function which generates a sample of m blocks, each of size n, with variance components σb and σϵ. Assume normal errors.
8.7.4. Extend Exercise 8.7.3 to include errors from a t-distribution.
8.7.5. Extend Exercise 8.7.3 to allow for different block sizes.
8.7.6. On page 418, Rasmussen (1992) discusses a randomized block design concerned with the readings of four thermometers for the melting point of hydroquinone. Three technicians were used as a blocking factor (each technician obtained measurements for all four thermometers). The data are presented next.

    MeltPt  Therm.  Tech.    MeltPt  Therm.  Tech.
    174.0   1       1        171.5   3       1
    173.0   1       2        171.0   3       2
    173.5   1       3        173.0   3       3
    173.0   2       1        173.5   4       1
    172.0   2       2        171.0   4       2
    173.0   2       3        172.5   4       3

    (a) Let the R vectors y, ind, block contain respectively the melting points, thermometers, and blocks. Argue that the following R code obtains the JR fit of the full model:

      xmat <- cellx(ind); x2 <- xmat[,2:4]; fit <- jrfit(x2,y,block)

    (b) Obtain the summary of the above fit. Notice that the first thermometer was referenced. Discuss what the regression coefficients are estimating. Do there seem to be any differences in the thermometers?
    (c) Obtain residual and normal q–q plots of the Studentized residuals. Are there any outliers? Discuss the quality of fit.
    (d) Argue that the following code obtains the Wald test of no differences among the parameters. Is this hypothesis rejected at the 5% level?

      beta <- fit$coef; b <- beta[2:4]; vc <- fit$varhat[2:4,2:4]
      tst <- t(b)%*%solve(vc)%*%b/3; pv <- 1-pf(tst,3,4)
8.7.7. Rasmussen (1992) (page 442) discusses a study on emotions. Each of eight volunteers was requested to express the emotions fear, happiness, depression, and calmness. At the time of expression, their skin potentials in millivolts were measured; hence, there are 32 measurements in all. The data can be found in the dataset emotion.

    (a) Discuss an appropriate model for the data and obtain a rank-based fit of it.
    (b) Using Studentized residuals, check the quality of fit.
    (c) Test to see if there is a significant difference in the skin potential of the four emotions. Conclude at the 5% level.
    (d) Obtain a 95% confidence interval for the shift between the emotions fear and calmness.
    (e) Obtain a 95% confidence interval for the shift between the emotions depression and calmness.
8.7.8. Consider the results of the jrfit in Example 8.4.1. Obtain 95% confidence intervals for the fixed effects coefficients. Did they trap the true values?
8.7.9. Run a simulation of size 10,000 on the model simulated in Example 8.4.1. In the simulations, collect the estimates of the variance components σb2 and σε2 for both the median-MAD and the HL-dispersion methods. Obtain the empirical mean square errors. Which method, if any, performed better than the other?
8.7.10. Repeat Exercise 8.7.9, but for this simulation, use the t-distribution with 2 degrees of freedom for both the random errors ϵki and the random effects bk.

1 See https://github.com/kloke/book for more information.

2 See https://github.com/kloke/book for more information.
