6 Analysis of Variance – Models with Random Effects

6.1 Introduction

Whereas Chapter 5 introduces and investigates the general structure of analysis of variance models for the case that all effects are fixed real numbers (ANOVA model I), we now consider the same models but assume that all factor levels have been drawn randomly from a universe of factor levels. We call this the ANOVA model II. The effects (except the overall mean μ) are therefore random variables and not parameters to be estimated. Instead of estimating the random effects, we estimate and test the variances of these effects, called variance components. The terms main effect and interaction effect are defined as in Chapter 5 (but these effects are now random variables).

6.2 One‐Way Classification

We characterise methods of variance component estimation for the simplest case, the one‐way ANOVA, and demonstrate most of them on a data set. For this, we assume that a sample of a levels of a random factor A has been drawn from the universe of factor levels, which is assumed to be large. To be less abstract, let us assume that the levels are sires. From the ith sire a random sample of n_i daughters is drawn and their milk yield y_ij is recorded. The design is called balanced if the same number n of daughters has been selected for each sire. If the n_i are not all equal to n, we call it an unbalanced design.

We consider the model

$$ y_{ij} = \mu + a_i + e_{ij}, \quad i = 1, \dots, a; \; j = 1, \dots, n_i. \tag{6.1} $$

The a_i are the main effects of the levels A_i. They are random variables. The e_ij are the errors and are also random. The constant μ is the overall mean. Model (6.1) is completed by the assumptions

$$ E(a_i) = E(e_{ij}) = 0, \quad \operatorname{var}(a_i) = \sigma_a^2, \quad \operatorname{var}(e_{ij}) = \sigma^2; \tag{6.2} $$

all random components on the right‐hand side of (6.1) are independent.

The variances σ²_a and σ² are called variance components. The total number of observations is always denoted by N; in the balanced case we have N = an. From (6.2) it follows

$$ \operatorname{var}(y_{ij}) = \sigma_a^2 + \sigma^2. \tag{6.3} $$

Let us assume that all the random variables in (6.1) are normally distributed even if this is not needed for all the estimation methods. Then it follows that a_i and e_ij are independent of each other and N(0; σ²_a)‐ and N(0; σ²)‐distributed, respectively. The y_ij are N(μ; σ² + σ²_a)‐distributed but not independent of each other. The dependence exists between variables within the same factor level (class) because

$$ \operatorname{cov}(y_{ij}, y_{ik}) = E[(a_i + e_{ij})(a_i + e_{ik})] = E(a_i^2) = \sigma_a^2, \quad j \neq k. $$

We call cov(y_ij, y_ik), i = 1, … , a; j, k = 1, … , n_i; j ≠ k the covariance within classes.

A standardised measure of this dependence is the intra‐class correlation coefficient

$$ \rho_I = \frac{\sigma_a^2}{\sigma_a^2 + \sigma^2}. \tag{6.4} $$

The ANOVA table is that of Table 5.2 but the expected mean squares (MS) differ from those in Table 5.2 and are given in Table 6.1.

Table 6.1 Expected mean squares of the one‐way ANOVA model II.

Source of variation	df	MS	E(MS) Unbalanced case	E(MS) Balanced case
Factor A	a − 1	MS_A	σ² + [1/(a − 1)][N − (Σ n_i²)/N] σ²_a	σ² + n σ²_a
Residual	N − a (= a(n − 1) if balanced)	MS_res	σ²	σ²

6.2.1 Estimation of the Variance Components

For the one‐way classification, we describe several methods of estimation and compare them with each other. The analysis of variance method is the simplest one and stems from the originator of the analysis of variance, R. A. Fisher. In Henderson's fundamental paper from 1953, it was mentioned as method I. An estimator is defined as a random mapping from the sample space into the parameter space. The parameter space of variances and variance components is the positive real line. When realisations of such a mapping can be outside of the parameter space, we should not call them estimators. Nevertheless, this term is in use in the estimation of variance components following the citation “A good man with his groping intuitions! Still knows the path that is true and fit.” Or its German original in Goethe's Faust, Prolog. (Der Herr): “Ein guter Mensch, in seinem dunklen Drange, ist sich des rechten Weges wohl bewußt.”

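We use below the notation Y = (y_11, … , y_{a n_a})^T for the vector of all observations. With I_m the identity matrix of order m and 1_m the vector of m ones, the variance of Y is the block‐diagonal matrix

$$ V = \operatorname{var}(Y) = \bigoplus_{i=1}^{a} \left( \sigma^2 I_{n_i} + \sigma_a^2 1_{n_i} 1_{n_i}^T \right). \tag{6.5} $$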
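The block structure of var(Y) can be made concrete with a small numerical sketch. It assumes a balanced layout with a = 3 classes of n = 4 observations and purely illustrative values σ² = 4 and σ²_a = 1; none of these numbers come from the example data, and the object names are ours.

```r
# Illustrative (assumed) values, not estimates from the milk fat data
sigma2   <- 4   # residual variance
sigma2_a <- 1   # variance of the random factor A
a <- 3          # number of classes
n <- 4          # observations per class (balanced)

# One diagonal block: sigma2 on top of a constant sigma2_a for all pairs
block <- sigma2 * diag(n) + sigma2_a * matrix(1, n, n)

# Direct sum of a identical blocks = Kronecker product with the identity
V <- kronecker(diag(a), block)

# Intra-class correlation and the correlation matrix implied by V
icc <- sigma2_a / (sigma2_a + sigma2)
R <- cov2cor(V)

print(R[1:5, 1:5])
print(icc)
```

Within a class every pair of observations has correlation σ²_a/(σ²_a + σ²); observations from different classes are uncorrelated.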

6.2.1.1 ANOVA Method

The ANOVA method does not need the normality assumption; it follows the same algorithm in all classifications.

Algorithm for estimating variance components by the analysis of variance method:

Obtain from the column E(MS) in any of the ANOVA tables (here and in the further sections) the formula for each variance component.
Replace E(MS) by MS and each σ² in the equations by the corresponding s², taken as a random variable to obtain the estimators or as an observed value to obtain the estimates of the corresponding variance components.

Because differences of MS occur in these equations, the estimators and estimates of the (positive) variance components may turn out negative. For normally distributed Y the ANOVA estimators of the variance components take negative values with positive probability; see Verdooren (1982). Nevertheless, the estimators are unbiased, and for normally distributed Y in the balanced case they have minimum variance (best quadratic unbiased estimators). Negative estimates, however, are inadmissible; see Verdooren (1980).
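How large the probability of a negative estimate is in the balanced normal case follows from the fact that MS_A/MS_res is distributed as (σ² + nσ²_a)/σ² times an F[a − 1, a(n − 1)] variable, so that Pr(MS_A < MS_res) = Pr(F[a − 1, a(n − 1)] < σ²/(σ² + nσ²_a)). The sketch below evaluates this with pf(). The function name and the parameter values (chosen near the magnitudes of the milk fat example) are assumptions for illustration only.

```r
# Probability that the ANOVA estimate of sigma2_a is negative
# (balanced one-way model II, normally distributed observations)
prob_negative <- function(sigma2, sigma2_a, a, n) {
  pf(sigma2 / (sigma2 + n * sigma2_a), df1 = a - 1, df2 = a * (n - 1))
}

# Hypothetical parameter values: 10 sires with 12 daughters each
p_small <- prob_negative(sigma2 = 305, sigma2_a = 10, a = 10, n = 12)
p_zero  <- prob_negative(sigma2 = 305, sigma2_a = 0,  a = 10, n = 12)
p_large <- prob_negative(sigma2 = 305, sigma2_a = 100, a = 10, n = 12)

print(c(p_zero = p_zero, p_small = p_small, p_large = p_large))
```

The probability shrinks as σ²_a grows relative to σ², but it never reaches zero.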

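We now demonstrate that algorithm in our one‐way case. In the balanced case

$$ \sigma^2 = E(MS_{res}), \qquad \sigma_a^2 = \frac{E(MS_A) - E(MS_{res})}{n}, $$

and hence the estimators (and, with the observed mean squares, the estimates) are

$$ s^2 = MS_{res}, \qquad s_a^2 = \frac{MS_A - MS_{res}}{n}. $$

In the unbalanced case n is replaced by [N − (Σ n_i²)/N]/(a − 1).

The estimate s²_a is negative if MS_res > MS_A.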

Problem 6.1

Estimate the variance components using the ANOVA method.

Solution

In base R the command aov() produces the ANOVA table, but the expected mean squares, E(MS), are not given. To estimate the variance components directly from the E(MS) we use the R package VCA (variance component analysis).

Example

We use the milk fat performances (in kilograms) of the daughters of 10 sires randomly selected from a cattle population (including the bold entries) shown in Table 6.2 (with bold entries only used when equal sub‐class numbers are necessary). First, we give the balanced example with 12 observations per sire (or bull).

Table 6.2 Milk fat performances y_ij of daughters of ten sires. Entries marked * (bold in the original table) are used only when equal sub‐class numbers are necessary.

	Sire (bull)
j	B₁	B₂	B₃	B₄	B₅	B₆	B₇	B₈	B₉	B₁₀
1	120	152	130	149	110	157	119	150	144	159
2	155	144	138	107	142	107	158	135	112	105
3	131	147	123	143	124	146	140	150	123	103
4	130	103	135	133	109	133	108	125	121	105
5	140	131	138	139	154	104	138	104	132	144
6	140	102	152	102	135	119	154	150	144	129
7	142	102	159	103	118	107	156	140	132	119
8	146	150	128	110	116	138	145	103	129	100
9	130	159	137	103	150	147	150	132	103	115
10	152	132	144	138	148	152	124	128	140	146
11	115	102	154	124*	138	124	100	122	106	108
12	146	160	131*	117*	115	142	140*	154	152	119
n_i (unbalanced)	12	12	11	10	12	12	11	12	12	12
n_i (balanced)	12	12	12	12	12	12	12	12	12	12

 > #  usual ANOVA Table:
> b1 <- c(120,155,131,130,140,140,142,146,130,152,115,146)
> b2 <- c(152,144,147,103,131,102,102,150,159,132,102,160)
> b3 <- c(130,138,123,135,138,152,159,128,137,144,154,131)
> b4 <- c(149,107,143,133,139,102,103,110,103,138,124,117)
> b5 <- c(110,142,124,109,154,135,118,116,150,148,138,115)
> b6 <- c(157,107,146,133,104,119,107,138,147,152,124,142)
> b7 <- c(119,158,140,108,138,154,156,145,150,124,100,140)
> b8 <- c(150,135,150,125,104,150,140,103,132,128,122,154)
> b9 <- c(144,112,123,121,132,144,132,129,103,140,106,152)
> b10 <- c(159,105,103,105,144,129,119,100,115,146,108,119)
> bull <- rep(1:10, each = 12)
> y <- c(b1,b2,b3,b4,b5,b6,b7,b8,b9,b10)
> Table_6_2b <- data.frame(bull, y) 
> # the b is used for balanced
> BULL <- factor(bull)
> Anova1b <- aov(y ~ BULL)
> Anova1b
Call:
   aov(formula = y ~ BULL)
Terms:
                    BULL Residuals
Sum of Squares   3814.63  33547.33
Deg. of Freedom        9       110
Residual standard error: 17.46356
Estimated effects may be unbalanced

# Now ANOVA Table using the E(MS) column to estimate the variance components.

# Use for this the package VCA.

 > library(VCA)
> Anova2b <- anovaVCA(y ~ bull, Table_6_2b, VarVC.method = c("scm"))
> Anova2b
Result Variance Component Analysis:
---------------------------------------------------------------------------------------------------------

  Name  DF        SS           MS         VC         %Total    SD
1 total 116.76977                         314.88179  100       17.744909
2 bull  9         3814.633333  423.848148 9.906033   3.145953  3.147385
3 error 110       33547.333333 304.975758 304.975758 96.854047 17.463555
  CV[%]
1 13.547456
2 2.40289
3 13.332654
Mean: 130.9833 (N = 120)
Experimental Design: balanced  |  Method: ANOVA

#“scm” = Searle, S.R, Casella, G., McCulloch, C.E. (1992), Variance Components, Wiley New York

Remark

In this balanced one‐way classification the ANOVA table of model II with E(MS) is given in Table 6.3.

Table 6.3 ANOVA table of model II with E(MS) of the example of Problem 6.1.

Source of variation	df	SS	MS	E(MS)
Bulls	9	3 814.63	423.85	σ² + 12σ²_Bulls
Residual	110	33 547.33	304.98	σ²

Hence the estimates of the variance components (VC) are s² = 304.98 and s²_Bulls = (423.85 − 304.98)/12 = 9.906.

Now we use the unbalanced example, hence the bold data are deleted in Table 6.2.

 > b3[12] <- NA
> b3
 [1] 130 138 123 135 138 152 159 128 137 144 154  NA
> b4[11] <- NA
> b4[12] <- NA
> b4
 [1] 149 107 143 133 139 102 103 110 103 138  NA  NA
> b7[12] <- NA
> b7
 [1] 119 158 140 108 138 154 156 145 150 124 100  NA
>  y <- c(b1,b2,b3,b4,b5,b6,b7,b8,b9,b10)
>  bull <- rep(1:10, each = 12)
> Table_6_2u <- data.frame(bull, y) 
> # the u is used for unbalanced
> Anova1u <- aov(y ~ BULL)
> Anova1u
Call:
   aov(formula = y ~ BULL)
Terms:
                    BULL Residuals
Sum of Squares   3609.11  33426.03
Deg. of Freedom        9       106
Residual standard error: 17.75781
Estimated effects may be unbalanced
4 observations deleted due to missingness

Now the ANOVA table using the E(MS) column to estimate the variance components.

> Anova2u <- anovaVCA(y ~ bull, Table_6_2u, VarVC.method = c("scm"))
There are 4 missing values for the response variable (obs: 36, 47, 48, 84)!
> Anova2u
Result Variance Component Analysis:
---------------------------------------------------------------------------------------------------------
  Name  DF         SS           MS         VC         %Total    SD
1 total 113.684098                         322.728113 100       17.964635
2 bull  9          3609.106113  401.01179  7.38819    2.289292  2.718123
3 error 106        33426.031818 315.339923 315.339923 97.710708 17.757813
  CV[%]
1 13.704443
2 2.073538
3 13.546668
Mean: 131.0862 (N = 116, 4 observations removed due to missing data)
Experimental Design: unbalanced  |  Method: ANOVA

Remark

In this unbalanced one‐way classification the ANOVA table of model II with E(MS) is given in Table 6.4.

Table 6.4 ANOVA table of model II of the unbalanced one‐way classification with E(MS).

Source of variation	df	SS	MS	E(MS)
Bulls	9	3 609.11	401.012	σ² + 11.5958 σ²_Bulls
Residual	106	33 426.03	315.340	σ²

The coefficient of σ²_Bulls in E(MS) is [1/(10 − 1)]*[116 − (7*12² + 2*11² + 10²)/116] = 11.5958.

Hence the estimates of the variance components (VC) are s² = 315.340 and s²_Bulls = (401.012 − 315.340)/11.5958 = 7.38819.
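The arithmetic of the two remarks can be collected in a small helper. This is a sketch in base R and not part of the VCA package; the function name anova_vc and its arguments are hypothetical. It takes the two mean squares and the sub‐class numbers n_i and returns the coefficient of the variance component in E(MS) of the factor together with the ANOVA estimates.

```r
# ANOVA-method estimates for the one-way model II from the mean squares
# ms_a, ms_res: observed mean squares; n_i: vector of sub-class numbers
anova_vc <- function(ms_a, ms_res, n_i) {
  a  <- length(n_i)                      # number of factor levels
  N  <- sum(n_i)                         # total number of observations
  n0 <- (N - sum(n_i^2) / N) / (a - 1)   # coefficient in E(MS) of the factor
  c(coef = n0, s2 = ms_res, s2_a = (ms_a - ms_res) / n0)
}

# Mean squares taken from the two anovaVCA() outputs of this example
balanced   <- anova_vc(423.848148, 304.975758, rep(12, 10))
unbalanced <- anova_vc(401.01179, 315.339923,
                       c(12, 12, 11, 10, 12, 12, 11, 12, 12, 12))

print(round(rbind(balanced, unbalanced), 4))
```

For equal n_i the coefficient reduces to n, so one function covers both the balanced and the unbalanced case.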

6.2.1.2 Maximum Likelihood Method

Now we use the assumption that y_ij in (6.1) is normally distributed with variance from (6.3). Further we assume equal sub‐class numbers, i.e. N = an, because otherwise the description becomes too complicated. Those interested in the general case may read Sahai and Ojeda (2004, 2005). Harville (1977) gives a good background of the maximum likelihood approaches.

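The density function of the vector Y of all observations is (with ⊕ for the direct sum, I_n the identity matrix of order n and 1_n the vector of n ones)

$$ L(Y \mid \mu, \sigma^2, \sigma_a^2) = (2\pi)^{-N/2}\, |V|^{-1/2} \exp\left\{ -\tfrac{1}{2} (Y - \mu 1_N)^T V^{-1} (Y - \mu 1_N) \right\}, \qquad V = \bigoplus_{i=1}^{a} \left( \sigma^2 I_n + \sigma_a^2 1_n 1_n^T \right), \tag{6.6} $$

and this becomes

$$ L = \frac{\exp\left\{ -\tfrac{1}{2} \left[ \dfrac{SS_{res}}{\sigma^2} + \dfrac{SS_A}{\sigma^2 + n\sigma_a^2} + \dfrac{an(\bar{y}_{..} - \mu)^2}{\sigma^2 + n\sigma_a^2} \right] \right\}}{(2\pi)^{N/2} (\sigma^2)^{a(n-1)/2} (\sigma^2 + n\sigma_a^2)^{a/2}} \tag{6.7} $$

with SS_res and SS_A from Table 5.2.

We obtain the maximum‐likelihood estimates $\tilde\mu$, $\tilde\sigma^2$, and $\tilde\sigma_a^2$ by setting the derivatives of ln L with respect to the three unknown parameters to zero and obtain

$$ 0 = \frac{an(\bar{y}_{..} - \tilde\mu)}{\tilde\sigma^2 + n\tilde\sigma_a^2}, $$

$$ 0 = -\frac{a(n-1)}{2\tilde\sigma^2} - \frac{a}{2(\tilde\sigma^2 + n\tilde\sigma_a^2)} + \frac{SS_{res}}{2\tilde\sigma^4} + \frac{SS_A}{2(\tilde\sigma^2 + n\tilde\sigma_a^2)^2}, $$

$$ 0 = -\frac{na}{2(\tilde\sigma^2 + n\tilde\sigma_a^2)} + \frac{n\, SS_A}{2(\tilde\sigma^2 + n\tilde\sigma_a^2)^2}. $$

From the first equation (after transition to random variables) it follows for the estimators

$$ \tilde\mu = \bar{y}_{..} $$

and from the two other equations

$$ a(\tilde\sigma^2 + n\tilde\sigma_a^2) = SS_A $$

and

$$ \tilde\sigma^2 = \frac{SS_{res}}{a(n-1)} = MS_{res}, \tag{6.8} $$

hence

$$ \tilde\sigma_a^2 = \frac{1}{n} \left[ \frac{SS_A}{a} - MS_{res} \right] = \frac{1}{n} \left[ \left( 1 - \frac{1}{a} \right) MS_A - MS_{res} \right]. \tag{6.9} $$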

Because the matrix of the second derivatives is negative definite, we really reach maxima.

Note that for a random sample of size n from a normal variable with distribution N(μ, σ²) the maximum likelihood (ML) estimate of μ is the sample mean and the ML estimate of σ² is [(n − 1)/n]s², where s² is the sample variance.

Problem 6.2

Estimate the variance components using the ML method.

Solution

Base R offers no function for the ML estimation of the variance components. We therefore use the R package lme4.

Example

We use the data of Table 6.2 in Problem 6.1 (including the bold entries). We use from there the data frame Table_6_2b.

For this balanced one‐way classification, the ML estimates are as follows.

 > # We use the package lme4.
> library(lme4)
> MLbalanced <- lmer(y ~ 1 + (1|bull), data = Table_6_2b, REML = FALSE)
> summary(MLbalanced)
Linear mixed model fit by maximum likelihood  ['lmerMod']
Formula: y ~ 1 + (1 | bull)
   Data: Table_6_2b
     AIC      BIC   logLik deviance df.resid
  1035.2   1043.6   -514.6   1029.2      117
Scaled residuals:
    Min      1Q  Median      3Q     Max
-1.8318 -0.7580  0.1010  0.8049  1.7189
Random effects:
 Groups   Name        Variance Std.Dev.
 bull     (Intercept)   6.374   2.525
 Residual             304.976  17.464
Number of obs: 120, groups:  bull, 10
Fixed effects:
            Estimate Std. Error t value
(Intercept)  130.983      1.783   73.47
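In the balanced case the lmer() result can be cross‐checked by hand. The sketch below assumes the closed‐form solution of the likelihood equations for equal sub‐class numbers, namely MS_res for σ² and [(1 − 1/a)MS_A − MS_res]/n for σ²_a, which is valid only as long as the latter is not negative. The mean squares are those of the anovaVCA() output; the variable names are ours.

```r
# Mean squares of the balanced example (from the ANOVA table)
ms_a   <- 423.848148
ms_res <- 304.975758
a <- 10   # number of sires
n <- 12   # daughters per sire

# Closed-form ML estimates for the balanced one-way model II
ml_s2   <- ms_res
ml_s2_a <- max(0, ((1 - 1 / a) * ms_a - ms_res) / n)

# ANOVA-method estimate for comparison
anova_s2_a <- (ms_a - ms_res) / n

print(round(c(ML_s2 = ml_s2, ML_s2_a = ml_s2_a, ANOVA_s2_a = anova_s2_a), 3))
```

The ML estimate of σ²_a agrees with the value 6.374 reported by lmer() and is smaller than the ANOVA estimate 9.906, because ML divides SS_A by a instead of a − 1.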

We use now the unbalanced data of Table 6.2 by deleting the bold data. We use from there the data frame Table_6_2u.

 > library(lme4)
> MLunbalanced <- lmer(y ~ 1 + (1|bull), data = Table_6_2u, REML = FALSE)
> summary(MLunbalanced)
Linear mixed model fit by maximum likelihood  ['lmerMod']
Formula: y ~ 1 + (1 | bull)
   Data: Table_6_2u
     AIC      BIC   logLik deviance df.resid
  1004.0   1012.3   -499.0    998.0      113
Scaled residuals:
    Min      1Q  Median      3Q     Max
-1.7746 -0.8416  0.1290  0.8101  1.6326
Random effects:
 Groups   Name        Variance Std.Dev.
 bull     (Intercept)   3.248   1.802
 Residual             316.029  17.777
Number of obs: 116, groups:  bull, 10
Fixed effects:
            Estimate Std. Error t value
(Intercept)  131.084      1.746   75.06

6.2.1.3 REML – Estimation

Anderson and Bancroft (1952, p. 320) introduced a restricted maximum likelihood (REML) method. There are extensions by Thompson (1962) and a generalisation by Patterson and Thompson (1971). This method uses a translation‐invariant restricted likelihood function that depends only on the variance components to be estimated and not on the fixed effect μ. This restricted likelihood function is a function of the sufficient statistics for the variance components. It is then differentiated with respect to the variance components under the restriction that the solutions are non‐negative.

The REML method is described in Searle et al. (1992). It means that the likelihood function of TY is maximised in place of the likelihood function of Y. T is a (N − a − 1) × N matrix, whose rows are N − a − 1 linearly independent rows of I_N − X(X^TX)⁻X^T with X from 6.1 in Rasch and Schott (2018).

The (natural) logarithm of the likelihood function of TY is with V in (6.5)

Now we differentiate this function with respect to σ² and σ²_a and set these derivatives to zero. We solve the resulting equations iteratively to obtain the estimators.

6.10

6.11

Because the matrix of second derivatives is negative definite, we find the maxima.

Note that for a random sample of size n from a normal variable with distribution N(μ, σ²) the REML estimate of μ is the sample mean and the REML estimate for σ² is s², where s² is the sample variance.
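The difference between the two notes on a single normal sample (divisor n for ML, divisor n − 1 for REML) is easy to see numerically. As an illustration only, the sketch treats the twelve daughters of sire B₁ from Table 6.2 as one random sample; the variable names are ours.

```r
# Daughters of sire B1 (Table 6.2), used here as a single sample
b1 <- c(120, 155, 131, 130, 140, 140, 142, 146, 130, 152, 115, 146)
n  <- length(b1)

mean_hat <- mean(b1)              # ML and REML estimate of mu
reml_var <- var(b1)               # REML estimate: sample variance s^2
ml_var   <- (n - 1) / n * reml_var  # ML estimate: [(n - 1)/n] s^2

# The ML estimate is the mean squared deviation from the sample mean
ml_direct <- mean((b1 - mean_hat)^2)

print(c(mean = mean_hat, REML = reml_var, ML = ml_var))
```

The ML estimate is always the smaller of the two, since it does not account for the degree of freedom used to estimate μ.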

This REML method is increasingly used in applications, especially in animal breeding, even for variables that are not normally distributed. Another method for estimating variance components, MINQUE (minimum norm quadratic unbiased estimator), does not assume normally distributed variables but needs starting values for the variance components. Iterating the procedure, i.e. inserting the outcome of one MINQUE step as the starting value of the next, gives the same result as the REML procedure. Hence we can use REML even for model effects that are not normally distributed.

Furthermore, for balanced designs the REML estimates coincide with the ANOVA estimates whenever the latter are not negative. If the ANOVA estimate is negative because MS_A is smaller than MS_res, it is an inadmissible estimate; in such cases the REML estimate is the admissible value zero. Hence, in practice the REML estimators are preferred for estimating the variance components. The REML method should always be used if the data are not balanced.

Besides methods based on the frequentist approach generally used in this book there are Bayesian methods of variance component estimation. In this approach, we assume that the parameters of the distribution, and hence in particular the variance components, are random variables with some prior knowledge about their distribution. This prior knowledge is sometimes given by an a priori distribution, sometimes by data from an earlier experiment. This prior distribution is combined with the likelihood of the sample, resulting in a posterior distribution. The estimator is chosen such that it minimises the so‐called Bayes risk. See Tiao and Tan (1965), Federer (1968), Klotz et al. (1969), Gelman et al. (1995).

Problem 6.3

Estimate the variance components using the REML method.

Solution

We use the R package lme4.

Example

For this balanced one‐way classification we obtain the REML estimates from the data frame Table_6_2b (Table 6.2 including the bold entries).

 > library(lme4)
> REMLbalanced <- lmer(y ~ 1 + (1|bull), data = Table_6_2b, REML = TRUE)
> summary(REMLbalanced)
Linear mixed model fit by REML ['lmerMod']
Formula: y ~ 1 + (1 | bull)
   Data: Table_6_2b
REML criterion at convergence: 1026.2
Scaled residuals:
    Min      1Q  Median      3Q     Max
-1.8547 -0.7579  0.1034  0.8338  1.7646
Random effects:
 Groups   Name        Variance Std.Dev.
 bull     (Intercept)   9.906   3.147
 Residual             304.976  17.464
Number of obs: 120, groups:  bull, 10
Fixed effects:
            Estimate Std. Error t value
(Intercept)  130.983      1.879   69.69

The estimates of the variance components using the REML method are therefore s² = 304.976 and s²_a = 9.906.

Note: in the balanced case, the REML estimates are equal to the ANOVA estimates because the ANOVA estimate for σ²_a is positive.

We use the unbalanced data of Table 6.2 (excluding the bold entries) of Problem 6.1. For this unbalanced one‐way classification we use for REML estimates the data frame Table_6_2u.

We use the package lme4.

 > library(lme4)
> REMLunbalanced <- lmer(y ~ 1 + (1|bull), data = Table_6_2u, REML = TRUE)
> summary(REMLunbalanced)
Linear mixed model fit by REML ['lmerMod']
Formula: y ~ 1 + (1 | bull)
   Data: Table_6_2u

REML criterion at convergence: 995
Scaled residuals:
    Min      1Q  Median      3Q     Max
-1.7979 -0.8001  0.1156  0.8381  1.6872
Random effects:
 Groups   Name        Variance Std.Dev.
 bull     (Intercept)   6.802   2.608
 Residual             315.901  17.774
Number of obs: 116, groups:  bull, 10
Fixed effects:
            Estimate Std. Error t value
(Intercept)  131.083      1.845   71.03

The estimates of the variance components using the REML method are therefore s² = 315.901 and s²_a = 6.802.

Note: in the unbalanced case the REML estimates are not equal to the ANOVA estimates.

6.2.2 Tests of Hypotheses and Confidence Intervals

To construct confidence intervals for σ²_a and σ² in the balanced one‐way random model and to test hypotheses about these variance components we need, besides (6.2), a further side condition in model equation (6.1) about the distribution of y_ij. We now assume that the y_ij are N(μ, σ² + σ²_a)‐distributed (but not independent of each other). Then we know from theorem 6.5 in Rasch and Schott (2018) that for the special case of equal sub‐class numbers (balanced case) the quadratic forms q₁ = SS_res/σ² and q₂ = SS_A/(σ² + nσ²_a) are independent of each other and CS[a(n − 1)]‐ and CS[a − 1]‐distributed, respectively. From this, it follows that

$$ F = \frac{MS_A}{MS_{res}} \tag{6.12} $$

is [(σ² + nσ²_a)/σ²] F[a − 1, a(n − 1)]‐distributed. Under the null hypothesis

H₀: σ²_a = 0

F is distributed as F[a − 1, a(n − 1)]. If the p‐value satisfies Pr(F[a − 1, a(n − 1)] > F) ≤ α, then H₀ is rejected at significance level α.

For an unbalanced design SS_A is distributed as a linear combination of (a − 1) independent CS[1] variables, where the coefficients are functions of σ² and σ²_a. Hence F is in general not a multiple of an F‐distributed variable, but under the null hypothesis H₀: σ²_a = 0, F is distributed as F[a − 1, N − a].

Problem 6.4

Test the null hypothesis H₀: σ²_a = 0 for the balanced case with significance level α = 0.05.

Solution

From an ANOVA table made by R, for example with aov(), we can calculate (6.12) in R and find the p‐value by R using the command 1 ‐ pf( ).

Example

For the balanced data of Table 6.2 of Problem 6.1 we have already found the ANOVA table with aov(). This is the ANOVA table for a fixed model, but the sums of squares and degrees of freedom are the same when the bulls are treated as a random effect.

> Anova1b <- aov(y ~ BULL)
> Anova1b
Call:
   aov(formula = y ~ BULL)
Terms:
                    BULL Residuals
Sum of Squares   3814.63  33547.33
Deg. of Freedom        9       110
Residual standard error: 17.46356
Estimated effects may be unbalanced
#  Insert the R commands:
> F <- (3814.63/33547.33)*(110/9)
> F
[1] 1.389775
> p_value <- 1 - pf(F,9,110)
> p_value
[1] 0.2013481

Conclusion

The F‐test gives the p‐value = 0.2013 > 0.05; hence H₀: σ²_a = 0 is not rejected.

The random model can also be found with aov().

> randombullb <- aov(y ~ Error(BULL), data = Table_6_2b)
> randombullb
Call:
aov(formula = y ~ Error(BULL), data = Table_6_2b)
Grand Mean: 130.9833
Stratum 1: BULL
Terms:
                Residuals
Sum of Squares   3814.633
Deg. of Freedom         9
Residual standard error: 20.58757
Stratum 2: Within
Terms:
                Residuals
Sum of Squares   33547.33
Deg. of Freedom       110
Residual standard error: 17.46356

For the calculation of the p‐value belonging to the F‐test, see above.

Problem 6.5

Test the null hypothesis H₀: σ²_a = 0 for the unbalanced case with significance level α = 0.05.

Solution

From an ANOVA table made using R, for example with aov(), we can calculate (6.12) in R and find the p‐value using R with the command 1 ‐ pf( ).

Example

For the unbalanced data of Table 6.2 of Problem 6.1 we have already found the ANOVA table with aov( ).

> Anova1u <- aov(y ~ BULL)
> Anova1u
Call:
   aov(formula = y ~ BULL)
Terms:
                    BULL Residuals
Sum of Squares   3609.11  33426.03
Deg. of Freedom        9       106
Residual standard error: 17.75781
Estimated effects may be unbalanced
4 observations deleted due to missingness
#  Insert the R commands:
> F <- (3609.11/33426.03)*(106/9)
> F
[1] 1.271682
> p_value <- 1 - pf(F,9,106)
> p_value
[1] 0.2608504

Conclusion

The F‐test gives the p‐value = 0.2609 > 0.05; hence H₀: σ²_a = 0 is not rejected.
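Both F‐tests use the same arithmetic, so they can be wrapped in one small function. This is a sketch with a hypothetical function name; it takes the sums of squares and degrees of freedom from the aov() output and uses the upper tail of pf() directly instead of 1 − pf().

```r
# F-test of H0: sigma2_a = 0 in the one-way model II
f_test_model2 <- function(ss_a, ss_res, df_a, df_res) {
  f <- (ss_a / df_a) / (ss_res / df_res)          # F = MS_A / MS_res
  p <- pf(f, df_a, df_res, lower.tail = FALSE)    # Pr(F[df_a, df_res] > f)
  c(F = f, p_value = p)
}

# Sums of squares and degrees of freedom from the two aov() tables
balanced   <- f_test_model2(3814.63, 33547.33, df_a = 9, df_res = 110)
unbalanced <- f_test_model2(3609.11, 33426.03, df_a = 9, df_res = 106)

print(rbind(balanced, unbalanced))
```

Using lower.tail = FALSE avoids the loss of accuracy that 1 − pf() can suffer for very small p‐values.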

If we want to use the random model, the ANOVA can also be obtained with aov().

> randombullu <- aov(y ~ Error(BULL), data = Table_6_2u)
> randombullu
Call:
aov(formula = y ~ Error(BULL), data = Table_6_2u)
Grand Mean: 131.0862
Stratum 1: BULL
Terms:
                Residuals
Sum of Squares   3609.106
Deg. of Freedom         9
Residual standard error: 20.02528
Stratum 2: Within
Terms:
                Residuals
Sum of Squares   33426.03
Deg. of Freedom       106
Residual standard error: 17.75781

For the calculation of the p‐value belonging to the F‐test, see above.

In the R package lme4 the null hypothesis H₀: σ²_a = 0 (with α = 0.05) is tested not with the F‐test but with the likelihood ratio (LR) test. Usually LR tests comparing model fits are based on maximum likelihood (ML) estimates. For purely random models like model II the LR test is also appropriate if the models are fitted by REML. For the so‐called mixed models with fixed and random effects described in Chapter 7, however, comparing REML fits that differ in their fixed effects is inappropriate; in that case we must use ML fits. For the one‐way model II we use the LR test with ML, because the model under H₀ can only be fitted with the command lm(), which uses ML.

We fit with the R package lme4 first the largest model for the balanced data of Problem 6.1 with the data‐frame Table_6_2b and with ML.

 > library(lme4)
> lmer1 <- lmer(y ~ 1 + (1|bull), data = Table_6_2b, REML = FALSE)

Then we fit the model belonging to the null hypothesis with lm( ) from the R base package.

> lmer2 <- lm(y ~ 1, data = Table_6_2b)
> # The likelihood-ratio test is then done by
> anova(lmer1, lmer2, refit = FALSE)
Data: Table_6_2b
Models:
lmer2: y ~ 1
lmer1: y ~ 1 + (1 | bull)
      Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)
lmer2  2 1033.5 1039.0 -514.73   1029.5
lmer1  3 1035.2 1043.6 -514.61   1029.2 0.2443      1     0.6211

Note: the LR test statistic is labelled Chisq in the output; its value is 0.2443 with one degree of freedom.

Hence the null hypothesis is not rejected if we use the significance level α = 0.05.

For these balanced data the exact F‐test gives the p‐value 0.2013 and the LR test the p‐value 0.6211.

The LR test does not work well for small data sets.
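The LR statistic and its p‐value can be reproduced from the printed output. The sketch below uses only the numbers shown in the anova() table; because the log‐likelihoods are printed with two decimals, the statistic recomputed from them is only an approximation of the reported value 0.2443. The variable names are ours.

```r
# Log-likelihoods as printed in the anova() table (rounded to two decimals)
loglik_null <- -514.73   # model under H0: y ~ 1
loglik_full <- -514.61   # model with random sire effect

# LR statistic: twice the difference of the maximised log-likelihoods
chisq_rounded <- 2 * (loglik_full - loglik_null)

# p-value from the chi-square distribution with 1 degree of freedom,
# using the more precise statistic reported by anova()
chisq <- 0.2443
p_lr  <- pchisq(chisq, df = 1, lower.tail = FALSE)

print(c(chisq_rounded = chisq_rounded, chisq = chisq, p_value = p_lr))
```

The single degree of freedom is the difference in the number of parameters of the two models (3 − 2).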

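Based on the results above we can construct confidence intervals for the variance component σ². Let CS[f | p] denote the p‐quantile of the CS[f]‐distribution and F_p the p‐quantile of the F[a − 1, a(n − 1)]‐distribution. Because q₁ is CS[a(n − 1)]‐distributed,

$$ \left[ \frac{SS_{res}}{CS[a(n-1) \mid 1 - \alpha/2]}, \; \frac{SS_{res}}{CS[a(n-1) \mid \alpha/2]} \right] \tag{6.13} $$

is a (1 − α) confidence interval for σ² if n = n₁ = ⋯ = n_a and further that

$$ \left[ \frac{MS_A - MS_{res} F_{1-\alpha/2}}{MS_A + (n-1) MS_{res} F_{1-\alpha/2}}, \; \frac{MS_A - MS_{res} F_{\alpha/2}}{MS_A + (n-1) MS_{res} F_{\alpha/2}} \right] \tag{6.14} $$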

is a (1 − α) confidence interval for the intra‐class correlation coefficient (ICC) if

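An exact (1 − α) confidence interval for σ²_a/σ² for balanced data is, with F = MS_A/MS_res and F_p the p‐quantile of the F[a − 1, a(n − 1)]‐distribution:

$$ \left[ \frac{1}{n} \left( \frac{F}{F_{1-\alpha/2}} - 1 \right), \; \frac{1}{n} \left( \frac{F}{F_{\alpha/2}} - 1 \right) \right] $$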

An approximate (1 − α) confidence interval for σ²_a for balanced data is given by Williams (1962)

An approximate (1 − α) confidence interval for σ²_a in the case of unequal sub‐class numbers is obtained by Seely and Lee (1994).

Problem 6.6

Construct a (1 − α) confidence interval for σ² and a (1 − α) confidence interval for the ICC with α = 0.05.

Solution

From the ANOVA table of a balanced design with model II we obtain SS_A= SSA and SS_res= SSRes with the corresponding degrees of freedom. With R commands, we solve (6.13) and (6.14).

Example

For the balanced data of Table 6.2 of Problem 6.1 we have found the ANOVA table with aov( ).

> Anova1b <- aov(y ~ BULL)
> Anova1b
Call:
   aov(formula = y ~ BULL)
Terms:
                    BULL Residuals
Sum of Squares   3814.63  33547.33
Deg. of Freedom        9       110
Residual standard error: 17.46356
Estimated effects may be unbalanced
> #  Insert the R commands:
> SSA <- 3814.63
> SSRes <- 33547.33
> dfA <- 9
> dfRes <- 110
> MSA <- SSA/dfA
> MSRes <- SSRes/dfRes
> chiu <- qchisq(0.975, dfRes)
> chil <- qchisq(0.025, dfRes)
> LowerCI1 <- SSRes/chiu
> UpperCI1 <- SSRes/chil
> LowerCI1   # lower CI Limit for variance
[1] 238.0652
> UpperCI1   # upper CI Limit for variance
[1] 404.8331
> MSRes      # estimate variance
[1] 304.9757
> Fu <- qf(0.975, dfA ,dfRes)
> Fl <- qf(0.025, dfA, dfRes)
> LowerCI2 <- (MSA - MSRes*Fu)/(MSA + (12-1)*MSRes*Fu)
> UpperCI2 <- (MSA - MSRes*Fl)/(MSA + (12-1)*MSRes*Fl)
> LowerCI2 #lower CI Limit intra-class correlation coefficient
[1] -0.03246049
> UpperCI2 #upper CI Limit intra-class correlation coefficient
[1] 0.2366987
> varA <- (MSA - MSRes)/12
> intraCC <- varA/(varA + MSRes)
> intraCC   # estimate intra-class correlation coefficient
[1] 0.03145944

Remark

The intra‐class correlation coefficient cannot be negative (its estimate here is 0.0315), hence the negative lower limit LowerCI2 = −0.0325 must be replaced by 0.
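The interval calculations of this example can be collected in one function. This is a sketch with hypothetical names, assuming a balanced design; besides the intervals for σ² and the ICC it also returns the exact interval for the ratio σ²_a/σ², computed as (F/F_p − 1)/n with F_p the quantiles of F[a − 1, a(n − 1)], and it truncates a negative ICC limit at zero as the remark requires.

```r
# Confidence intervals in the balanced one-way model II
ci_one_way <- function(ss_a, ss_res, a, n, alpha = 0.05) {
  df_a   <- a - 1
  df_res <- a * (n - 1)
  ms_a   <- ss_a / df_a
  ms_res <- ss_res / df_res
  # Upper quantile first, so that element 1 is the lower limit
  chi_q <- qchisq(c(1 - alpha / 2, alpha / 2), df_res)
  f_q   <- qf(c(1 - alpha / 2, alpha / 2), df_a, df_res)
  icc_raw <- (ms_a - ms_res * f_q) / (ms_a + (n - 1) * ms_res * f_q)
  list(sigma2  = ss_res / chi_q,                  # interval for sigma^2
       icc_raw = icc_raw,                         # interval for the ICC
       icc     = pmax(icc_raw, 0),                # truncated at zero
       ratio   = ((ms_a / ms_res) / f_q - 1) / n) # sigma_a^2 / sigma^2
}

res <- ci_one_way(ss_a = 3814.63, ss_res = 33547.33, a = 10, n = 12)
print(res)
```

The ICC limits and the ratio limits are linked by ICC = ratio/(1 + ratio), which gives a quick consistency check of the two formulas.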

6.2.3 Expectation and Variances of the ANOVA Estimators

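Because the estimators obtained using the ANOVA method are unbiased, we get

$$ E(s^2) = \sigma^2 $$

and

$$ E(s_a^2) = \sigma_a^2. $$

The variances and covariance of the ANOVA estimators of the variance components in the balanced case for normally distributed y_ij are:

$$ \operatorname{var}(s^2) = \frac{2\sigma^4}{a(n-1)}, \tag{6.15} $$

$$ \operatorname{var}(s_a^2) = \frac{2}{n^2} \left[ \frac{(\sigma^2 + n\sigma_a^2)^2}{a-1} + \frac{\sigma^4}{a(n-1)} \right], \tag{6.16} $$

$$ \operatorname{cov}(s^2, s_a^2) = -\frac{2\sigma^4}{n\,a(n-1)}. \tag{6.17} $$

Estimators for the variances and covariance in (6.15)–(6.17) can be obtained by replacing the quantities σ² and σ²_a occurring in these formulae by their estimators s² and s²_a, respectively. These plug‐in estimators are biased. Unbiased estimators are

$$ \widehat{\operatorname{var}}(s^2) = \frac{2 s^4}{a(n-1) + 2}, \tag{6.18} $$

$$ \widehat{\operatorname{var}}(s_a^2) = \frac{2}{n^2} \left[ \frac{(s^2 + n s_a^2)^2}{a+1} + \frac{s^4}{a(n-1) + 2} \right] \tag{6.19} $$

and

$$ \widehat{\operatorname{cov}}(s^2, s_a^2) = -\frac{2 s^4}{n\,[a(n-1) + 2]}. \tag{6.20} $$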

Problem 6.7

Estimate the variance of the ANOVA estimators of the variance components σ² and σ²_a.

Solution

From an ANOVA table for a balanced design we find s² and s²_a with the corresponding df_A and df_res.

Using R we insert these into (6.18)–(6.20).

Example

From the balanced data of Table 6.2 of Problem 6.1 we find s² and s²_a together with df_A and df_res.

> Anova2b <- anovaVCA(y ~ bull, Table_6_2b, VarVC.method = c("scm"))
> Anova2b
Result Variance Component Analysis:
---------------------------------------------------------------------------------------------------------

  Name  DF        SS           MS         VC         %Total    SD
1 total 116.76977                         314.88179  100       17.744909
2 bull  9         3814.633333  423.848148 9.906033   3.145953  3.147385
3 error 110       33547.333333 304.975758 304.975758 96.854047 17.463555
  CV[%]
1 13.547456
2 2.40289
3 13.332654
Mean: 130.9833 (N = 120)
Experimental Design: balanced  |  Method: ANOVA
> s2rest <- 304.975758
> s2A <- 9.906033
> n <- 12
> a <- 10
> # unbiased estimate of var(s2rest): 2*s^4/(a(n-1) + 2)
> var_s2rest <- (2*s2rest^2)/(a*(n-1) + 2)
> var_s2rest  # Estimate of the variance of s2rest
[1] 1660.897
> # unbiased estimate of var(s2A): uses the squares of MS_A = s2rest + n*s2A and of MS_res
> var_s2A <- (2/(n*n))*((s2rest + n*s2A)^2/(a+1) + s2rest^2/(a*(n-1)+2))
> var_s2A  # Estimate of variance of s2A
[1] 238.3614

Hence the estimated variances are 1660.90 for s² and 238.36 for s²_a.
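The R session above evaluates only the two variances. The covariance estimate numbered (6.20) can be obtained in the same way; the sketch below assumes the standard unbiased form −2s⁴/(n[a(n − 1) + 2]) for the balanced normal case, and the variable names are ours.

```r
# Estimates and design constants of the balanced example
s2 <- 304.975758   # estimate of the residual variance
n  <- 12           # daughters per sire
a  <- 10           # number of sires

# Estimated covariance between the ANOVA estimators of sigma^2 and sigma_a^2
# (assumed unbiased form for the balanced normal case)
cov_hat <- -2 * s2^2 / (n * (a * (n - 1) + 2))

print(cov_hat)
```

The covariance is negative because the residual mean square enters the estimator of σ²_a with a minus sign.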

Deriving the formulae for var(s²_a) and cov(s², s²_a) for unequal n_i is cumbersome. The derivation can be found in Hammersley (1949) and by another method in Hartley (1967). Townsend (1968, appendix IV) gives a derivation for the case of simple unbalanced designs.

6.3 Two‐Way Classification

Here and in Section 6.4, we consider mainly the estimators of the variance components with the analysis of variance method and the REML method.

6.3.1 Two‐Way Cross Classification

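In the two‐way cross‐classification our model is

$$ y_{ijk} = \mu + a_i + b_j + (a,b)_{ij} + e_{ijk}, \quad i = 1, \dots, a; \; j = 1, \dots, b; \; k = 1, \dots, n_{ij} \tag{6.21} $$

with side conditions that a_i, b_j, (a,b)_ij, and e_ijk are uncorrelated and:

$$ E(a_i) = E(b_j) = E((a,b)_{ij}) = E(e_{ijk}) = 0, \quad \operatorname{var}(a_i) = \sigma_a^2, \; \operatorname{var}(b_j) = \sigma_b^2, \; \operatorname{var}((a,b)_{ij}) = \sigma_{ab}^2, \; \operatorname{var}(e_{ijk}) = \sigma^2. $$

For testing and confidence intervals we additionally assume that y_ijk is normally distributed.

In a balanced two‐way cross‐classification (n_ij = n for all i, j) with normally distributed y_ijk the sums of squares in Table 5.7, used as a theoretical table with random variables, are stochastically independent, and we have

$$ \frac{SS_A}{bn\sigma_a^2 + n\sigma_{ab}^2 + \sigma^2} \sim CS[a-1], \quad \frac{SS_B}{an\sigma_b^2 + n\sigma_{ab}^2 + \sigma^2} \sim CS[b-1], \quad \frac{SS_{AB}}{n\sigma_{ab}^2 + \sigma^2} \sim CS[(a-1)(b-1)], \quad \frac{SS_{res}}{\sigma^2} \sim CS[ab(n-1)] $$

distributed.

To test the hypotheses:

H_A0: σ²_a = 0, H_B0: σ²_b = 0, H_AB0: σ²_ab = 0

we use the following facts.

The statistic

$$ F_A = \frac{MS_A}{MS_{AB}} $$

is the (bnσ²_a + nσ²_ab + σ²)/(nσ²_ab + σ²)‐fold of a random variable distributed as F[a − 1, (a − 1)(b − 1)]. If H_A0 is true F_A is F[a − 1, (a − 1)(b − 1)]‐distributed.

The statistic

$$ F_B = \frac{MS_B}{MS_{AB}} $$

is the (anσ²_b + nσ²_ab + σ²)/(nσ²_ab + σ²)‐fold of a random variable distributed as F[b − 1, (a − 1)(b − 1)]. If H_B0 is true F_B is F[b − 1, (a − 1)(b − 1)]‐distributed.

The statistic

$$ F_{AB} = \frac{MS_{AB}}{MS_{res}} $$

is the (nσ²_ab + σ²)/σ²‐fold of a random variable distributed as F[(a − 1)(b − 1), ab(n − 1)]. If H_AB0 is true, F_AB is F[(a − 1)(b − 1), ab(n − 1)]‐distributed.

The hypotheses H_A0, H_B0, and H_AB0 are tested by the statistics F_A, F_B, and F_AB respectively. If the observed F‐value is larger than the (1 − α)‐quantile of the central F‐distribution with the corresponding degrees of freedom we may conjecture that the corresponding variance component is positive and not zero.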

Problem 6.8

Test for the balanced case the hypotheses:

H_A0: σ²_a = 0, H_B0: σ²_b = 0, H_AB0: σ²_ab = 0

with significance level α = 0.05 for each hypothesis.

Solution

In R we first use the base package to make the ANOVA table for fixed effects. Then we test the hypotheses according to the random model by forming the correct F‐statistics. Finally we use the package lme4 to test the hypotheses for model II of the two‐way classification.

In a factory four operators (A) and four machines (B) are chosen at random. Each operator must make a certain product on each machine according to a certain specification. The deviations y from the specification of three products chosen at random are given in Table 6.5.

Table 6.5 Deviations y from the specification of three randomly chosen products for four operators (A) and four machines (B).

y	Prod	A₁	A₂	A₃	A₄
B₁	1	8.4	7.9	6.6	4.6
	2	8.2	7.3	5.8	3.6
	3	7.1	6.8	3.4	4.5
B₂	1	−1.0	0.6	0.0	0.3
	2	1.2	0.3	0.7	1.0
	3	−0.9	−1.4	−1.0	0.9
B₃	1	0.5	1.0	−0.1	−0.6
	2	−1.1	2.6	−0.1	0.7
	3	0.4	0.8	0.1	0.0
B₄	1	−0.2	1.1	1.2	−1.0
	2	0.8	0.2	0.6	1.6
	3	0.7	2.3	0.3	1.0

> # Observations of Table 6.5, one vector per operator (A1, ..., A4)
> y1 <- c(8.4,8.2,7.1,-1.0,1.2,-0.9,0.5,-1.1,0.4,-0.2,0.8,0.7)
> y2 <- c(7.9,7.3,6.8,0.6,0.3,-1.4,1.0,2.6,0.8,1.1,0.2,2.3)
> y3 <- c(6.6,5.8,3.4,0.0,0.7,-1.0,-0.1,-0.1,0.1,1.2,0.6,0.3)
> y4 <- c(4.6,3.6,4.5,0.3,1.0,0.9,-0.6,0.7,0.0,-1.0,1.6,1.0)
> y <- c(y1,y2,y3,y4)
> a <- rep(1:4, each = 12)            # operator index
> b0 <- c(1,1,1,2,2,2,3,3,3,4,4,4)
> b <- c(b0,b0,b0,b0)                 # machine index
> problem7 <- data.frame(cbind(a,b,y))
> A <- factor(a)
> B <- factor(b)
> # ANOVA table of the fixed-effects model (model I)
> fixedanova <- aov(y ~ A + B + A*B, problem7)
> summary(fixedanova)
                Df Sum Sq Mean Sq F value  Pr(>F)
    A            3   9.17    3.06   3.722 0.02108 *
    B            3 306.24  102.08 124.299 < 2e-16 ***
    A:B          9  25.46    2.83   3.445 0.00454 **
    Residuals   32  26.28    0.82
    ---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

The test for H_AB0: σ²_ab = 0 is correctly given in the table, with F‐value 3.445 = MS_AB/MS_res (2.83/0.82 with rounded mean squares).

Conclusion

The p‐value for AB is Pr(F(9, 32) > 3.445) = 0.00454 < 0.05; hence H_AB0 is rejected.

The test for H_A0: σ²_a = 0 is not correctly given in the fixed‐effects ANOVA table. The correct F‐statistic is MS_A/MS_AB = 3.06/2.83 = 1.08, and the p‐value for A, Pr(F(3, 9) > 1.08), must be calculated in R.

 > FA <- 3.06/2.83
> FA
[1] 1.081272
> p_valueA <- 1-pf(FA,3,9)
> p_valueA
[1] 0.40517

Conclusion

Because the p‐value for A is 0.405 > 0.05, H_A0: σ²_a = 0 is not rejected.

The test for H_B0: σ²_b = 0 is not correctly given in the fixed‐effects ANOVA table. The correct F‐statistic is

MS_B/MS_AB = 102.08/2.83 = 36.07, and the p‐value for B, Pr(F(3, 9) > 36.07), must be calculated in R.

 > FB <- 102.08/2.83
> FB
[1] 36.07067
> p_valueB <- 1-pf(FB, 3,9)
> p_valueB
[1] 2.412119e-05

Conclusion

Because the p‐value for B is 0.000024 < 0.05, H_B0: σ²_b = 0 is rejected.
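The three F‐tests of the random model can also be collected in one short computation. The following is only a sketch: it assumes the rounded mean squares and degrees of freedom of the fixed‐effects ANOVA table above, and the helper name random_f_test is hypothetical (it is not part of any package).

```r
# Hypothetical helper: F-test of one variance component in the balanced
# two-way model II. ms_num / ms_den are the numerator and denominator
# mean squares, df_num / df_den their degrees of freedom.
random_f_test <- function(ms_num, df_num, ms_den, df_den) {
  f <- ms_num / ms_den
  c(F = f, p = pf(f, df_num, df_den, lower.tail = FALSE))
}

# Rounded mean squares and degrees of freedom from summary(fixedanova)
ms <- c(A = 3.06, B = 102.08, AB = 2.83, res = 0.82)
df <- c(A = 3, B = 3, AB = 9, res = 32)

tests <- rbind(
  A  = random_f_test(ms[["A"]],  df[["A"]],  ms[["AB"]],  df[["AB"]]),   # MS_A / MS_AB
  B  = random_f_test(ms[["B"]],  df[["B"]],  ms[["AB"]],  df[["AB"]]),   # MS_B / MS_AB
  AB = random_f_test(ms[["AB"]], df[["AB"]], ms[["res"]], df[["res"]])   # MS_AB / MS_res
)
print(tests)
```

Because rounded mean squares are used, the F‐value for AB differs slightly from the value 3.445 printed by aov.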

With the R package lme4 we obtain, using REML:

> library(lme4)
> model1 <- lmer(y ~ 1 + (1|a) + (1|b) + (1|a:b), data = problem7)
> model2 <- lmer(y ~ 1 + (1|a) + (1|b), data = problem7)
> model3 <- lmer(y ~ 1 + (1|b) + (1|a:b), data = problem7)
> model4 <- lmer(y ~ 1 + (1|a) + (1|a:b), data = problem7)
> # We use the likelihood-ratio test with the REML estimates.
> # test H0: Variance component of A*B is zero
> anova(model1, model2, refit = FALSE)
    Data: problem7
    Models:
    model2: y ∼ 1 + (1 | a) + (1 | b)
    model1: y ∼ 1 + (1 | a) + (1 | b) + (1 | a:b)
           Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)
    model2  4 172.02 179.51 -82.011   164.02
    model1  5 167.54 176.90 -78.769   157.54 6.4826      1    0.01089 *
    ---------
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Conclusion

Because the p‐value Pr(>Chisq) = 0.01089 < 0.05, the hypothesis is rejected.
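The Chisq column of this output is twice the difference of the two log‐likelihoods, referred to a chi‐square distribution with one degree of freedom. A minimal sketch, assuming only the logLik values printed above (the variable names are illustrative):

```r
# REML log-likelihoods as printed by anova(model1, model2, refit = FALSE)
loglik_full    <- -78.769   # model1: with the term (1 | a:b)
loglik_reduced <- -82.011   # model2: without the term (1 | a:b)

# Likelihood-ratio statistic and its chi-square p-value with 1 df
lr_stat <- 2 * (loglik_full - loglik_reduced)
p_value <- pchisq(lr_stat, df = 1, lower.tail = FALSE)
print(c(Chisq = lr_stat, p = p_value))
```

The small difference from the printed Chisq value 6.4826 comes from the rounding of the log‐likelihoods.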

> # test H0: Variance component of A is zero
> anova(model1, model3, refit = FALSE)
    Data: problem7
    Models:
    model3: y ∼ 1 + (1 | b) + (1 | a:b)
    model1: y ∼ 1 + (1 | a) + (1 | b) + (1 | a:b)
           Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)
    model3  4 165.55 173.03 -78.773   157.55
   model1  5 167.54 176.90 -78.769   157.54 0.0068      1     0.9341

Conclusion

Because the p‐value Pr(>Chisq) = 0.9341 is larger than 0.05, the hypothesis is not rejected.

 > # test H0: Variance component of B is zero
> anova(model1, model4, refit = FALSE)
    Data: problem7
    Models:
    model4: y ∼ 1 + (1 | a) + (1 | a:b)
    model1: y ∼ 1 + (1 | a) + (1 | b) + (1 | a:b)
           Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)
    model4  4 185.80 193.29 -88.901   177.80
    model1  5 167.54 176.90 -78.769   157.54 20.263      1   6.75e-06 ***
    ---------
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Conclusion

Because the p‐value Pr(>Chisq) = 0.00000675 is smaller than 0.05, the hypothesis is rejected.

With the package lme4 we obtain, using ML:

> library(lme4)
> model1 <- lmer(y ~ 1 + (1|a) + (1|b) + (1|a:b), data = problem7)
> model2 <- lmer(y ~ 1 + (1|a) + (1|b), data = problem7)
> model3 <- lmer(y ~ 1 + (1|b) + (1|a:b), data = problem7)
> model4 <- lmer(y ~ 1 + (1|a) + (1|a:b), data = problem7)
> # We use the likelihood-ratio test with the ML estimates.
> # test H0: Variance component of A*B is zero
> anova(model1, model2, refit = TRUE)
    refitting model(s) with ML (instead of REML)
    Data: problem7
    Models:
    model2: y ∼ 1 + (1 | a) + (1 | b)
    model1: y ∼ 1 + (1 | a) + (1 | b) + (1 | a:b)
           Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)
    model2  4 174.49 181.97 -83.243   166.49
    model1  5 169.98 179.34 -79.992   159.98 6.5028      1    0.01077 *
    ---------
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Conclusion

Because the p‐value Pr(>Chisq) = 0.01077 is smaller than 0.05, the hypothesis is rejected.

 > # test H0: Variance component of A is zero
> anova(model1, model3, refit = TRUE)
    refitting model(s) with ML (instead of REML)
    Data: problem7
    Models:
    model3: y ∼ 1 + (1 | b) + (1 | a:b)
    model1: y ∼ 1 + (1 | a) + (1 | b) + (1 | a:b)
             Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)
    model3   4    167.99 175.47 -79.994   159.99
model1   5    169.98 179.34 -79.992   159.98 0.0042    1 0.9483

Conclusion

Because the p‐value Pr(>Chisq) = 0.9483 is larger than 0.05, the hypothesis is not rejected.

 > # test H0: Variance component of B is zero
> anova(model1, model4, refit = TRUE)
    refitting model(s) with ML (instead of REML)
    Data: problem7
    Models:
    model4: y ∼ 1 + (1 | a) + (1 | a:b)
    model1: y ∼ 1 + (1 | a) + (1 | b) + (1 | a:b)
           Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)
    model4  4 186.86 194.34 -89.430   178.86
    model1  5 169.98 179.34 -79.992   159.98 18.876      1  1.395e-05 ***
    ---------
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Conclusion

Because the p‐value Pr(>Chisq) = 0.00001395 is smaller than 0.05, the hypothesis is rejected.

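In Table 5.7 we have to replace, for the random model with the ANOVA method, the expected mean squares by

E(MS_A) = bnσ²_a + nσ²_ab + σ², E(MS_B) = anσ²_b + nσ²_ab + σ², E(MS_AB) = nσ²_ab + σ², E(MS_res) = σ². (6.22)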

Now we can directly find the ANOVA estimators for the variance components.

Problem 6.9

Derive the estimators of the variance components with the ANOVA method and with REML.

Solution

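The algorithm of the analysis of variance method shown in Section 6.2 provides the estimators of the variance components for the balanced case

s²_a = (MS_A − MS_AB)/(bn), s²_b = (MS_B − MS_AB)/(an), s²_ab = (MS_AB − MS_res)/n, s² = MS_res. (6.23)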

In R we will use the packages VCA and lme4 to estimate the variance components of model II for the two‐way classification.

We use the data of Problem 6.8, for which the data‐frame problem7 already exists in R.

> library(VCA)
> anova8 <- anovaVCA(y ~ a + b + a:b, problem7, VarVC.method = c("scm"))
> anova8
     Result Variance Component Analysis:
     ---------------------------------------------------------------------------------------------------------
  
       Name  DF       SS         MS         VC       %Total    SD       CV[%]
     1 total 3.956536                       9.780486 100       3.127377 171.167719
     2 a     3        9.170625   3.056875   0.018981 0.194075  0.137773 7.540614
     3 b     3        306.242292 102.080764 8.270972 84.566065 2.87593  157.405508
     4 a:b   9        25.461875  2.829097   0.669282 6.843038  0.818097 44.776109
     5 error 32       26.28      0.82125    0.82125  8.396822  0.906228 49.599733
  
    Mean: 1.827083 (N = 48)
Experimental Design: balanced  |  Method: ANOVA

We found the following estimates of the variance components: s²_a = 0.019, s²_b = 8.271, s²_ab = 0.669 and s² = 0.821.

Now we estimate the variance components using the REML method. Because the data of Problem 6.8 are balanced and the ANOVA estimates of the variance components are all positive, the REML estimates are the same as the ANOVA estimates.

 > library(lme4)
> y.lmer <- lmer(y ~ 1 + (1|a) + (1|b) + (1|a:b), data = problem7)
> summary(y.lmer)
    Linear mixed model fit by REML ['lmerMod']
    Formula: y ∼ 1 + (1 | a) + (1 | b) + (1 | a:b)
       Data: problem7
    REML criterion at convergence: 157.5
    Scaled residuals:
         Min       1Q   Median       3Q      Max
    -2.30598 -0.41655  0.01979  0.51536  1.58012
    Random effects:
     Groups   Name        Variance Std.Dev.
     a:b      (Intercept) 0.66928  0.8181
     b        (Intercept) 8.27097  2.8759
     a        (Intercept) 0.01898  0.1378
     Residual             0.82125  0.9062
    Number of obs: 48, groups:  a:b, 16; b, 4; a, 4
    Fixed effects:
                Estimate Std. Error t value
(Intercept)    1.827      1.460   1.251
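The ANOVA estimates in the VC column can be checked by hand. The sketch below assumes the usual expected mean squares of the balanced two‐way model II (E(MS_A) = bnσ²_a + nσ²_ab + σ², E(MS_B) = anσ²_b + nσ²_ab + σ², E(MS_AB) = nσ²_ab + σ², E(MS_res) = σ²) and takes the mean squares from the anovaVCA output; the object names are illustrative.

```r
# Mean squares from the anovaVCA output; a = b = 4 levels, n = 3 products per cell
ms <- c(A = 3.056875, B = 102.080764, AB = 2.829097, res = 0.82125)
a <- 4; b <- 4; n <- 3

# ANOVA method: set each mean square equal to its expectation and solve
vc <- c(
  a     = (ms[["A"]]  - ms[["AB"]])  / (b * n),
  b     = (ms[["B"]]  - ms[["AB"]])  / (a * n),
  ab    = (ms[["AB"]] - ms[["res"]]) / n,
  error = ms[["res"]]
)

# Negative solutions would be set to zero (not needed for these data)
vc <- pmax(vc, 0)
print(round(vc, 6))
```

Up to rounding, these values agree with the VC column of anovaVCA and with the REML variances reported by lmer.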

6.3.2 Two‐Way Nested Classification

The two‐way nested classification is a special case of the incomplete two‐way cross‐classification; it is maximally unconnected. The formulae for the estimators of the variance components become very simple. We use the notation of Section 5.3.2, but now the a_i and b_ij in (5.33) are random variables. The model equation (5.33) then becomes

6.24

with the conditions of uncorrelated a_i, b_ij, and e_ijk and

E(a_i) = E(b_ij) = E(e_ijk) = 0, var(a_i) = σ²_a, var(b_ij) = σ²_b, var(e_ijk) = σ².

Further, we assume for tests that the random variables a_i are N(0, σ²_a), b_ij are N(0, σ²_b), and e_ijk are N(0, σ²).

We use here and in other sections the columns source of variation, sum of squares and degrees of freedom of the corresponding ANOVA tables in Chapter 5 because these columns are the same for models with fixed and random effects. The column expected mean squares and F‐test statistics must be derived for models with random effects anew. We now replace in Table 5.13 the column of E(MS) for the random model (see Table 6.6).

Source of variation	E(MS)
Between A levels
Between B levels within A levels
Within B levels (residual)	σ²

In this table the positive coefficients λ_i are defined by

6.25

Problem 6.10

Derive the ANOVA estimates for all variance components; give also the REML estimates.

Solution

From the analysis of variance method, we obtain the estimators of the variance components by

6.26

with

and

The formulae for the variances of the variance components are given as formulae (6.60) and (6.61) in Rasch and Schott (2018).

Example

For testing the performance of boars, the performance of their offspring is measured under uniform feeding, fattening and slaughter conditions. From the test results, two boars B₁ and B₂ were randomly selected. For each boar, the offspring of several sows were observed: for B₁ as well as for B₂, observations y (the number of fattening days from 40 kg up to 110 kg) on the offspring of three sows are available. The variance components for boars, for sows within boars, and within sows must be estimated.

Table 6.7 shows the observations y_ijk. In this case we have a = 2, b₁ = 3, b₂ = 3.

Table 6.7 Data of the example for Problem 6.10.

Number of fattening days y_ijk.	Boars	B₁			B₂
	Sows	S₁₁	S₁₂	S₁₃	S₂₁	S₂₂	S₂₃
Offspring	y_ijk	93 89 97 105	107 99	109 107 94 106	89 102 104 97	87 91 82	81 83 85 91
	n_ij	4	2	4	4	3	4
	n_i.	10			11

Using the R package VCA we find the ANOVA estimates.

 > y1 <- c(93,89,97,105,107,99,109,107,94,106)
> y2 <- c(89,102,104,97,87,91,82,81,83,85,91)
> y <- c(y1,y2)
> b1 <- rep(1,10)
> b2 <- rep(2, 11)
> b <- c(b1,b2)
> s1 <- rep(1,4)
> s2 <- rep(2,2)
> s3 <- rep(3,4)
> s4 <- rep(4,4)
> s5 <- rep(5,3)
> s6 <- rep(6,4)
> s <- c(s1,s2,s3,s4,s5,s6)     # sow index 1, ..., 6
> problem9 <- data.frame(b,s,y)
> library(VCA)
> anova9 <- anovaVCA(y ~ b + b:s, problem9, VarVC.method = c("scm"))
> anova9
Result Variance Component Analysis:
---------------------------------------------------------------------------------------------------------
  Name  DF       SS         MS         VC        %Total    SD       CV[%]
1 total 3.520746                       105.29654 100       10.26141 10.785266
2 b     1        568.535065 568.535065 40.933538 38.874532 6.397932 6.724553
3 b:s   4        531.369697 132.842424 28.318558 26.894101 5.321518 5.593187
4 error 15       540.666667 36.044444  36.044444 34.231366 6.003703 6.310198
Mean: 95.14286 (N = 21)
Experimental Design: unbalanced  |  Method: ANOVA

Now we compute the REML estimates.

> library(lme4)
> lmer9 <- lmer(y ~ 1 + (1|b) + (1|b:s), data = problem9)
> summary(lmer9)
Linear mixed model fit by REML ['lmerMod']
Formula: y ∼ 1 + (1 | b) + (1 | b:s)
   Data: problem9
REML criterion at convergence: 139.2
Scaled residuals:
     Min       1Q   Median       3Q      Max
-1.48912 -0.65918  0.00967  0.74445  1.34738
Random effects:
 Groups   Name        Variance Std.Dev.
 b:s      (Intercept) 26.18    5.117
 b        (Intercept) 46.90    6.848
 Residual             35.77    5.980
Number of obs: 21, groups:  b:s, 6; b, 2
Fixed effects:
            Estimate Std. Error t value
(Intercept)    95.39       5.44   17.54

Note: the REML estimates are different from the ANOVA estimates due to the unbalanced data.
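The ANOVA estimates for the unbalanced nested design can be reproduced with base R. The following is a sketch under stated assumptions: the expected mean squares are taken to be E(MS_A) = σ² + λ₂σ²_b + λ₃σ²_a and E(MS_B in A) = σ² + λ₁σ²_b, and the labelling of the coefficients as lambda1, lambda2, lambda3 is an assumption of this sketch that need not coincide with the numbering in (6.25). All object names are illustrative.

```r
# Data of Table 6.7: fattening days, boar index and sow index (1, ..., 6)
y    <- c(93, 89, 97, 105, 107, 99, 109, 107, 94, 106,
          89, 102, 104, 97, 87, 91, 82, 81, 83, 85, 91)
boar <- rep(1:2, times = c(10, 11))
sow  <- rep(1:6, times = c(4, 2, 4, 4, 3, 4))

N    <- length(y)
n_ij <- as.numeric(table(sow))                  # offspring per sow
N_i  <- as.numeric(table(boar))                 # offspring per boar
boar_of_sow <- as.numeric(tapply(boar, sow, function(v) v[1]))
a <- length(N_i)                                # number of boars
B <- length(n_ij)                               # total number of sows

# Sums of squares and mean squares of the nested ANOVA
sum_boar <- as.numeric(tapply(y, boar, sum))
sum_sow  <- as.numeric(tapply(y, sow, sum))
ss_a   <- sum(sum_boar^2 / N_i) - sum(y)^2 / N
ss_b   <- sum(sum_sow^2 / n_ij) - sum(sum_boar^2 / N_i)
ss_res <- sum(y^2) - sum(sum_sow^2 / n_ij)
ms_a   <- ss_a / (a - 1)
ms_b   <- ss_b / (B - a)
ms_res <- ss_res / (N - B)

# Coefficients of the variance components in the expected mean squares
s1 <- sum(n_ij^2 / N_i[boar_of_sow])            # sum_i sum_j n_ij^2 / N_i
lambda1 <- (N - s1) / (B - a)
lambda2 <- (s1 - sum(n_ij^2) / N) / (a - 1)
lambda3 <- (N - sum(N_i^2) / N) / (a - 1)

# Solve MS = E(MS) from the residual row upwards
var_res  <- ms_res
var_sow  <- (ms_b - ms_res) / lambda1
var_boar <- (ms_a - ms_res - lambda2 * var_sow) / lambda3
print(round(c(boar = var_boar, sow_in_boar = var_sow, error = var_res), 4))
```

The results agree with the VC column of the anovaVCA output for b, b:s and error.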

Problem 6.11

Test the hypotheses H_A0: σ²_a = 0 and H_B0: σ²_b = 0 with α = 0.05 for each hypothesis.

Solution

In R we use the package lme4 and fit several models in order to apply the likelihood‐ratio test based on ML. Because the data are unbalanced, we do not start from the ANOVA table for fixed effects: there are no exact F‐tests for these hypotheses.

Example

We use the data of Table 6.7 from Problem 6.10. The data‐frame problem9 already exists in R.

 > library(lme4)
> lmer1 <- lmer(y ~ 1 + (1|b) + (1|b:s), data = problem9)
> lmer2 <- lmer(y ~ 1 + (1|b), data = problem9)
> lmer3 <- lmer(y ~ 1 + (1|b:s), data = problem9)
> # test H0: Variance component of B (sows within boars, term b:s) is zero
> anova(lmer1, lmer2, refit = TRUE)
refitting model(s) with ML (instead of REML)
Data: problem9
Models:
lmer2: y ∼ 1 + (1 | b)
lmer1: y ∼ 1 + (1 | b) + (1 | b:s)
      Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)
lmer2  3 153.52 156.66 -73.761   147.52
lmer1  4 152.01 156.18 -72.003   144.01 3.5162      1    0.06077 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> # test H0: Variance component of A (boars, term b) is zero
> anova(lmer1, lmer3, refit = TRUE)
refitting model(s) with ML (instead of REML)
Data: problem9
Models:
lmer3: y ∼ 1 + (1 | b:s)
lmer1: y ∼ 1 + (1 | b) + (1 | b:s)
      Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)
lmer3  3 150.60 153.73 -72.300   144.60
lmer1  4 152.01 156.18 -72.003   144.01 0.5931      1     0.4412

Conclusion

Because the p‐values Pr(>Chisq) for both tests are larger than 0.05, the hypotheses H_A0: σ²_a = 0 and H_B0: σ²_b = 0 are not rejected.

6.4 Three‐Way Classification

We have four three‐way classifications and proceed in all sections in the same way. At first, we complete the corresponding ANOVA table of Chapter 5 using the column E(MS) for the model with random effects. Then we use the ANOVA method only and use the algorithm of the analysis of variance method in Section 6.4.1 to obtain estimators of the variance components in the balanced case.

6.4.1 Three‐Way Cross‐Classification with Equal Sub‐Class Numbers

We start with the model equation for the balanced model

6.27

with the side conditions that the expectations of all random variables on the right‐hand side of (6.27) are equal to zero and all covariances between different random variables on the right‐hand side of (6.27) vanish: var(a_i) = σ²_a, var(b_j) = σ²_b, var(c_k) = σ²_c, var((a,b)_ij) = σ²_ab, var((a,c)_ik) = σ²_ac, var((b,c)_jk) = σ²_bc, var((a,b,c)_ijk) = σ²_abc, var(e_ijkl) = σ². Further we assume for tests that the y_ijkl are normally distributed.

Table 6.8 shows the E(MS) for this case as the new addendum of Table 5.15.

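Source of variation	E(MS)
Between A levels	bcnσ²_a + cnσ²_ab + bnσ²_ac + nσ²_abc + σ²
Between B levels	acnσ²_b + cnσ²_ab + anσ²_bc + nσ²_abc + σ²
Between C levels	abnσ²_c + bnσ²_ac + anσ²_bc + nσ²_abc + σ²
Interaction A × B	cnσ²_ab + nσ²_abc + σ²
Interaction A × C	bnσ²_ac + nσ²_abc + σ²
Interaction B × C	anσ²_bc + nσ²_abc + σ²
Interaction A × B × C	nσ²_abc + σ²
Within the sub‐classes (residual)	σ²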

Problem 6.12

Use the analysis of variance method to obtain the estimators for the variance components by solving the equations that set each MS of Table 5.15 equal to its E(MS) in Table 6.8.

Solution

In R we use the package VCA for the ANOVA estimates and package lme4 for the REML estimates.

Example

In a factory, we take a random sample A of three operators from the population of operators, a random sample B of three machines from the population of machines, and a random sample C of three batches from the population of batches of production material. For the product produced by operator A_i on machine B_j from batch C_k we determine a characteristic with value y_ijk. One is interested in the sources of variation in y_ijk. The data in Table 6.9 are from Kuehl (1994, p. 212).

Table 6.9 (Kuehl 1994) Observations of products produced by operator A_i on machine B_j from the batch C_k.

y	A₁			A₂			A₃
	B₁₁	B₁₂	B₁₃	B₂₁	B₂₂	B₂₃	B₃₁	B₃₂	B₃₃
C₁	0.60	1.69	3.47	0.05	0.11	0.06	0.07	0.08	0.22
	0.48	2.01	3.30	0.12	0.09	0.19	0.06	0.14	0.17
C₂	0.98	2.21	5.68	0.15	0.23	0.40	0.07	0.23	0.43
	0.93	2.48	5.11	0.26	0.35	0.75	0.21	0.35	0.35
C₃	1.37	3.31	5.74	0.72	0.78	2.10	0.40	0.72	1.95
	1.50	2.84	5.38	0.51	1.11	1.18	0.57	0.88	2.87

In R we use for the ANOVA estimates of the variance components the package VCA and for the REML estimates the package lme4.

 > y1 <- c(0.60,0.48,0.98,0.93,1.37,1.50)
> y2 <- c(1.69,2.01,2.21,2.48,3.31,2.84)
> y3 <- c(3.47,3.30,5.68,5.11,5.74,5.38)
> y4 <- c(0.05,0.12,0.15,0.26,0.72,0.51)
> y5 <- c(0.11,0.09,0.23,0.35,0.78,1.11)
> y6 <- c(0.06,0.19,0.40,0.75,2.10,1.18)
> y7 <- c(0.07,0.06,0.07,0.21,0.40,0.57)
> y8 <- c(0.08,0.14,0.23,0.35,0.72,0.88)
> y9 <- c(0.22,0.17,0.43,0.35,1.95,2.87)
> y <- c(y1,y2,y3,y4,y5,y6,y7,y8,y9)
> a <- rep(1:3, each = 18)
> b1 <- rep(1:3, each =6)
> b <- c(b1,b1,b1)
> c1 <- rep(1:3, each = 2)
> c <- c(c1,c1,c1,c1,c1,c1,c1,c1,c1)
> problem11 <- data.frame(a,b,c,y)
> library(VCA)
> anova11 <- anovaVCA(y ~ a + b + c + a:b + a:c + b:c + a:b:c, problem11, VarVC.method = c("scm"))
> anova11
  Result Variance Component Analysis:
  ---------------------------------------------------------------------------------------------------------
  
    Name  DF       SS        MS        VC       %Total    SD       CV[%]     
  1 total 5.818679                     3.099903 100       1.760654 139.796089
  2 a     2        58.134344 29.067172 1.322089 42.649381 1.149821 91.295924 
  3 b     2        26.2828   13.1414   0.41299  13.32267  0.642643 51.025898 
  4 c     2        12.460844 6.230422  0.29758  9.599644  0.545509 43.313431 
  5 a:b   4        20.618956 5.154739  0.824737 26.60524  0.90815  72.107197 
  6 a:c   4        1.284578  0.321144  0.019138 0.617358  0.138338 10.984077 
  7 b:c   4        3.036656  0.759164  0.092141 2.972375  0.303547 24.101653 
  8 a:b:c 8        1.650556  0.206319  0.07509  2.422343  0.274026 21.757693 
  9 error 27       1.51575   0.056139  0.056139 1.810989  0.236936 18.812776 
  
  Mean: 1.259444 (N = 54) 
Experimental Design: balanced  |  Method: ANOVA

> library(lme4)
> lmer11 <- lmer(y ~ 1 + (1|a) + (1|b) + (1|c) + (1|a:b) + (1|a:c) + (1|b:c) + (1|a:b:c), data = problem11)
> summary(lmer11)
Linear mixed model fit by REML ['lmerMod']
Formula: y ∼ 1 + (1 | a) + (1 | b) + (1 | c) + (1 | a:b) + (1 | a:c) +  
    +(1 | b:c) + (1 | a:b:c)
   Data: problem11
REML criterion at convergence: 80.5
Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-1.96769 -0.42854 -0.02282  0.27936  2.50366 
Random effects:
 Groups   Name        Variance Std.Dev.
 a:b:c    (Intercept) 0.07509  0.2740  
 b:c      (Intercept) 0.09214  0.3035  
 a:c      (Intercept) 0.01914  0.1383  
 a:b      (Intercept) 0.82474  0.9082  
 c        (Intercept) 0.29758  0.5455  
 b        (Intercept) 0.41299  0.6426  
 a        (Intercept) 1.32209  1.1498  
 Residual             0.05614  0.2369  
Number of obs: 54, groups:  a:b:c, 27; b:c, 9; a:c, 9; a:b, 9; c, 3; b, 3; a, 3
Fixed effects:
            Estimate Std. Error t value
(Intercept)   1.2594     0.8862   1.421
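The ANOVA estimates can again be recomputed from the mean squares. The sketch below assumes the usual expected mean squares of the balanced three‐way cross‐classification with random effects (for example E(MS_A) = bcnσ²_a + cnσ²_ab + bnσ²_ac + nσ²_abc + σ²) and uses the mean squares printed by anovaVCA; the name c_lev for the number of levels of C is chosen only to avoid masking the R function c.

```r
# Mean squares from the anovaVCA output; 3 levels per factor, n = 2 per cell
ms <- c(A = 29.067172, B = 13.1414, C = 6.230422,
        AB = 5.154739, AC = 0.321144, BC = 0.759164,
        ABC = 0.206319, res = 0.056139)
a <- 3; b <- 3; c_lev <- 3; n <- 2

# Solve MS = E(MS), starting with the residual and the highest interaction
vc <- c(
  a     = (ms[["A"]] - ms[["AB"]] - ms[["AC"]] + ms[["ABC"]]) / (b * c_lev * n),
  b     = (ms[["B"]] - ms[["AB"]] - ms[["BC"]] + ms[["ABC"]]) / (a * c_lev * n),
  c     = (ms[["C"]] - ms[["AC"]] - ms[["BC"]] + ms[["ABC"]]) / (a * b * n),
  ab    = (ms[["AB"]] - ms[["ABC"]]) / (c_lev * n),
  ac    = (ms[["AC"]] - ms[["ABC"]]) / (b * n),
  bc    = (ms[["BC"]] - ms[["ABC"]]) / (a * n),
  abc   = (ms[["ABC"]] - ms[["res"]]) / n,
  error = ms[["res"]]
)
print(round(vc, 6))
```

Up to rounding, the values agree with the VC column of anovaVCA and with the REML variances of lmer for these balanced data.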

If model equation (6.27), including its conditions about expectations and covariances of the components of y_ijkl, is valid and y_ijkl is multivariate normally distributed with the marginal distributions

N(μ, σ²_a + σ²_b + σ²_c + σ²_ab + σ²_ac + σ²_bc + σ²_abc + σ²),

then the SS_X/E(MS_X) are CS(df_X)‐distributed (X = A, B, C, AB, AC, BC, ABC), with SS_X and df_X from Table 5.15 and E(MS_X) from Table 6.8.

Problem 6.13

Derive the F‐tests with significance level α = 0.05 for testing each of the null hypotheses H_AB0: σ²_ab = 0, H_AC0: σ²_ac = 0, H_BC0: σ²_bc = 0 and H_ABC0: σ²_abc = 0.

Solution

The formulae can be found in Table 6.10.

Example

We have already made the data‐frame problem11. We first make the ANOVA table for fixed effects.

 > A <- factor(a)
> B <- factor(b)
> C <- factor(c)
> fixedanova <- aov(y ~ A + B + C + A:B + A:C + B:C + A:B:C, problem11)
> summary(fixedanova)
            Df Sum Sq Mean Sq F value   Pr(>F)
A            2  58.13  29.067 517.772  < 2e-16 ***
B            2  26.28  13.141 234.087  < 2e-16 ***
C            2  12.46   6.230 110.982 9.45e-14 ***
A:B          4  20.62   5.155  91.821 2.59e-15 ***
A:C          4   1.28   0.321   5.721  0.00181 **
B:C          4   3.04   0.759  13.523 3.57e-06 ***
A:B:C        8   1.65   0.206   3.675  0.00510 **
Residuals   27   1.52   0.056
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> # test for H0 variance component AB is 0
> FAB <- 5.155/0.206
> FAB
[1] 25.02427
> p_valueAB <- 1-pf(FAB,4,8)
> p_valueAB
[1] 0.0001411135
> # test for H0 variance component AC is 0
> FAC <- 0.321/0.206
> FAC
[1] 1.558252
> p_valueAC <- 1-pf(FAC,4,8)
> p_valueAC
[1] 0.274647
> # test for H0 variance component BC is 0
> FBC <- 0.759/0.206
> FBC
[1] 3.684466
> p_valueBC <- 1-pf(FBC,4,8)
> p_valueBC
[1] 0.05505248
> # test for H0 variance component ABC is 0
> FABC <- 0.206/0.056
> FABC
[1] 3.678571
> p_valueABC <-1-pf(FABC,8,27)
> p_valueABC
[1] 0.005070746
> # Only the p-value for ABC is correctly given in the fixed-effects ANOVA table.

Conclusion

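The hypotheses H_AC0: σ²_ac = 0 and H_BC0: σ²_bc = 0 are not rejected; the hypotheses H_AB0: σ²_ab = 0 and H_ABC0: σ²_abc = 0 are rejected.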

Problem 6.14

Derive the approximate F‐tests with significance level α = 0.05 for testing each of the null hypotheses H_A0: σ²_a = 0, H_B0: σ²_b = 0 and H_C0: σ²_c = 0.

Solution

For testing the null hypotheses H_A0, H_B0 and H_C0 we need a result from Satterthwaite (1946), because E(MS_X) (X = A, B, C) under the null hypothesis is not equal to any E(MS) in the other rows of the ANOVA table. Therefore, we use instead linear combinations of the MS such that E(MS_X) under H_X0 equals the expectation of this linear combination. We find

E(MS_A | H_A0) = E(MS_AB) + E(MS_AC) − E(MS_ABC), E(MS_B | H_B0) = E(MS_AB) + E(MS_BC) − E(MS_ABC), E(MS_C | H_C0) = E(MS_AC) + E(MS_BC) − E(MS_ABC).

We use this to construct test statistics for the null hypotheses which are approximately F‐distributed.

These test statistics for approximate F‐tests are:

F_A = MS_A/(MS_AB + MS_AC − MS_ABC)

and this is, under H_A0, approximately F(a − 1, r_a)‐distributed;

F_B = MS_B/(MS_AB + MS_BC − MS_ABC)

and this is, under H_B0, approximately F(b − 1, r_b)‐distributed; and

F_C = MS_C/(MS_AC + MS_BC − MS_ABC)

and this is, under H_C0, approximately F(c − 1, r_c)‐distributed.

The approximate F‐tests have denominator degrees of freedom r_a, r_b and r_c, respectively, which are determined by Satterthwaite's method. Let ν_i be the degrees of freedom of MS_i; then the degrees of freedom r of (MS₁ + MS₂ − MS₃) are approximated by:

r = (MS₁ + MS₂ − MS₃)² / (MS₁²/ν₁ + MS₂²/ν₂ + MS₃²/ν₃).

See also Gaylor and Hopper (1969) about estimating the df of linear combinations of mean squares by Satterthwaite's formula.

Example

In the output of Problem 6.13 we have already found the ANOVA table for the fixed effects model.

 > # test for H0 variance component A is 0
> denominatorA <- 5.155 + 0.321 - 0.206
> FA <- 29.067/denominatorA
> FA
[1] 5.51556
> rA <- (denominatorA^2)/(5.155^2/4 + 0.321^2/4 +0.206^2/8)
> rA
[1] 4.161002
> p_valueA <- 1-pf(FA, 2, rA)
> p_valueA
[1] 0.06759019
> # test for H0 variance component B is 0
> denominatorB <- 5.155 + 0.759 - 0.206
> FB <- 13.141/denominatorB
> FB
[1] 2.302207
> rB <- (denominatorB^2)/(5.155^2/4 + 0.759^2/4 + 0.206^2/8)
> rB
[1] 4.796419
> p_valueB <- 1-pf(FB, 2, rB)
> p_valueB
[1] 0.1991247
> # test for H0 variance component C is 0
> denominatorC <- 0.759 + 0.321 - 0.206
> FC <- 6.230/denominatorC
> FC
[1] 7.128146
> rC <- (denominatorC^2)/(0.759^2/4 + 0.321^2/4 + 0.206^2/8)
> rC
[1] 4.362887
> p_valueC <- 1-pf(FC, 2, rC)
> p_valueC
[1] 0.0421972

Only the hypothesis H_C0: σ²_c = 0 is rejected.
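The three computations above follow the same pattern and can be wrapped in one function. This is a sketch only: the helper name satterthwaite_f and its argument layout are hypothetical, and the rounded mean squares of the fixed‐effects ANOVA table are used as in the transcript.

```r
# Hypothetical helper: approximate F-test with denominator MS1 + MS2 - MS3.
# ms, df: the three mean squares (MS1, MS2, MS3) and their degrees of freedom.
satterthwaite_f <- function(ms_num, df_num, ms, df) {
  denom <- ms[1] + ms[2] - ms[3]
  r <- denom^2 / sum(ms^2 / df)            # Satterthwaite degrees of freedom
  f <- ms_num / denom
  c(F = f, df_den = r, p = pf(f, df_num, r, lower.tail = FALSE))
}

# MS_AB = 5.155, MS_AC = 0.321, MS_BC = 0.759 (4 df each), MS_ABC = 0.206 (8 df)
test_A <- satterthwaite_f(29.067, 2, ms = c(5.155, 0.321, 0.206), df = c(4, 4, 8))
test_B <- satterthwaite_f(13.141, 2, ms = c(5.155, 0.759, 0.206), df = c(4, 4, 8))
test_C <- satterthwaite_f(6.230,  2, ms = c(0.759, 0.321, 0.206), df = c(4, 4, 8))
print(rbind(A = test_A, B = test_B, C = test_C))
```

The rows reproduce the values FA, rA, p_valueA and their counterparts for B and C computed step by step above.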

Remark

To carry out the LR test we proceed with the R package lme4 and start with the largest model containing all the random effects, as we have done in Problem 6.12. Then we can test a certain variance component by fitting the model without this variance component. We demonstrate this for the test of H₀: σ²_abc = 0.

> library(lme4)
> lmer121 <- lmer(y ~ 1 + (1|a) + (1|b) + (1|c) + (1|a:b) + (1|a:c) + (1|b:c) + (1|a:b:c), data = problem11)
> lmer122 <- lmer(y ~ 1 + (1|a) + (1|b) + (1|c) + (1|a:b) + (1|a:c) + (1|b:c), data = problem11)
> # Test H0 the variance component of ABC is 0
> anova(lmer121, lmer122, refit = TRUE)
    refitting model(s) with ML (instead of REML)
    Data: problem11
    Models:
    lmer122: y ∼ 1 + (1 | a) + (1 | b) + (1 | c) + (1 | a:b) + (1 | a:c) +
    lmer122:     (1 | b:c)
    lmer121: y ∼ 1 + (1 | a) + (1 | b) + (1 | c) + (1 | a:b) + (1 | a:c) +
    lmer121:     (1 | b:c) + (1 | a:b:c)
            Df     AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)
    lmer122  8 104.218 120.13 -44.109   88.218
    lmer121  9  99.934 117.83 -40.967   81.934 6.2837      1    0.01219 *
    ---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Conclusion

Because the p‐value Pr(>Chisq) = 0.01219 is smaller than 0.05, the hypothesis H₀: σ²_abc = 0 is rejected.

6.4.2 Three‐Way Nested Classification

For the three‐way nested classification C ≺ B ≺ A we assume the following model equation

6.28

The conditions are: all random variables on the right‐hand side of (6.28) have expectation 0 and are pairwise uncorrelated, and var(a_i) = σ²_a for all i, var(b_ij) = σ²_b for all i, j, var(c_ijk) = σ²_c for all i, j, k and var(e_ijkl) = σ² for all i, j, k, l.

We find the SS, df, and MS of the three‐way nested analysis of variance in Table 5.18.

The E(MS) for the random model can be found in Table 6.11.

Source of variation	E(MS)
Between the A levels
Between the B levels within the A levels
Between the C levels within the B and A levels
Residual	σ²

In Table 6.11 we use the abbreviations:

Problem 6.15

Derive by the analysis of variance method the ANOVA estimates of the variance components in Table 6.11 and also the REML estimates.

Solution

For the ANOVA estimates we use the R package VCA and for the REML estimates we use the R package lme4.

Example

In an experiment we have the random factor A with two classes; within the levels A_i we have the nested random factor B with two classes and within levels B_j we have the nested random factor C with respectively two and three classes. The observations y_ijkl are shown in Table 6.12.

Table 6.12 Observations y_ijkl of a three‐way nested classification model II.

			y
A₁	B₁₁	C₁₁₁	1	2
		C₁₁₂	4
	B₁₂	C₁₂₁	3	3.5
		C₁₂₂	4	3.5
		C₁₂₃	4.5
A₂	B₂₁	C₂₁₁	5	7
		C₂₁₂	8
	B₂₂	C₂₂₁	6	7
		C₂₂₂	8.5	9
		C₂₂₃	10

For the data of such an experiment we first make in R the data‐frame problem13.

 > y <-  c(1, 2, 4, 3, 3.5, 4, 3.5, 4.5, 5, 7, 8, 6, 7, 8.5, 9, 10)
> a <-  rep(1:2, each = 8)
> b1 <-  rep(1:2, times = c(3,5))
> b <- c(b1,b1)
> c <- c(1, 1, 2, 1, 1, 2, 2, 3, 1, 1, 2, 1, 1, 2, 2, 3)
> problem13 <- data.frame(a,b,c,y)
> library(VCA)
> anova13 <- anovaVCA(y ~ a + a:b + b:c, problem13, VarVC.method = c("scm"))
> anova13
Result Variance Component Analysis:
---------------------------------------------------------------------------------------------------------
  Name  DF       SS       MS       VC        %Total    SD       CV[%]
1 total 1.470778                   11.649835 100       3.413186 63.501126
2 a     1        76.5625  76.5625  9.41335   80.802432 3.068118 57.081272
3 a:b   2        7.354167 3.677083 0.145241  1.246718  0.381104 7.090314
4 b:c   3        15.0875  5.029167 1.452819  12.470722 1.205329 22.42473
5 error 9        5.745833 0.638426 0.638426  5.480128  0.799016 14.865406

Mean: 5.375 (N = 16)
Experimental Design: unbalanced  |  Method: ANOVA
> library(lme4)
> lmer13 <- lmer(y ~ 1 + (1|a) + (1|a:b) + (1|b:c), data = problem13)
> summary(lmer13)
Linear mixed model fit by REML ['lmerMod']
Formula: y ∼ 1 + (1 | a) + (1 | a:b) + (1 | b:c)
   Data: problem13
REML criterion at convergence: 50.9
Scaled residuals:
     Min       1Q   Median       3Q      Max
-1.45734 -0.49905  0.07056  0.47646  1.22502
Random effects:
 Groups   Name        Variance Std.Dev.
 b:c      (Intercept) 1.6184   1.272
 a:b      (Intercept) 0.0000   0.000
 a        (Intercept) 9.4985   3.082
 Residual             0.5746   0.758
Number of obs: 16, groups:  b:c, 5; a:b, 4; a, 2
Fixed effects:
            Estimate Std. Error t value
(Intercept)    5.594      2.261   2.474

Hence the REML estimates of the variance components are s²_a = 9.50, s²_b = 0, s²_c = 1.62 and s² = 0.57.
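Because the labels of B and C are reused within each level of A in the data‐frame problem13, the number of groups that a term defines depends on which factors it combines. The sketch below counts the distinct label combinations with base R only; the name c_lab is an assumption used to avoid masking the function c. The counts can be compared with the group sizes printed by lmer and with the ten C classes of Table 6.12.

```r
# Labels as coded for problem13: B is labelled 1, 2 and C is labelled 1, 2, 3
# within each level of A
a     <- rep(1:2, each = 8)
b     <- rep(rep(1:2, times = c(3, 5)), 2)
c_lab <- rep(c(1, 1, 2, 1, 1, 2, 2, 3), 2)

# Number of distinct groups defined by each candidate grouping term
n_groups <- c(
  "a"     = nlevels(factor(a)),
  "a:b"   = nlevels(interaction(a, b, drop = TRUE)),
  "b:c"   = nlevels(interaction(b, c_lab, drop = TRUE)),
  "a:b:c" = nlevels(interaction(a, b, c_lab, drop = TRUE))
)
print(n_groups)
```

With this coding, the term b:c distinguishes five groups, whereas a:b:c distinguishes the ten C classes listed in Table 6.12.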

6.4.3 Three‐Way Mixed Classifications

We consider the mixed classifications in Sections 5.5.3.1 and 5.5.3.2 but assume now random effects.

6.4.3.1 Cross‐Classification Between Two Factors Where One of Them is Sub‐Ordinated to a Third Factor ((B ≺ A)×C)

If in a balanced experiment a factor B is sub‐ordinated to a factor A and both are cross‐classified with a factor C then the corresponding model equation is given by

6.29

where μ is the general experimental mean and a_i is the random effect of the ith level of factor A with E(a_i) = 0, var(a_i) = σ²_a. Further, b_ij is the random effect of the jth level of factor B within the ith level of factor A, with E(b_ij) = 0, var(b_ij) = σ²_b, and c_k is the random effect of the kth level of factor C, with E(c_k) = 0, var(c_k) = σ²_c. Further, (a,c)_ik and (b,c)_jk(i) are the corresponding random interaction effects with E((a,c)_ik) = 0, var((a,c)_ik) = σ²_ac and E((b,c)_jk(i)) = 0, var((b,c)_jk(i)) = σ²_bc. e_ijkl is the random error term with E(e_ijkl) = 0, var(e_ijkl) = σ². All the random effects on the right‐hand side are uncorrelated with each other.

The ANOVA table with df and expected mean squares E(MS) is as follows, (N = abcn).

Problem 6.16

Derive by the analysis of variance method the ANOVA estimates of the variance components in Table 6.13 and also the REML estimates.

Table 6.13 ANOVA table with df and expected mean squares E(MS) of model (6.29).

Source of variation	df	E(MS)
Between the levels of A	a − 1	bcnσ²_a + cnσ²_b + bnσ²_ac + nσ²_bc + σ²
Between the levels of B within the levels of A	a(b − 1)	cnσ²_b + nσ²_bc + σ²
Between the levels of C	c − 1	abnσ²_c + bnσ²_ac + nσ²_bc + σ²
Interaction A × C	(a − 1)(c − 1)	bnσ²_ac + nσ²_bc + σ²
Interaction B × C within the levels of A	a(b − 1)(c − 1)	nσ²_bc + σ²
Residual	N − abc	σ²
Corrected total	N − 1

Solution

For the ANOVA estimates of the variance components, we use the R package VCA. For the REML estimates, we use the R package lme4.

Example

The factor A is random with three classes, the factor B within the levels of A is random with two levels, the factor C is random with three classes. The data are y and given in Table 6.14.

Table 6.14 Data in a three‐way mixed classification ((B ≺ A)xC) model II.

y	A₁		A₂		A₃
	B₁₁	B₁₂	B₂₁	B₂₂	B₃₁	B₃₂
C₁	141.2	51.2	189.8	191.5	141.9	145.5
	142.6	51.4	190.3	193.0	142.7	144.7
C₂	135.7	143.0	132.4	134.4	137.4	141.1
	136.8	143.3	130.3	130.0	135.2	139.1
C₃	163.2	181.4	173.6	174.9	166.6	175.0
	163.3	180.3	173.9	175.6	165.5	172.0

 > y1 <-  c(141.2, 142.6, 135.7, 136.8, 163.2, 163.3)
> y2 <-  c(51.2, 51.4, 143.0, 143.3, 181.4, 180.3)
> y3 <-  c(189.8, 190.3, 132.4, 130.3, 173.6, 173.9)
> y4 <-  c(191.5, 193.0, 134.4, 130.0, 174.9, 175.6)
> y5 <-  c(141.9, 142.7, 137.4, 135.2, 166.6, 165.5)
> y6 <-  c(145.5, 144.7, 141.1, 139.1, 175.0, 172.0)
>  a  <-  rep(1 : 3, each = 12)
>  b  <-  rep(c(11,12,21,22,31,32) , each = 6)
> c1 <- rep(1:3, each = 2)
> y  <- c(y1, y2, y3, y4, y5, y6)
> c  <- c(c1, c1, c1, c1, c1,c1)
> problem14 <- data.frame(a, b, c, y)
> library(VCA)
> anova14 <- anovaVCA(y ~ a + a/b + c + a:c + b:c, problem14, VarVC.method = c("scm"))
> anova14
Result Variance Component Analysis:
---------------------------------------------------------------------------------------------------------
  Name  DF        SS           MS          VC          %Total    SD        CV[%]
1 total 11.119385                          1193.339549 100       34.544747 22.903438
2 a     2         5290.877222  2645.438611 10.75066    0.900889  3.27882   2.173883
3 a:b   3         1529.105     509.701667  0*          0*        0*        0*
4 c     2         8467.617222  4233.808611 86.670243   7.262832  9.309685  6.172394
5 a:c   4         12775.062778 3193.765694 501.682257  42.040194 22.398265 14.850225
6 b:c   6         7122.22      1187.036667 592.800278  49.675742 24.34749  16.142577
7 error 18        25.85        1.436111    1.436111    0.120344  1.198379  0.794534
Mean: 150.8278 (N = 36)
Experimental Design: balanced  |  Method: ANOVA | * VC set to 0 | adapted MS used for total DF
> library(lme4)
> lmer14 <- lmer(y ~ 1 + (1|a) + (1|a:b) + (1|c) + (1|a:c) + (1|b:c), problem14)
> summary(lmer14)
Linear mixed model fit by REML ['lmerMod']
Formula: y ~ 1 + (1 | a) + (1 | a:b) + (1 | c) + (1 | a:c) + (1 | b:c)
   Data: problem14

REML criterion at convergence: 236
Scaled residuals:
     Min       1Q   Median       3Q      Max
-1.84123 -0.44611 -0.02025  0.45375  1.83040
Random effects:
 Groups   Name        Variance  Std.Dev.
 b:c      (Intercept) 4.799e+02 2.191e+01
 a:c      (Intercept) 5.124e+02 2.264e+01
 a:b      (Intercept) 1.329e-15 3.646e-08
 c        (Intercept) 1.019e+02 1.009e+01
 a        (Intercept) 8.109e-13 9.005e-07
 Residual             1.436e+00 1.198e+00
Number of obs: 36, groups:  b:c, 18; a:c, 9; a:b, 6; c, 3; a, 3
Fixed effects:
            Estimate Std. Error t value
(Intercept)   150.83      10.84   13.91

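Hence the REML estimates of the variance components are s²_a ≈ 0, s²_b(a) ≈ 0, s²_c = 101.9, s²_ac = 512.4, s²_bc = 479.9 and s² = 1.44. The ANOVA estimates are s²_a = 10.75, s²_b(a) = 0 (the negative estimate is set to zero), s²_c = 86.67, s²_ac = 501.68, s²_bc = 592.80 and s² = 1.44.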
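As a cross‐check, the ANOVA estimates reported by VCA can be reproduced from the printed mean squares by the analysis of variance method. The following sketch assumes the balanced design of this example (three levels of A, two levels of B within each level of A, three levels of C, two observations per subclass) and the usual expected mean squares of the ((B ≺ A)×C) model II. The object names and the divisors written out below are our own illustration and are not part of the VCA package. Negative solutions are set to zero, as in the VCA output.

```r
# Mean squares copied from the anovaVCA output above
ms <- c(a = 2645.438611, ab = 509.701667, c = 4233.808611,
        ac = 3193.765694, bc = 1187.036667, error = 1.436111)

# Numbers of levels and of observations per subclass (assumed from the example)
n_a <- 3; n_b <- 2; n_c <- 3; n_rep <- 2

# Equate each mean square to its expectation and solve for the components
vc <- c(
  a     = (ms[["a"]] - ms[["ab"]] - ms[["ac"]] + ms[["bc"]]) / (n_b * n_c * n_rep),
  ab    = (ms[["ab"]] - ms[["bc"]]) / (n_c * n_rep),
  c     = (ms[["c"]] - ms[["ac"]]) / (n_a * n_b * n_rep),
  ac    = (ms[["ac"]] - ms[["bc"]]) / (n_b * n_rep),
  bc    = (ms[["bc"]] - ms[["error"]]) / n_rep,
  error = ms[["error"]]
)

# A negative solution is not admissible as a variance and is replaced by zero
vc <- pmax(vc, 0)
print(round(vc, 4))
```

The values obtained in this way agree with the VC column of the VCA output.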

6.4.3.2 Cross‐Classification of Two Factors in Which a Third Factor is Nested (C ≺ (A×B))

The model equation for this type is

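y_ijkl = μ + a_i + b_j + (a,b)_ij + c_ijk + e_ijkl,  i = 1, …, a; j = 1, …, b; k = 1, …, c; l = 1, …, n    (6.30)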

where μ is the general experimental mean, a_i is the effect of the ith level of factor A with E(a_i) = 0, var(a_i) = σ²_a. Further, b_j is the effect of the jth level of factor B with E(b_j) = 0, var(b_j) = σ²_b; c_ijk is the effect of the kth level of factor C within the combination (i, j) of A × B, with E(c_ijk) = 0, var(c_ijk) = σ²_c(ab). Further, (a,b)_ij is the corresponding random interaction effect with E((a,b)_ij) = 0, var((a,b)_ij) = σ²_ab, and e_ijkl is the random error term with E(e_ijkl) = 0, var(e_ijkl) = σ². The random effects on the right‐hand side are uncorrelated with each other.
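To make the structure of model (6.30) concrete, the following sketch simulates one balanced data set from it. The function name, its arguments and the chosen parameter values are hypothetical. In addition, the effects are drawn from normal distributions, which is an assumption of the sketch only: the model itself requires no more than zero expectations, the stated variances and uncorrelated effects.

```r
# Simulate y_ijkl = mu + a_i + b_j + (a,b)_ij + c_ijk + e_ijkl (balanced case)
simulate_c_in_ab <- function(n_a, n_b, n_c, n_rep, mu,
                             sd_a, sd_b, sd_ab, sd_c, sd_e) {
  # One row per observation; the replicate index l varies fastest
  d <- expand.grid(l = seq_len(n_rep), k = seq_len(n_c),
                   j = seq_len(n_b), i = seq_len(n_a))
  # Unique codes for the A x B combinations and for the levels of C nested in them
  d$ab  <- interaction(d$i, d$j, drop = TRUE)
  d$abc <- interaction(d$i, d$j, d$k, drop = TRUE)
  # Independent random effects with expectation zero (normality is assumed here)
  a_eff  <- rnorm(n_a, 0, sd_a)
  b_eff  <- rnorm(n_b, 0, sd_b)
  ab_eff <- rnorm(nlevels(d$ab), 0, sd_ab)
  c_eff  <- rnorm(nlevels(d$abc), 0, sd_c)
  d$y <- mu + a_eff[d$i] + b_eff[d$j] + ab_eff[as.integer(d$ab)] +
    c_eff[as.integer(d$abc)] + rnorm(nrow(d), 0, sd_e)
  d
}

set.seed(1)  # arbitrary seed, used only to make the sketch reproducible
sim <- simulate_c_in_ab(n_a = 3, n_b = 2, n_c = 2, n_rep = 2, mu = 3.8,
                        sd_a = 0.05, sd_b = 0.04, sd_ab = 0,
                        sd_c = 0.04, sd_e = 0.01)
nrow(sim)  # total number of observations N = abcn
```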

The ANOVA table with df and expected mean squares E(MS) is given in Table 6.15 (N = abcn).

Problem 6.17

Derive by the analysis of variance method the ANOVA estimates of the variance components in Table 6.15 and also the REML estimates.

Table 6.15 ANOVA table with df and expected mean squares E(MS) of model (6.30).

Source of variation	df	E(MS)
Between A levels	a − 1	bcn σ²_a + cn σ²_ab + n σ²_c(ab) + σ²
Between B levels	b − 1	acn σ²_b + cn σ²_ab + n σ²_c(ab) + σ²
Between C levels in A × B combinations	ab(c − 1)	n σ²_c(ab) + σ²
Interaction A × B	(a − 1)(b − 1)	cn σ²_ab + n σ²_c(ab) + σ²
Residual	N − abc	σ²
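The analysis of variance method equates each observed mean square to its expectation and solves the resulting linear equations for the variance components. A sketch of the solution is given below, assuming the balanced case and the expected mean squares bcn σ²_a + cn σ²_ab + n σ²_c(ab) + σ² for A, acn σ²_b + cn σ²_ab + n σ²_c(ab) + σ² for B, cn σ²_ab + n σ²_c(ab) + σ² for A × B, n σ²_c(ab) + σ² for C within A × B and σ² for the residual. The subscripts of the mean squares are our own notation.

```latex
\begin{aligned}
s^2         &= MS_{\mathrm{res}}, \\
s^2_{c(ab)} &= \frac{MS_{C(AB)} - MS_{\mathrm{res}}}{n}, \\
s^2_{ab}    &= \frac{MS_{AB} - MS_{C(AB)}}{cn}, \\
s^2_{a}     &= \frac{MS_{A} - MS_{AB}}{bcn}, \\
s^2_{b}     &= \frac{MS_{B} - MS_{AB}}{acn}.
\end{aligned}
```

A negative solution is not admissible as a variance and is usually replaced by zero. With the mean squares of the example that follows, these expressions reproduce the VC column of the VCA output.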

Solution

For the ANOVA estimates of the variance components we use the R package VCA. For the REML estimates we use the R package lme4.

Example

Factor A is random with three levels, factor B is random with two levels, and factor C is random with two levels within each A × B combination. The data y are taken from Kuehl (1994, p. 254); see Table 6.16.

Table 6.16 Data in a three‐way mixed classification (C ≺ (A×B)), model II.

y	A₁		A₂		A₃
	C₁₁₁	C₁₁₂	C₁₂₁	C₁₂₂	C₁₃₁	C₁₃₂
B₁	3.833	3.819	3.756	3.882	3.720	3.729
	3.866	3.853	3.757	3.871	3.720	3.768
	C₂₁₁	C₂₁₂	C₂₂₁	C₂₂₂	C₂₃₁	C₂₃₂
B₂	3.932	3.884	3.832	3.917	3.776	3.833
	3.943	3.888	3.829	3.915	3.777	3.827

> # Observations, one vector per column of Table 6.16;
> # within each vector the order is B1, B1, B2, B2
> y1 <- c(3.833, 3.866, 3.932, 3.943)
> y2 <- c(3.819, 3.853, 3.884, 3.888)
> y3 <- c(3.756, 3.757, 3.832, 3.829)
> y4 <- c(3.882, 3.871, 3.917, 3.915)
> y5 <- c(3.720, 3.720, 3.776, 3.777)
> y6 <- c(3.729, 3.768, 3.833, 3.827)
> y  <- c(y1, y2, y3, y4, y5, y6)
> a  <- rep(1:3, each = 8)                  # factor A: 3 levels, 8 observations each
> b  <- rep(rep(1:2, each = 2), times = 6)  # factor B: 2 levels, crossed with A
> c  <- rep(1:12, each = 2)                 # factor C within A x B: 12 uniquely coded levels
> problem15 <- data.frame(a, b, c, y)
> library(VCA)
> anova15 <- anovaVCA(y ~ a + b + (a:b)/c, problem15, VarVC.method = "scm")
> anova15
Result Variance Component Analysis:
---------------------------------------------------------------------------------------------------------
  Name  DF       SS       MS       VC       %Total    SD       CV[%]
1 total 4.399143                   0.007625 100       0.087323 2.279801
2 a     2        0.049641 0.024821 0.00309  40.523749 0.055588 1.451282
3 b     1        0.025285 0.025285 0.002099 27.523462 0.045812 1.196048
4 a:b   2        2e-04    1e-04    0*       0*        0*       0*
5 a:b:c 6        0.028219 0.004703 0.002267 29.726104 0.04761  1.242985
6 error 12       0.002038 0.00017  0.00017  2.226685  0.01303  0.340194
Mean: 3.830292 (N = 24)
Experimental Design: balanced  |  Method: ANOVA | * VC set to 0 | adapted MS used for total DF
> library(lme4)
> lmer15 <- lmer(y ~ 1 + (1|a) + (1|b) + (1|a:b) + (1|(a:b):c), problem15)
> summary(lmer15)
Linear mixed model fit by REML ['lmerMod']
Formula: y ~ 1 + (1 | a) + (1 | b) + (1 | a:b) + (1 | (a:b):c)
   Data: problem15
REML criterion at convergence: -91.9
Scaled residuals:
    Min      1Q  Median      3Q     Max
-1.5008 -0.2124 -0.1088  0.2623  1.4922
Random effects:
 Groups   Name        Variance  Std.Dev.
 (a:b):c  (Intercept) 0.0016913 0.04113
 a:b      (Intercept) 0.0000000 0.00000
 a        (Intercept) 0.0026585 0.05156
 b        (Intercept) 0.0018111 0.04256
 Residual             0.0001698 0.01303
Number of obs: 24, groups:  (a:b):c, 12; a:b, 6; a, 3; b, 2
Fixed effects:
            Estimate Std. Error t value
(Intercept)  3.83029    0.04404   86.97

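Hence the REML estimates of the variance components are s²_a = 0.0027, s²_b = 0.0018, s²_ab = 0, s²_c(ab) = 0.0017 and s² = 0.0002. The ANOVA estimates are s²_a = 0.0031, s²_b = 0.0021, s²_ab = 0 (the negative estimate is set to zero), s²_c(ab) = 0.0023 and s² = 0.0002.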
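The ANOVA estimates can also be obtained without the VCA package. The following base R sketch computes the mean squares of the balanced design directly from the data of Table 6.16 and then solves the equations obtained by equating mean squares to the expected mean squares of Table 6.15. The object names are our own, and the sketch assumes equal subclass numbers; it is meant as a check of the output above, not as a general implementation.

```r
# Data of Table 6.16, entered in the same order as in the example above
y <- c(3.833, 3.866, 3.932, 3.943,  3.819, 3.853, 3.884, 3.888,
       3.756, 3.757, 3.832, 3.829,  3.882, 3.871, 3.917, 3.915,
       3.720, 3.720, 3.776, 3.777,  3.729, 3.768, 3.833, 3.827)
a    <- rep(1:3, each = 8)                  # factor A
b    <- rep(rep(1:2, each = 2), times = 6)  # factor B
cell <- rep(1:12, each = 2)                 # levels of C within A x B (unique codes)
n_a <- 3; n_b <- 2; n_c <- 2; n_rep <- 2

# Group means, expanded to the length of y
g    <- mean(y)
m_a  <- ave(y, a)
m_b  <- ave(y, b)
m_ab <- ave(y, a, b)
m_c  <- ave(y, cell)

# Mean squares: sums of squares divided by their degrees of freedom
ms <- c(
  a     = sum((m_a - g)^2) / (n_a - 1),
  b     = sum((m_b - g)^2) / (n_b - 1),
  ab    = sum((m_ab - m_a - m_b + g)^2) / ((n_a - 1) * (n_b - 1)),
  c_ab  = sum((m_c - m_ab)^2) / (n_a * n_b * (n_c - 1)),
  error = sum((y - m_c)^2) / (length(y) - n_a * n_b * n_c)
)

# Solve "MS = E(MS)" for the components; negative solutions are set to zero
vc <- pmax(c(
  a     = (ms[["a"]] - ms[["ab"]]) / (n_b * n_c * n_rep),
  b     = (ms[["b"]] - ms[["ab"]]) / (n_a * n_c * n_rep),
  ab    = (ms[["ab"]] - ms[["c_ab"]]) / (n_c * n_rep),
  c_ab  = (ms[["c_ab"]] - ms[["error"]]) / n_rep,
  error = ms[["error"]]
), 0)
print(signif(vc, 4))
```

The results agree with the VC column of the VCA output up to rounding.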

Finally, a general remark on the examples for estimating variance components. Reliable estimation of a variance component usually requires a large number of levels of the corresponding random factor, and in practice such numbers are often available. The small data sets used here serve only to demonstrate how to proceed with the R programs.
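The effect of the number of levels can be illustrated by a small simulation. The sketch below uses the balanced one‐way model II of Section 6.2 with normally distributed effects and compares the ANOVA estimator of the variance component of A for 3 and for 30 sampled levels. The function name, the number of replications and all parameter values are arbitrary choices made for this illustration.

```r
# ANOVA estimate of the variance component of A in a balanced one-way model II
est_var_a <- function(n_levels, n_rep, sd_a = 1, sd_e = 1) {
  lev <- rep(seq_len(n_levels), each = n_rep)
  # Random level effects plus random errors (normality is assumed here)
  y <- rnorm(n_levels, 0, sd_a)[lev] + rnorm(n_levels * n_rep, 0, sd_e)
  m <- ave(y, lev)
  ms_a <- sum((m - mean(y))^2) / (n_levels - 1)
  ms_e <- sum((y - m)^2) / (n_levels * (n_rep - 1))
  (ms_a - ms_e) / n_rep  # may be negative
}

set.seed(123)  # arbitrary seed, used only to make the sketch reproducible
few  <- replicate(2000, est_var_a(n_levels = 3,  n_rep = 4))
many <- replicate(2000, est_var_a(n_levels = 30, n_rep = 4))

# Spread of the estimates and proportion of negative estimates
c(sd_few = sd(few), sd_many = sd(many),
  neg_few = mean(few < 0), neg_many = mean(many < 0))
```

With only three levels the estimates scatter widely around the true value and are negative in a noticeable proportion of the runs; with 30 levels both problems are much reduced.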

References

Anderson, R.L. and Bancroft, T.A. (1952). Statistical Theory in Research. New York: McGraw‐Hill.
Federer, W.T. (1968). Non‐negative estimators for components of variance. Appl. Stat. 17: 171–174.
Gaylor, D.W. and Hopper, F.N. (1969). Estimating the degrees of freedom for linear combinations of mean squares by Satterthwaite's formula. Technometrics 11: 691–706.
Gelman, A., Carlin, J.B., Stern, H.S., and Rubin, D.B. (1995). Bayesian Data Analysis. New York: Chapman and Hall.
Hammersley, J.M. (1949). The unbiased estimate and standard error of the interclass variance. Metron 15: 189–204.
Hartley, H.O. (1967). Expectations, variances and covariances of ANOVA mean squares by “synthesis”. Biometrics 23: 105–114.
Harville, D.A. (1977). Maximum‐likelihood approaches to variance component estimation and to related problems. J. Am. Stat. Assoc. 72: 320–340.
Henderson, C.R. (1953). Estimation of variance and covariance components. Biometrics 9: 226–252.
Klotz, J.H., Milton, R.C., and Zacks, S. (1969). Mean square efficiency of estimators of variance components. J. Am. Stat. Assoc. 64: 1383–1402.
Kuehl, R.O. (1994). Statistical Principles of Research Design and Analysis. Belmont, California: Duxbury Press.
Patterson, H.D. and Thompson, R. (1971). Recovery of inter‐block information when block sizes are unequal. Biometrika 58: 545–554.
Rasch, D. and Schott, D. (2018). Mathematical Statistics. Oxford: Wiley.
Sahai, H. and Ojeda, M.M. (2004). Analysis of Variance for Random Models, Balanced Data. Boston/Basel/Berlin: Birkhäuser.
Sahai, H. and Ojeda, M.M. (2005). Analysis of Variance for Random Models, Unbalanced Data. Boston/Basel/Berlin: Birkhäuser.
Satterthwaite, F.E. (1946). An approximate distribution of estimates of variance components. Biom. Bull. 2: 110–114.
Searle, S.R., Casella, G., and McCulloch, C.E. (1992). Variance Components. New York/Chichester/Brisbane/Toronto/Singapore: Wiley.
Seely, J.F. and Lee, Y. (1994). A note on Satterthwaite confidence interval for a variance. Commun. Stat. 23: 859–869.
Thompson, W.A. Jr. (1962). Negative estimates of variance components. Ann. Math. Stat. 33: 273–289.
Tiao, G.C. and Tan, W.Y. (1965). Bayesian analysis of random effects models in the analysis of variance I: posterior distribution of variance components. Biometrika 52: 35–53.
Townsend, E.C. (1968). Unbiased estimators of variance components in simple unbalanced designs. PhD thesis, Cornell Univ., Ithaca, USA.
Verdooren, L.R. (1980). On estimation of variance components. Stat. Neerl. 34: 83–106.
Verdooren, L.R. (1982). How large is the probability for the estimate of variance components to be negative? Biom. J. 24: 339–360.
Williams, J.S. (1962). A confidence interval for variance components. Biometrika 49: 278–281.


