5
Analysis of Variance (ANOVA) – Fixed Effects Models

5.1 Introduction

In the analysis of variance, we assume that parameters of random variables depend on non‐random variables, called factors. The values a factor can take we call factor levels or in short levels. We discuss cases where one, two or three factors have an influence on the observations.

An experimenter often has to find out whether different values of several variables or factors have different effects on the experimental material. If the effects of several factors have to be examined, the conventional method is to vary only one of these factors at a time and to keep all other factors constant. To investigate the effect of p factors in this way, p experiments have to be conducted. The results at the levels of the factor investigated may, however, depend on the constant levels of the other factors, which means that interactions between the factors exist. The British statistician R. A. Fisher recommended experimental designs in which the levels of all factors are varied at the same time. For the statistical analysis of the results of such designs (called factorial experiments), Fisher developed a statistical procedure: the analysis of variance. The first publication on this topic was Fisher and Mackenzie (1923), a paper about the analysis of field trials at Fisher's workplace, Rothamsted Experimental Station in Harpenden, UK. A good overview is given in Scheffé (1959) and in Rasch and Schott (2018).

The analysis of variance is based on the decomposition of the sum of squared deviations of the observations from the total mean of the experiment into components. Each of the components is assigned to a specific factor or to interactions of factors or to the experimental error. Further, a corresponding decomposition of the degrees of freedom belonging to sums of squared deviations is done. The analysis of variance is mainly used to estimate the effects of factor levels or to test statistical hypotheses (model I in this chapter), or to estimate components of variance that can be assigned to the different factors (model II – see Chapter 6).

The analysis of variance can be applied to several problems based on mathematical models called model I, model II and the mixed model, respectively. The problem leading to model I is as follows: all factor levels have been particularly selected and involved in the experiment because just these levels are of practical interest. The objective of the experiment is to find out whether the effects of the different levels (or factor level combinations) differ significantly or randomly from each other. The experimental question can be answered by a statistical test if particular assumptions are fulfilled. The statistical conclusion refers to (finite) factor levels specifically selected.

In this chapter, problems in a model I are discussed and these are the estimation of the effects and the interaction effects of the several factors and testing the significance of these effects.

We also show how to determine the optimal size of an experiment. For all these cases we assume that we have to plan an experiment with a type I risk α = 0.05 and a power 1 − β = 0.95.

5.1.1 Remarks about Program Packages

For the analysis, we can use the program package R as we have downloaded it. Readers who prefer to analyse data with IBM‐SPSS Statistics can find programs in Rasch and Schott (2018), and those who prefer SAS can find corresponding programs in Rasch et al. (2008). For experimental designs, we use in R the command

 > install.packages("OPDOE")

and

 > library(OPDOE)

OPDOE stands for ‘optimal design of experiments’ and was used in Chapter 3. The R syntax for calculating the sample size for analysis of variance (or for short ANOVA) can be found by >size.anova; a description of how to use OPDOE is found by

 > help(size.anova)

Detailed instructions and examples are given in chapter 3 of Rasch et al. (2011).

5.2 Planning the Size of an Experiment

For planning the size of a balanced experiment, that is, an experiment with equal sample sizes for all factor levels, precision requirements are needed, as in Chapter 3. The following approach is valid for all sections of this chapter. Besides the two risks α and β (or the power 1 − β of the F‐test), the non‐centrality parameter λ of the non‐central F‐distribution with f₁ and f₂ degrees of freedom has to be given in advance. With the (1 − α)‐quantile of the central F‐distribution and the β‐quantile of the non‐central F‐distribution with f₁ and f₂ degrees of freedom and non‐centrality parameter λ we have to solve the equation

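$$F(f_1, f_2, 0 \mid 1 - \alpha) = F(f_1, f_2, \lambda \mid \beta) \tag{5.1}$$

where F(f₁, f₂, λ | P) denotes the P‐quantile of the F‐distribution with f₁ and f₂ degrees of freedom and non‐centrality parameter λ (λ = 0 gives the central F‐distribution).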

This equation plays an important role in all sections of this chapter. In addition to f₁, f₂, α, and β the difference δ between the largest and the smallest effect (main effect or in the following sections also interaction effect), to be tested against null, belongs to the precision requirement. We denote the solution λ in 5.1 by

Let E_min, E_max be the minimum and the maximum of q effects E₁, E₂, … , E_q of a fixed factor E or an interaction, respectively. Usually we standardise the precision requirement by the relative precision requirement τ = δ/σ.
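Equation 5.1 has no closed‐form solution for λ, but it can be solved numerically. The following is a minimal sketch in base R, assuming that 5.1 equates the (1 − α)‐quantile of the central F‐distribution with the β‐quantile of the non‐central one. The helper name solve_lambda and the search interval are illustrative assumptions and not part of OPDOE.

```r
# Sketch: solve F(f1, f2, 0 | 1 - alpha) = F(f1, f2, lambda | beta) for lambda.
# solve_lambda() is a hypothetical helper, not a function of the OPDOE package.
solve_lambda <- function(alpha, beta, f1, f2, upper = 500) {
  crit <- qf(1 - alpha, f1, f2)   # (1 - alpha)-quantile of the central F-distribution
  # The beta-quantile of the non-central F equals crit exactly when
  # P(F <= crit | lambda) = beta, i.e. when the type II risk is beta.
  g <- function(lambda) pf(crit, f1, f2, ncp = lambda) - beta
  uniroot(g, lower = 1e-8, upper = upper, tol = 1e-10)$root
}

# Illustrative values: f1 = 2 and f2 = 21 degrees of freedom
lambda <- solve_lambda(alpha = 0.05, beta = 0.05, f1 = 2, f2 = 21)
lambda
# Check: the power at this lambda equals 1 - beta
1 - pf(qf(0.95, 2, 21), 2, 21, ncp = lambda)
```

Because f₂ itself depends on the sample size, a sample size search would repeat this step for increasing n until the λ implied by n reaches the solution of 5.1.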

If E_max − E_min ≥ δ then for the non‐centrality parameter of the F‐distribution (for even q) with holds

From this it follows

5.2

The minimal size of the experiment depends on λ and therefore on the exact position of all q effects. However, this position is unknown when the experiment starts. We consider two extreme cases, the most favourable (resulting in the smallest minimal size n_min) and the least favourable (resulting in the largest minimal size n_max). The least favourable case leads to the smallest non‐centrality parameter λ_min and by this to the so‐called maximin size n_max. This occurs if the q − 2 non‐extreme effects equal (E_min + E_max)/2. This case is shown in the following scheme.

Schematic illustration depicting the least favorable case of an experiment that leads to the smallest non-centrality parameter to the so-called maximin size.

The most favourable case leads to the largest non‐centrality parameter λ_max and by this to the so‐called minimin size n_min. If q = 2m (even), this is the case if m of the E_i equal E_min and the m other E_i equal E_max. If q = 2m + 1 (odd), again m of the E_i should equal E_min and m other E_i should equal E_max, and the remaining effect should be equal to one of the two extremes E_min or E_max. This case is shown in the following scheme for even q.

Schematic illustration depicting the most favorable case of an experiment that leads to the largest non-centrality parameter to the so-called minimin size.
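The two extreme configurations can be compared numerically. The sketch below assumes the standard fact that, in a balanced design, the non‐centrality parameter is proportional to the sum of squared deviations of the q effects from their mean, so a smaller sum means a less favourable case. All function names are illustrative.

```r
# Sketch: sum of squared deviations of q effects in the two extreme configurations.
# The effects are centred so that E_min = -delta/2 and E_max = +delta/2.
ss_effects <- function(e) sum((e - mean(e))^2)

least_favourable <- function(q, delta) {
  # two extremes, the remaining q - 2 effects at the midpoint (E_min + E_max)/2 = 0
  c(-delta / 2, rep(0, q - 2), delta / 2)
}

most_favourable <- function(q, delta) {
  m <- q %/% 2
  # m effects at the minimum, the other q - m effects (m or m + 1) at the maximum
  c(rep(-delta / 2, m), rep(delta / 2, q - m))
}

delta <- 2
for (q in 3:6) {
  cat("q =", q,
      " least favourable:", ss_effects(least_favourable(q, delta)),
      " most favourable:", ss_effects(most_favourable(q, delta)), "\n")
}
```

In the least favourable case the sum is δ²/2 for every q, whereas in the most favourable case it grows with q, which is why n_min and n_max can differ considerably for factors with many levels.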

When we plan an experiment, we always obtain equal sub‐class numbers. Therefore, we use models and ANOVA tables mainly for the equal subclass number case because we then have simpler formulae for the expected mean squares. In the analysis programs, unequal sub‐class numbers are also possible.

In Section 5.3 we give a theoretical ANOVA table (for random variables) with expected mean squares E(MS). To find a proper F‐statistic for testing the null hypothesis corresponding to a given row of the table we proceed as follows. In general, F is the ratio of two MS with the corresponding degrees of freedom, chosen so that the numerator and the denominator have the same expectation if the null hypothesis is correct. Under this condition the ratio is centrally F‐distributed, provided the two MS are independent of each other. Equality of the expectations alone is not sufficient; with unequal subclass numbers, for instance, the MS may fail to be independent. In this case, we obtain only a test statistic that is approximately F‐distributed. Such cases occur in Chapter 6. We write τ = δ/σ.

5.3 One‐Way Analysis of Variance

In this section, we investigate the effects of one factor.

From a populations or universes, G₁, … , G_a random samples Y₁, … , Y_a of size n₁, … , n_a, respectively are drawn independently of each other. We write . The y_i are assumed to be distributed in the populations G_i as with {μ_i} = (μ_i, ⋯, μ_i)^T. Further we write μ_i = μ + a_i (i = 1, … , a). Then we have one factor A with the factor levels A_i; i = 1,…, a and write

$$y_{ij} = \mu + a_i + e_{ij} \quad (i = 1, \ldots, a;\ j = 1, \ldots, n_i) \tag{5.3}$$

We call μ the total mean and a_i the effect of the ith level of factor A. The total size of the experiment is N = n₁ + n₂ + ⋯ + n_a.

In Table 5.1, we find the scheme of the observations of an experiment with a levels A₁, ⋯, A_a of factor A and n_i observations for the ith level A_i of A. We use Equation 5.3 with the side conditions

Table 5.1 Observations y_ij of an experiment with a levels of a factor A.

Level of factor A	1	2	…	a
Observations y_ij	y₁₁	y₂₁	…	y_a1
	y₁₂	y₂₂	…	y_a2
	⋮	⋮		⋮
Number of observations n_i	n₁	n₂	…	n_a
Sum Y_i.	Y_1.	Y_2.	…	Y_a.

For testing the hypothesis, e_ij and by this also y_ij is assumed to be normally distributed.

For testing hypotheses about μ + a_i further assumptions are not needed, but to test hypotheses about the a_i we need a so‐called reparametrisation condition such as ∑ a_i = 0 or ∑ n_i a_i = 0 (sums over i = 1, … , a); both are equivalent if all n_i = n, which we call the balanced case.

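In this chapter, we use the point convention for writing sums. In the one‐way case discussed in this section we have

$$Y_{i.} = \sum_{j=1}^{n_i} y_{ij}$$

and

$$Y_{..} = \sum_{i=1}^{a} \sum_{j=1}^{n_i} y_{ij}.$$

The arithmetic means are $\bar{y}_{i.} = Y_{i.}/n_i$ and $\bar{y}_{..} = Y_{..}/N$, where N is the total number of observations.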

Estimators for μ in the model 5.3 are given by

5.4

5.5

if we assume and by

5.6

5.7

if we assume . For estimable functions, we drop the left subscripts and write the symbols as in 5.3.

The n_i are called sub‐class numbers. Both estimators are identical in the balanced case if n_i = n (i = 1, … , a).

The reader may ask which reparametrisation condition should be used. There is no general answer. Besides the two forms above, many others are possible. Fortunately, many of the results derived below are independent of the side condition chosen. Often estimates of the a_i are less interesting than those of estimable functions of the parameters such as μ + a_i and a_i − a_j, and these estimable functions have the same estimates under all side conditions.
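The invariance of estimable functions can be checked numerically. The sketch below uses made‐up unbalanced data and assumes the usual least squares solutions under the two side conditions ∑ a_i = 0 and ∑ n_i a_i = 0; the variable names are illustrative.

```r
# Illustrative (made-up) unbalanced data with three levels of factor A
y    <- list(c(10, 12, 11), c(15, 14), c(9, 8, 10, 9))
n    <- sapply(y, length)          # sub-class numbers n_i
ybar <- sapply(y, mean)            # level means
N    <- sum(n)

# Side condition sum(a_i) = 0: mu is the unweighted mean of the level means
mu_1 <- mean(ybar)
a_1  <- ybar - mu_1

# Side condition sum(n_i * a_i) = 0: mu is the mean of all observations
mu_2 <- sum(n * ybar) / N
a_2  <- ybar - mu_2

c(mu_1 = mu_1, mu_2 = mu_2)        # the two estimates of mu differ
rbind(mu_1 + a_1, mu_2 + a_2)      # identical rows: estimates of mu + a_i
c(a_1[1] - a_1[2], a_2[1] - a_2[2])  # identical: estimates of a_1 - a_2
```

The estimates of μ and of the individual a_i change with the side condition, while μ + a_i and a_i − a_j do not, which is why the latter are the quantities usually reported.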

The variance σ² in both cases is unbiasedly estimated by

5.8

Table 5.2 gives the ANOVA table for model 5.3. In this table SS means sum of squares, MS means mean squares and df means degrees of freedom. We call this table a theoretical ANOVA table because we write the entries as random variables that are functions of the underlying random samples Y₁, … , Y_a. If we have observed data as realisations of the random samples then the column E(MS) is dropped and nothing in the table is in bold print.

Table 5.2 Theoretical ANOVA table: one‐way classification, model I.

Table 5.3 is the empirical ANOVA table corresponding to Table 5.2.

5.3.1 Analysing Observations

Estimable functions of the model parameters are for instance μ + a_i (i = 1, … , a) or a_i − a_j (i, j = 1, … , a; i ≠ j) with the estimators (using 5.4–5.7)

5.9

and

5.10

respectively. They are independent of the special choice of the reparametrisation condition.

Besides point estimation, an objective of an experiment (model I) is to test the null hypothesis H₀ : a_i = a_j for all i ≠ j against the alternative that at least two of the effects a_i differ from each other. This null hypothesis corresponds to the assumption that the effects of the factor considered for all a levels are equal. The basis of the corresponding tests is the fact that the sum of squared deviations SS of y_ij from the total mean of the experiment can be broken down into independent components.

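The total sum of squared deviations of the observations from the total mean of the experiment is

$$\sum_{i=1}^{a}\sum_{j=1}^{n_i}(y_{ij}-\bar{y}_{..})^2 = \sum_{i=1}^{a}\sum_{j=1}^{n_i}(y_{ij}-\bar{y}_{i.})^2 + \sum_{i=1}^{a} n_i(\bar{y}_{i.}-\bar{y}_{..})^2,$$

where $\bar{y}_{i.}$ is the mean of the ith level and $\bar{y}_{..}$ the total mean.

The left‐hand side is called SS total or for short SS_T, the first component of the right‐hand side is called SS within the treatments or levels of factor A (for short SS within or SS_res) and the last component of the right‐hand side SS between the treatments or levels of factor A (SS_A).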

We generally write

5.11

5.12

5.13

It is known from Rasch and Schott (2018, theorem 5.4) that

$$F = \frac{MS_A}{MS_{res}} = \frac{SS_A/(a-1)}{SS_{res}/(N-a)} \tag{5.14}$$

is distributed as F(a − 1, N − a, λ) with the non‐centrality parameter

If H₀ : a₁ = … = a_a is valid then we have λ = 0, and thus F is F(a − 1, N − a) distributed. Therefore, the hypothesis H₀ : a₁ = … = a_a is tested by an F‐test. The ratios MS_A = SS_A/(a − 1) and MS_res = SS_res/(N − a) are called mean squares between treatments and within treatments or residual mean squares, respectively.

Example 5.1

We assume that the breeding value of three sires of a special cattle breed concerning milk fat in kilograms is tested via the milk fat performance of their daughters. In this case we have three levels of the factor sire, i.e. a = 3.

We assume that we model the milk fat performance of the daughters by a normally distributed random variable. Then we use the F‐test in formula 5.14. We at first determine in the balanced case the numbers of daughters needed for such a performance test so that the hypothesis that the three sires have the same breeding value is erroneously rejected with a first type risk α = 0.05. The power is at least 1 − β as long as δ ≥ 2, where δ is the difference between the largest and the smallest effect. The determination will be explained in Section 5.3.2.

The expectations of these MS are

and

Under the reparametrisation condition we obtain

Now the several steps in the simple ANOVA for model I can be summarised as follows.

We assumed that from systematically selected normally distributed populations with expectations μ + a_i and the same variance σ², representing the levels of a factor – also called treatments – independent random samples of size n_i have to be drawn. If possible, the size N of the experiment is determined in advance as small as possible so that a given precision requirement is fulfilled. That means we have to choose equal subclass numbers. However, even if an experiment is planned with equal subclass numbers, drop‐outs may lead to unequal sub‐class numbers.

For the N observations y_ij we assume model 5.3 with its side conditions. From the observations in Table 5.1 the column sums Y_i. and the numbers of observations n_i are initially calculated. The corresponding mean

$$\bar{y}_{i.} = \frac{Y_{i.}}{n_i}$$

is the UMVUE (uniformly minimum variance unbiased estimator) under the assumed normal distribution, and for arbitrary distributions with finite second moments it is the BLUE (best linear unbiased estimator) of the μ + a_i.

To test the null hypothesis H₀ : a₁ = … = a_a we calculate the realisation

5.15

5.3.2 Determination of the Size of an Experiment

In the case of the one‐way classification we determine the required experimental size for the most favourable as well as for the least favourable case, i.e. we are looking for the smallest n (for instance n = 2q) so that for λ_max = λ and for λ_min = − λ, respectively, 5.2 is fulfilled.

The experimenter must select a size n in the interval n_min ≤ n ≤ n_max; to be on the safe side, n = n_max must be chosen. The package OPDOE of R allows the determination of the minimal size for the most favourable and the least favourable case as a function of α, β, δ, σ and the number a of levels of factor A. The corresponding algorithm stems from Lenth (1986) and Rasch et al. (1997). In any case one can show that the minimal experimental size is smallest in the balanced case n₁ = n₂ = … = n_a = n, which can be achieved by planning the experiment.

Problem 5.1

Determine in a balanced design the sub‐class number n in a one‐way ANOVA for a precision determined by α = 0.05, β = 0.05 and δ = 2σ, and a test with 5.14.

Solution

The design function of the R‐package OPDOE for the one‐way analysis of variance has the following form. Note that the argument delta in the program below stands for τ = δ/σ, not for δ itself.

 > size.anova(model="a", a= ,alpha= ,beta= ,delta= ,case= )

Example

Determine n_min and n_max for a = 3, α = 0.05, β = 0.05, and δ = 2σ.

 > size.anova(model="a", a=3, alpha=0.05, beta=0.05, delta=2,case="minimin")
n 
7 
> size.anova(model="a", a=3, alpha=0.05, beta=0.05, delta=2,case="maximin")
n 
8

Now, one of the values n_min = 7 or n_max = 8 must be used. The experimental size for a = 3 is N_min = 3 · n_min = 21 or N_max = 3 · n_max = 24.

We prefer to be on the safe side and take n = 8. Unfortunately, one observation for sire 2 was lost, so that we obtained unequal subclass numbers, as is seen in Table 5.4. Table 5.5 is the corresponding ANOVA table.

Table 5.4 Performances (milk fat in kg) y_ij of the daughters of three sires.

	Sire
	B₁	B₂	B₃
y_ij	120	153	130
	155	144	138
	131	147	122
	130	139	131
	146	141	128
	138	150	135
	143	136	127
	151		131
n_i	8	7	8
Y_i·	1114	1010	1042
ȳ_i·	139.25	144.2857	130.25

Table 5.5 ANOVA table for testing the hypothesis H₀ : a₁ = a₂ = a₃ = 0 of Example 5.1.

Source of variation	SS	df	MS	F
Between sires	766.8	2	383.4	5.628
Within sires	1362.4	20	68.1
Corrected total	2129.22	22

Problem 5.2

Calculate the entries in the ANOVA Table 5.3 and calculate estimates of 5.9 and 5.10.

Solution

Use the R‐commands:

 > mf1 <-  c(120, 155, 131, 130, 146, 138, 143, 151)
> mf2 <-  c(153, 144, 147, 139, 141, 150, 136)
> mf3 <-  c(130, 138, 122, 131, 128, 135, 127, 131)
> sire <- c(1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3)
> Pr5_2 <- data.frame(x=c(mf1,mf2,mf3),y=sire)
> meanB1 <- mean(Pr5_2$x[Pr5_2$y==1])
> meanB1
[1] 139.25
> meanB2 <- mean(Pr5_2$x[Pr5_2$y==2])
> meanB2
[1] 144.2857
> meanB3 <- mean(Pr5_2$x[Pr5_2$y==3])
> meanB3
[1] 130.25
> a1_a2 <- meanB1-meanB2
> a1_a2
[1] -5.035714
> a1_a3 <- meanB1-meanB3
> a1_a3
[1] 9
> a2_a3 <- meanB2-meanB3
> a2_a3
[1] 14.03571

Answers to 5.9 are: mean B₁ = 139.25, mean B₂ = 144.2857, mean B₃ = 130.25.

Answers to 5.10 are: a₁ − a₂ = −5.035714, a₁ − a₃ = 9, a₂ − a₃ = 14.03571.

Problem 5.3

Test the null hypothesis H₀ : a₁ = a₂ = a₃ = 0 with significance level α = 0.05.

Solution

The data of Table 5.4 are already available in the R data frame Pr5_2. Proceed with:

 > Bull <- factor(Pr5_2$y)
> Bull
 [1] 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3
Levels: 1 2 3
> MYaov <- aov(Pr5_2$x ~ Bull)
> MYaov
Call:
   aov(formula = Pr5_2$x ~ Bull)

Terms:
                     Bull Residuals
Sum of Squares   766.7888 1362.4286
Deg. of Freedom         2        20
Residual standard error: 8.253571
Estimated effects may be unbalanced
> summary(MYaov)
            Df Sum Sq Mean Sq F value Pr(>F)
Bull         2  766.8   383.4   5.628 0.0115 *
Residuals   20 1362.4    68.1
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The p‐value Pr(>F) for the bulls is 0.0115 < 0.05, hence H₀ : a₁ = a₂ = a₃ = 0 is rejected.

To calculate the SS for the corrected total:

 > N <- length(Pr5_2[ ,1])
> N
[1] 23
> SSTot <- (N-1)*var(Pr5_2[ ,1])
> SSTot
[1] 2129.217
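The entries of the ANOVA table can also be reproduced directly from the decomposition of the total sum of squares into a between and a within component. The following sketch recomputes them for the data of Table 5.4 in base R; the variable names are illustrative.

```r
# Milk fat performances of Table 5.4 (sire 2 has only 7 daughters)
y    <- c(120, 155, 131, 130, 146, 138, 143, 151,
          153, 144, 147, 139, 141, 150, 136,
          130, 138, 122, 131, 128, 135, 127, 131)
sire <- factor(rep(1:3, times = c(8, 7, 8)))

n    <- tapply(y, sire, length)              # sub-class numbers n_i
ybar <- tapply(y, sire, mean)                # level means
N    <- length(y)
a    <- nlevels(sire)

SS_A   <- sum(n * (ybar - mean(y))^2)        # between sires
SS_res <- sum((y - ave(y, sire))^2)          # within sires
SS_T   <- sum((y - mean(y))^2)               # corrected total

F_val <- (SS_A / (a - 1)) / (SS_res / (N - a))
p_val <- pf(F_val, a - 1, N - a, lower.tail = FALSE)
round(c(SS_A = SS_A, SS_res = SS_res, SS_T = SS_T, F = F_val, p = p_val), 4)
```

The values agree with the output of aov() above, and SS_A + SS_res equals the corrected total obtained from (N − 1) times the sample variance.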

5.4 Two‐Way Analysis of Variance

The two‐way ANOVA is a procedure for experiments to investigate the effects of two factors. Let us investigate a varieties of wheat and b fertilisers in their effect on the yield (kilogram per hectare). The a varieties as well as the b fertilisers are assumed to be fixed (selected consciously), as always in this chapter with fixed effects. One of the factors is factor variety (factor A), and the other factor is fertiliser (factor B). In this and the next chapter the number of levels of a factor X is denoted by the same (small) letter x as the factor (a capital letter) itself. So, factor A has a, and factor B has b levels in the experiment. In experiments with two factors, the experimental material is classified in two directions. For this, we list the different possibilities:

(a) Observations occur in each level of factor A combined with each level of factor B. There are a · b combinations (classes) of factor levels. We say factor A is completely crossed with factor B or that we have a complete cross‐classification.
(a1) For each combination (class) of factor levels there exists one observation (n_ij = 1, with n_ij being the number of observations in class (A_i, B_j)).
(a2) For each combination (class) (i, j) of the level i of factor A with the level j of factor B we have n_ij ≥ 1 observations, with at least one n_ij > 1. If all n_ij = n, we have a cross‐classification with equal class numbers, called a balanced experimental design.
(b) At least one level of factor A occurs together with at least two levels of factor B, and at least one level of factor B occurs together with at least two levels of factor A, but we have no complete cross‐classification. Then we say factor A is partially crossed with factor B, or we have an incomplete cross‐classification.
(c) Each level of factor B occurs together with exactly one level of factor A. This is a nested classification of factor B within factor A. We also say that factor B is nested within factor A and write B ≺ A.

Summarising the types of two‐way classification we have:

(a) n_ij = 1 for all (i, j) → complete cross‐classification with one observation per class
(b) n_ij ≥ 1 for all (i, j) → complete cross‐classification
(c) n_ij = n ≥ 1 for all (i, j) → complete cross‐classification with equal sub‐class numbers
(d) At least one n_ij = 0 → incomplete cross‐classification
(e) If n_kj ≠ 0, then n_ij = 0 for i ≠ k (at least one n_ij > 1 and at least two n_ij ≠ 0) → nested classification.
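The summary above can be turned into a small check on the matrix of subclass numbers. The sketch below is one possible reading of these rules; the function classify_layout and its return strings are hypothetical, and the nested case is recognised by requiring that every level of B occurs with exactly one level of A.

```r
# Sketch: classify a two-way layout from its a x b matrix of subclass numbers n_ij
# (rows = levels of A, columns = levels of B). Hypothetical helper function.
classify_layout <- function(n) {
  occupied <- n > 0
  # nested: each level of B (column) occurs with exactly one level of A (row)
  if (nrow(n) > 1 && all(colSums(occupied) == 1)) {
    return("nested classification (B within A)")
  }
  if (any(!occupied)) return("incomplete cross-classification")
  if (all(n == 1)) return("complete cross-classification, one observation per class")
  if (all(n == n[1, 1])) return("complete cross-classification, equal subclass numbers")
  "complete cross-classification, unequal subclass numbers"
}

classify_layout(matrix(1, nrow = 3, ncol = 2))
classify_layout(matrix(4, nrow = 3, ncol = 2))
classify_layout(matrix(c(2, 0, 1, 1, 1, 3), nrow = 3))
classify_layout(rbind(c(3, 2, 0, 0), c(0, 0, 4, 1)))
```

The nested check comes first because a nested layout necessarily contains empty classes and would otherwise be reported as an incomplete cross‐classification.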

5.4.1 Cross‐Classification (A × B)

The observations y_ijk of a complete cross‐classification are real numbers. In class (i,j) occur the observations y_ijk, k = 1, … , n_ij. In block designs, we often have n_ij = 1 with single subclass numbers. If all n_ij = n are equal, we have the case of equal subclass numbers, which we discuss now.

Without loss of generality, we represent the levels of factor A as the rows and the levels of factor B as the columns in the tables. When empty classes occur, i.e. some n_ij equal zero, we have an incomplete cross‐classification; such a case occurs for incomplete block designs.

Let the random variables y_ijk in the class (i, j) be a random sample of a population associated with this class. The mean and variance of the population of such a class are called true mean and variance, respectively. The true mean of the class (i, j) is denoted by η_ij. Again we consider the case that the levels of the factors A and B are chosen consciously (model I).

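We call

$$\mu = \bar{\eta}_{..} = \frac{1}{ab}\sum_{i=1}^{a}\sum_{j=1}^{b}\eta_{ij}$$

the overall expectation of the experiment.

The difference $a_i = \bar{\eta}_{i.} - \mu$ is called the main effect of the ith level of factor A, the difference $b_j = \bar{\eta}_{.j} - \mu$ is called the main effect of the jth level of factor B. The difference $a_{i|j} = \eta_{ij} - \bar{\eta}_{.j}$ is called the effect of the ith level of factor A under the condition that factor B occurs in the jth level. Analogously, $b_{j|i} = \eta_{ij} - \bar{\eta}_{i.}$ is called the effect of the jth level of factor B under the condition that factor A occurs in the ith level. Here $\bar{\eta}_{i.}$ and $\bar{\eta}_{.j}$ denote the means of the η_ij over j and over i, respectively.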

The distinction between the main effect and the ‘conditional effect’ is important if the effects of the levels of one factor depend on the level of the other factor. In the analysis of variance, we then say that an interaction between the two factors exists. We define the effects of these interactions (and use them in place of the conditional effects).

The interaction (a, b)_ij between the ith level of factor A and the jth level of factor B in a two‐way cross‐classification is the difference between the conditional effect of the level A_i of factor A for a given level B_j of factor B and the main effect of the level A_i of A.
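The definition can be illustrated with a small table of class means. The sketch below uses hypothetical true means η_ij and assumes the usual definitions: the overall expectation is the mean of all η_ij, a main effect is a row or column mean minus the overall expectation, and the conditional effect of A_i given B_j is η_ij minus the mean of column j.

```r
# Hypothetical true class means eta_ij (rows: levels of A, columns: levels of B)
eta <- matrix(c(10, 12,
                11, 15,
                 9, 16), nrow = 3, byrow = TRUE)

mu     <- mean(eta)                       # overall expectation
a      <- rowMeans(eta) - mu              # main effects of A
b      <- colMeans(eta) - mu              # main effects of B
cond_a <- sweep(eta, 2, colMeans(eta))    # conditional effects of A_i given B_j
ab     <- cond_a - a                      # interaction = conditional effect - main effect

round(ab, 4)
rowSums(ab)   # zero for every level of A
colSums(ab)   # zero for every level of B
```

With these definitions the interactions sum to zero over each index, and μ + a_i + b_j + (a, b)_ij reproduces η_ij exactly.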

Under the assumption above the random variable y_ijk of the cross‐classification varies randomly around the class mean in the form

$$y_{ijk} = \eta_{ij} + e_{ijk}.$$

We assume that the so‐called error variables e_ijk are independent of each other N(0, σ²) distributed and write for a balanced design the model I equation:

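$$y_{ijk} = \mu + a_i + b_j + (a,b)_{ij} + e_{ijk} \quad (i = 1, \ldots, a;\ j = 1, \ldots, b;\ k = 1, \ldots, n) \tag{5.16}$$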

with (a, b)_ij = 0 if n = 0. We assume the following side conditions:

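$$\sum_{i=1}^{a} a_i = 0, \quad \sum_{j=1}^{b} b_j = 0, \quad \sum_{i=1}^{a} (a,b)_{ij} = 0 \ \text{for all } j, \quad \sum_{j=1}^{b} (a,b)_{ij} = 0 \ \text{for all } i \tag{5.17}$$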

If in 5.16 all (a, b)_ij = 0 we call

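$$y_{ijk} = \mu + a_i + b_j + e_{ijk} \quad (i = 1, \ldots, a;\ j = 1, \ldots, b;\ k = 1, \ldots, n) \tag{5.18}$$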

a model without interactions or an additive model, respectively.

5.4.1.1 Parameter Estimation

We can estimate the parameters in any model of ANOVA by the least squares method. We minimise in the case of model (5.16)

under the side conditions 5.17 and obtain (using the dot convention analogous to Section 5.3)

5 Models with Interactions

We consider the model 5.16. Because E(Y) is estimable we have

estimable. The BLUE of η_ij is

5.19

From 5.6 it follows

5.20

It is now easy to show that differences between a_i or between b_j are not estimable. All estimable functions of the components of 5.16 without further side conditions contain interaction effects (a, b)_ij. It follows from theorem 5.7 in Rasch and Schott (2018) that

5.21

or analogously

is estimable if c_rs = 0 for n_rs = 0 and d_rs = 0 for n_rs = 0 as well as

The BLUE of an estimable function of the form 5.21 is given by

5.22

with variance

5.23

We consider the following example.

5 Connected Incomplete Cross‐Classifications

In an incomplete cross‐classification we have (a, b)_ij = 0 if n_ij = 0. Further we choose the factors A and B so that a ≥ b.

An (incomplete) cross‐classification is called connected if

is non‐singular. If |W| = 0, then the cross‐classification is disconnected.

Example 5.7

We consider a two‐way cross‐classification with a = 5, b = 4 and the subclass numbers

Levels of B

Because |W| = 0 the design is disconnected. Here n.₁ = n.₂ = 3n, n.₃ = n.₄ = 2m, n₁. = n₂. = n₃. = 2n, n₄. = n₅. = 2m, and the matrix W is given by

The first row is (−1) times the second row so that W is singular. The term “disconnected cross‐classification” can be illustrated by this example as follows. From the scheme of the sub‐class numbers we see that the levels A₁, A₂, A₃, B₁, B₂ and A₄, A₅, B₃, B₄ form two separate cross‐classifications. If we add n further observations in (A₂, B₃), we obtain n₂. = 3n and n.₃ = 2m + n; the resulting matrix W has |W| ≠ 0, so that the cross‐classification is connected.

Using the hypothesis tests of Section 5.4.1.2.1 (models without interactions), we can easily check in R whether a cross‐classification of A with a levels and B with b levels is disconnected under an additive model. Fit the model y = A + B and read the degrees of freedom of B from the ANOVA table; then fit the model y = B + A and read the degrees of freedom of A. If df(B) < b − 1 or df(A) < a − 1, the scheme is disconnected.
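The degrees of freedom check can be sketched as follows. The layout is made up (it mimics a scheme in which A₁, A₂, A₃ occur only with B₁, B₂ and A₄, A₅ only with B₃, B₄, with one observation per occupied class), and the responses are random numbers, since only the degrees of freedom matter here.

```r
# Made-up disconnected layout: A1-A3 occur only with B1-B2, A4-A5 only with B3-B4
cells <- rbind(expand.grid(A = 1:3, B = 1:2),
               expand.grid(A = 4:5, B = 3:4))
set.seed(1)
d <- data.frame(A = factor(cells$A), B = factor(cells$B),
                y = rnorm(nrow(cells)))     # arbitrary responses

tab_AB <- anova(lm(y ~ A + B, data = d))    # B fitted after A
tab_BA <- anova(lm(y ~ B + A, data = d))    # A fitted after B
df_B <- tab_AB["B", "Df"]
df_A <- tab_BA["A", "Df"]

c(df_B = df_B, b_minus_1 = nlevels(d$B) - 1,
  df_A = df_A, a_minus_1 = nlevels(d$A) - 1)
```

In this layout both degrees of freedom fall one short of b − 1 and a − 1, because the two separate blocks of classes make one contrast of each factor inestimable.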

5.4.1.2 Testing Hypotheses

In this section, testable hypotheses and tests of such hypotheses are considered.

5 Models without Interactions

We start with model 5.18 and assume a connected cross‐classification (W above non‐singular).

If, as in 5.18, n_ij = n (equal sub‐class numbers), simplifications for the tests of hypotheses about the main effects result. We have the possibility further to construct an analysis of variance table, in which SS_A, SS_B, SS_res = SS_R add to SS_total = SS_T.

If in model 5.18 n ≥ 1 for all i and j, then the sum of squared deviations of the y_ijk from the total mean of the experiment

can be written as

with

SS_A, SS_B and SS_res are independently distributed, and for normally distributed y_ijk we have SS_A/σ² distributed as CS(a − 1, λ_a), SS_B/σ² as CS(b − 1, λ_b) and SS_res/σ² as CS(N − a − b + 1) with non‐centrality parameters

Therefore, with MS_A = SS_A/(a − 1), MS_B = SS_B/(b − 1) and MS_res = SS_res/(N − a − b + 1), F_A = MS_A/MS_res is non‐centrally F‐distributed with a − 1 and N − a − b + 1 degrees of freedom and non‐centrality parameter λ_a, and F_B = MS_B/MS_res is non‐centrally F‐distributed with b − 1 and N − a − b + 1 degrees of freedom and non‐centrality parameter λ_b.

The realisations of these formulas are summarised in Table 5.7.

When the null hypothesis H_A0 : a₁ = a₂ = ⋯ = a_a = 0 is correct, then λ_a = 0. This null hypothesis can therefore be tested by F_A = MS_A/MS_res. If F_A > F(a − 1, N − a − b + 1, 1 − α) the null hypothesis is rejected with a first kind risk α. When the null hypothesis H_B0 : b₁ = b₂ = ⋯ = b_b = 0 is correct, then λ_b = 0. This null hypothesis can therefore be tested by F_B = MS_B/MS_res. If F_B > F(b − 1, N − a − b + 1, 1 − α) the null hypothesis is rejected with a first kind risk α.

Problem 5.4

Determine the (1 − α)‐quantile of the central F‐distribution with df₁ and df₂ degrees of freedom.

Solution

Use the R command > qf(p, df1, df2) with p = 1 − α.

Example

We calculate the quantile of the central F‐distribution for 1 − α = 0.95, df₁ = 10 and df₂ = 30.

 > qf(0.95,10,30)
[1] 2.16458

Problem 5.5

Determine the sample size for testing the hypothesis H_A0 : a₁ = a₂ = ⋯ = a_a = 0. Unfortunately delta in the program below stands for τ = δ/σ.

Solution

Use the OPDOE commands

 > size.anova(model="axb", hypothesis="a", a=, b=, alpha=,beta=,
       delta=,cases="maximin")

 > size.anova(model="axb", hypothesis="a", a=, b=, alpha=,beta=,
       delta=,cases="minimin")

Example

We choose α = 0.05, β = 0.1, a = 4, b = 2, δ = 1 and calculate the minimal and maximal subclass numbers n for the null hypotheses about A and B.

For testing the factor A we obtain

 > size.anova(model="axb", hypothesis="a", a=4, b=2, alpha=0.05,beta=0.1, delta=1,cases="maximin")
n 
15

and

 > size.anova(model="axb", hypothesis="a", a=4, b=2, alpha=0.05,beta=0.1, delta=1,cases="minimin")
n 
8

For testing the factor B we exchange the entries for a and b and obtain

 > size.anova(model="axb", hypothesis="a", a=2, b=4, alpha=0.05,beta=0.1, delta=1,cases="maximin")
n 
6

and

 > size.anova(model="axb", hypothesis="a", a=2, b=4, alpha=0.05,beta=0.1, delta=1,cases="minimin")
n 
6

To test both hypotheses, the experimenter may use a subclass number between 8 and 15.

Table 5.9 ANOVA Table of Example 5.9.

Source of variation	SS	df	MS	F
Between the storages	43.2261	3	14.4087	186.7
Between the forage crops	0.8978	1	0.8978	11.63
Residual	0.2315	3	0.0772
Total	44.3554	7

Example 5.9

Two forage crops (green rye and lucerne) are investigated concerning their loss of carotene during storage. Four storage possibilities (glass in a refrigerator, glass in a barn, sack in a refrigerator and sack in a barn) are chosen. The loss during storage is defined by the difference between the content of carotene at start and the content of carotene after storing for 300 days (in per cent of dry mass). The question is whether the kind of storage and/or of forage crop influences the loss during storage. We denote the kind of storage as factor A and the forage crop as factor B and arrange the observations (differences y_ij) in the form of Table 5.8. Because forage crops and kinds of storage have been selected consciously, we use for y_ij a model I and 5.18 as the model equation.

Table 5.8 Observations (loss in per cent of dry mass, during storage of 300 days) of the experiment of Example 5.9 and results of first calculations.

	Forage crop
Kind of storage	Green rye	Lucerne
Glass in refrigerator	8.39	9.44
Glass in barn	11.58	12.21
Sack in refrigerator	5.42	5.56
Sack in barn	9.53	10.39

The analysis of variance assumes that the observations are realisations of random variables, which are, independently of each other, normally distributed with equal variances. Table 5.9 is the ANOVA table. As the F‐tests show, both factors have a significant influence on the loss during storage at α = 0.05: the differences between the kinds of storage are highly significant (p = 0.00066), and the difference between the forage crops is also significant, although only just (p = 0.042).

For the analysis of this design we can use in R the commands >aov() or > lm().

Use the command > aov() only for balanced two‐way cross classifications.

The command >lm() gives more detailed information and can also be used for unbalanced two‐way cross classifications.

 > loss <- c(8.39, 11.58, 5.42, 9.53, 9.44, 12.21, 5.56, 10.39)
> storage <- c(1,2,3,4,1,2,3,4)
> crop <- c(1,1,1,1,2,2,2,2)
> Table_5_8 <- data.frame(cbind(loss,storage,crop))
> Table_5_8
   loss storage crop
1  8.39       1    1
2 11.58       2    1
3  5.42       3    1
4  9.53       4    1
5  9.44       1    2
6 12.21       2    2
7  5.56       3    2
8 10.39       4    2
> STORAGE <- factor(storage)
> CROP <- factor(crop)
> Anova1 <- aov(loss ~ STORAGE + CROP, Table_5_8)
> Anova1
Call:
   aov(formula = loss ~ STORAGE + CROP, data = Table_5_8)
Terms:
                STORAGE    CROP Residuals
Sum of Squares  43.2261  0.8978    0.2315
Deg. of Freedom       3       1         3
Residual standard error: 0.2777889
Estimated effects may be unbalanced
> summary(Anova1)
            Df Sum Sq Mean Sq F value   Pr(>F)
STORAGE      3  43.23  14.409  186.72 0.000659 ***
CROP         1   0.90   0.898   11.63 0.042121 *
Residuals    3   0.23   0.077
- - -
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 "" 1

> Anova2 <- lm(loss ~ STORAGE + CROP, Table_5_8)
> Anova2
Call:
lm(formula = loss ∼ STORAGE + CROP, data = Table_5_8)
Coefficients:
(Intercept)    STORAGE2    STORAGE3    STORAGE4       CROP2
      8.580       2.980      -3.425       1.045       0.670
> summary(Anova2)
Call:
lm(formula = loss ∼ STORAGE + CROP, data = Table_5_8)
Residuals:
     1      2      3      4      5      6      7      8
-0.190  0.020  0.265 -0.095  0.190 -0.020 -0.265  0.095
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   8.5800     0.2196  39.069 3.69e-05 ***
STORAGE2      2.9800     0.2778  10.728  0.00173 **
STORAGE3     -3.4250     0.2778 -12.330  0.00115 **
STORAGE4      1.0450     0.2778   3.762  0.03285 *
CROP2         0.6700     0.1964   3.411  0.04212 *
- - -
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 "" 1
Residual standard error: 0.2778 on 3 degrees of freedom
Multiple R-squared:  0.9948,    Adjusted R-squared:  0.9878
F-statistic:   143 on 4 and 3 DF,  p-value: 0.0009397
> anova(Anova2)
Analysis of Variance Table
Response: loss
          Df Sum Sq Mean Sq F value   Pr(>F)
STORAGE    3 43.226 14.4087 186.722 0.000659 ***
CROP       1  0.898  0.8978  11.635 0.042121 *
Residuals  3  0.231  0.0772
- - -
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 "" 1
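
The sums of squares in the ANOVA tables above can be checked by hand from the row and column means of Table 5.8. The following is a minimal sketch in base R, assuming one observation per class and the additive model; the object names (loss_tab, ss_storage and so on) are chosen here for illustration and do not occur in the text.

```r
# Table 5.8 as a 4 x 2 matrix: rows = kinds of storage, columns = forage crops
loss_tab <- matrix(c(8.39, 11.58, 5.42, 9.53,    # green rye
                     9.44, 12.21, 5.56, 10.39),  # lucerne
                   nrow = 4,
                   dimnames = list(storage = 1:4, crop = c("rye", "lucerne")))

a <- nrow(loss_tab)   # number of storage levels
b <- ncol(loss_tab)   # number of crop levels
grand <- mean(loss_tab)

# Decomposition of the total sum of squared deviations
ss_storage <- b * sum((rowMeans(loss_tab) - grand)^2)
ss_crop    <- a * sum((colMeans(loss_tab) - grand)^2)
ss_total   <- sum((loss_tab - grand)^2)
ss_res     <- ss_total - ss_storage - ss_crop
df_res     <- (a - 1) * (b - 1)

# F statistics and p-values against the residual mean square
f_storage <- (ss_storage / (a - 1)) / (ss_res / df_res)
f_crop    <- (ss_crop / (b - 1)) / (ss_res / df_res)
p_storage <- pf(f_storage, a - 1, df_res, lower.tail = FALSE)
p_crop    <- pf(f_crop, b - 1, df_res, lower.tail = FALSE)

print(round(c(SS_storage = ss_storage, SS_crop = ss_crop, SS_res = ss_res), 4))
print(round(c(F_storage = f_storage, F_crop = f_crop,
              p_storage = p_storage, p_crop = p_crop), 5))
```

The printed values should agree with the Sum of Squares, F value and Pr(>F) entries of the aov() and lm() output above.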

Example 5.10

We use the data of Table 5.6 with an additive model; this is an example of an unbalanced design. Hence, we must use the R command >lm(). Because lm() tests each term adjusted only for the terms that precede it in the formula, the formula used for testing the test period effects (PERIODS last) differs from the one used for testing the sex effects (SEX last). We use for the tests a significance level α = 0.05.

 > days <- c(91,84,86,94,92,90,96,82,86,99,97,89)
> periods <- c(1,1,1,2,2,2,2,3,3,1,2,2)
> sex <- c(1,1,1,1,1,1,1,1,1,2,2,2)
> T_5_6 <- data.frame(cbind(days,periods,sex))
> T_5_6
   days periods sex
1    91       1   1
2    84       1   1
3    86       1   1
4    94       2   1
5    92       2   1
6    90       2   1
7    96       2   1
8    82       3   1
9    86       3   1
10   99       1   2
11   97       2   2
12   89       2   2
> SEX <- factor(sex)
> PERIODS <- factor(periods)
> # Prepare ANOVA for testing the hypothesis about test period effects
> Anova1 <- lm(days ~ SEX + PERIODS, T_5_6)
> Anova1
Call:
lm(formula = days ∼ SEX + PERIODS, data = T_5_6)
Coefficients:
(Intercept)         SEX2     PERIODS2     PERIODS3
      88.92         4.32         2.64        -4.92
> summary(Anova1)
Call:
lm(formula = days ∼ SEX + PERIODS, data = T_5_6)
Residuals:
   Min     1Q Median     3Q    Max
 -6.88  -2.23   0.78   2.17   5.76
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   88.920      2.330  38.162 2.44e-10 ***
SEX2           4.320      3.051   1.416    0.194
PERIODS2       2.640      2.854   0.925    0.382
PERIODS3      -4.920      3.889  -1.265    0.241
- - -
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 "" 1
Residual standard error: 4.403 on 8 degrees of freedom
Multiple R-squared:  0.5107,    Adjusted R-squared:  0.3272
F-statistic: 2.783 on 3 and 8 DF,  p-value: 0.1098
> anova(Anova1)
Analysis of Variance Table
Response: days
          Df Sum Sq Mean Sq F value  Pr(>F)
SEX        1  81.00   81.00  4.1774 0.07522 .
PERIODS    2  80.88   40.44  2.0856 0.18665
Residuals  8 155.12   19.39
- - -
> # Prepare ANOVA for testing Hypothesis about Sex effects
> Anova2 <- lm(days ~ PERIODS + SEX, T_5_6)
> Anova2
Call:
lm(formula = days ∼ PERIODS + SEX, data = T_5_6)
Coefficients:
(Intercept)     PERIODS2     PERIODS3         SEX2
      88.92         2.64        -4.92         4.32
> summary(Anova2)
Call:
lm(formula = days ∼ PERIODS + SEX, data = T_5_6)
Residuals:
   Min     1Q Median     3Q    Max
 -6.88  -2.23   0.78   2.17   5.76
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   88.920      2.330  38.162 2.44e-10 ***
PERIODS2       2.640      2.854   0.925    0.382
PERIODS3      -4.920      3.889  -1.265    0.241
SEX2           4.320      3.051   1.416    0.194
- - -
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 "" 1
Residual standard error: 4.403 on 8 degrees of freedom
Multiple R-squared:  0.5107,    Adjusted R-squared:  0.3272
F-statistic: 2.783 on 3 and 8 DF,  p-value: 0.1098
> anova(Anova2)
Analysis of Variance Table
Response: days
          Df Sum Sq Mean Sq F value  Pr(>F)
PERIODS    2 123.00   61.50  3.1717 0.09677 .
SEX        1  38.88   38.88  2.0052 0.19450
Residuals  8 155.12   19.39
- - -
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 "" 1

From the ANOVA table of Anova1 we must use the last line, PERIODS, for testing the test period effects: the F‐value is 2.0856 with p‐value Pr(>F) = 0.18665, which is larger than α = 0.05; hence the hypothesis of equal test period effects is not rejected.

From the ANOVA table of Anova2 we must use the last line, SEX, for testing the sex effects: the F‐value is 2.0052 with p‐value Pr(>F) = 0.19450, which is larger than α = 0.05; hence the hypothesis of equal sex effects is not rejected.
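
Both adjusted tests can also be obtained from a single fit. The sketch below assumes that drop1() from the stats package is an acceptable substitute for refitting with reordered formulas: it removes each term in turn from the full additive model, so that each factor is tested adjusted for the other one. The object names fit and adj are chosen here for illustration.

```r
# Data of Table 5.6 (unbalanced two-way cross-classification)
days    <- c(91, 84, 86, 94, 92, 90, 96, 82, 86, 99, 97, 89)
PERIODS <- factor(c(1, 1, 1, 2, 2, 2, 2, 3, 3, 1, 2, 2))
SEX     <- factor(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2))

# Additive model; the order of the terms does not matter for drop1()
fit <- lm(days ~ PERIODS + SEX)

# Each row tests one term after adjusting for the other term
adj <- drop1(fit, test = "F")
print(adj)
```

The rows PERIODS and SEX of this table should reproduce the last lines of anova(Anova1) and anova(Anova2), respectively.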

5 Models with Interactions

We consider now model 5.16 and assume a connected cross‐classification.

The ANOVA table for this case is Table 5.10.

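In the case of equal subclass numbers, we use the side conditions 5.17 and test the hypotheses

H_A0: a₁ = … = a_a = 0, H_B0: b₁ = … = b_b = 0 and H_A × B0: (a, b)₁₁ = … = (a, b)_ab = 0

with the F‐statistics

F_A = MS_A/MS_res

5.24

F_B = MS_B/MS_res

5.25

and

F_AB = MS_AB/MS_res

5.26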

SS_A, SS_B, SS_AB and SS_res are independently distributed, and for normally distributed y_ijk the quotients SS_A/σ², SS_B/σ², SS_AB/σ² and SS_res/σ² are distributed as CS(a − 1, λ_a), CS(b − 1, λ_b), CS((a − 1)(b − 1), λ_ab) and CS(N − ab), with non‐centrality parameters λ_a, λ_b and λ_ab, respectively. Here N − ab is the number of residual degrees of freedom of the model with interactions (12 in Table 5.12).

Therefore F_A is non‐centrally F‐distributed with a − 1 and N − ab degrees of freedom and non‐centrality parameter λ_a, F_B is non‐centrally F‐distributed with b − 1 and N − ab degrees of freedom and non‐centrality parameter λ_b, and F_AB is non‐centrally F‐distributed with (a − 1)(b − 1) and N − ab degrees of freedom and non‐centrality parameter λ_ab.

When the null hypotheses above are correct, then the corresponding non‐centrality parameter is zero. The corresponding null hypothesis can therefore be tested by F_A, F_B and F_AB, respectively. If F_A > F(a − 1, N − ab, 1 − α) the null hypothesis H_A0 is rejected with a first kind risk α. If F_B > F(b − 1, N − ab, 1 − α) the null hypothesis H_B0 is rejected with a first kind risk α. Finally, if F_AB > F((a − 1)(b − 1), N − ab, 1 − α) the null hypothesis H_A × B0 is rejected with a first kind risk α.
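
The decision rule can be illustrated with the quantile function qf() of base R. The sketch below takes the observed F‐values and the degrees of freedom from Table 5.12; the object names (f_obs, df_num, df_res, f_crit) are assumptions made for this illustration.

```r
alpha <- 0.05

# Observed F-values and degrees of freedom as printed in Table 5.12
f_obs  <- c(storage = 101.70, crop = 1.73, interaction = 2.22)
df_num <- c(storage = 1, crop = 1, interaction = 1)
df_res <- 12   # "Within classes (residual)" in Table 5.12

# (1 - alpha)-quantiles of the central F-distribution
f_crit <- qf(1 - alpha, df_num, df_res)

# A null hypothesis is rejected when the observed F exceeds the quantile
reject <- f_obs > f_crit
print(data.frame(F = f_obs, F_crit = round(unname(f_crit), 4), reject = reject))
```

Under these inputs only the hypothesis about the kind of storage is rejected, which agrees with the p‐values of the R output for this experiment.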

Table 5.11 Observations of the carotene storage experiment of Example 5.12.

		Forage crop
Green rye	Lucerne
Kind of storage	Glass	8.39 7.68 9.46 8.12	9.44 10.12 8.79 8.89
Sack	5.42 6.21 4.98 6.04	5.56 4.78 6.18 5.91

Table 5.12 ANOVA table for the carotene storage experiment of Example 5.12.

Source of variation	SS	df	MS	F
Between the kind of storage	41.6347	1	41.6347	101.70
Between the forage crops	0.7098	1	0.7098	1.73
Interactions	0.9073	1	0.9073	2.22
Within classes (residual)	4.9128	12	0.4094
Total	48.1646	15

Problem 5.6

Calculate the entries of Table 5.11 and give the commands for Table 5.12.

Solution

Make a data frame and use the R command >lm() to get the ANOVA table.

Because we have a balanced two‐way cross‐classification we do not need to use a different formula in > lm() for the test of the main effects.

Example

 > loss <- c(8.39,7.68,9.46,8.12,5.42,6.21,4.98,6.04,9.44,10.12,
  8.79,8.89,5.56,4.78,6.18,5.91)
> storage <- c(1,1,1,1,2,2,2,2,1,1,1,1,2,2,2,2)
> crop <- c(1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2)
> T_5_11 <- data.frame(cbind(loss,storage,crop))
> T_5_11
loss storage crop
1 8.39 1 1
2 7.68 1 1
3 9.46 1 1
4 8.12 1 1
5 5.42 2 1
6 6.21 2 1
7 4.98 2 1
8 6.04 2 1
9 9.44 1 2
10 10.12 1 2
11 8.79 1 2
12 8.89 1 2
13 5.56 2 2
14 4.78 2 2
15 6.18 2 2
16 5.91 2 2
> STORAGE <- factor(storage)
> CROP <- factor(crop)
> Anova <- lm(loss ~ STORAGE + CROP + STORAGE*CROP, T_5_11)
> Anova
Call:
lm(formula = loss ∼ STORAGE + CROP + STORAGE * CROP, data = T_5_11)

Coefficients:
(Intercept) STORAGE2 CROP2 STORAGE2:CROP2
8.4125 -2.7500 0.8975 -0.9525

 > summary(Anova)
Call:
lm(formula = loss ∼ STORAGE + CROP + STORAGE * CROP, data = T_5_11)
Residuals:
Min 1Q Median 3Q Max
-0.8275 -0.4450 -0.0350 0.4200 1.0475
Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)      8.4125     0.3199  26.295 5.60e-12 ***
STORAGE2        -2.7500     0.4524  -6.078 5.52e-05 ***
CROP2            0.8975     0.4524   1.984   0.0706 .
STORAGE2:CROP2  -0.9525     0.6398  -1.489   0.1624
- - -
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 "" 1
Residual standard error: 0.6398 on 12 degrees of freedom
Multiple R-squared: 0.898, Adjusted R-squared: 0.8725
F-statistic: 35.22 on 3 and 12 DF, p-value: 3.155e-06

> anova(Anova)
Analysis of Variance Table

Response: loss
             Df Sum Sq Mean Sq  F value   Pr(>F)
STORAGE       1 41.635  41.635 101.6965 3.27e-07 ***
CROP          1  0.710   0.710   1.7338   0.2125
STORAGE:CROP  1  0.907   0.907   2.2161   0.1624
Residuals    12  4.913   0.409
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 "" 1
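
The coefficients printed by lm() are not the effects a_i, b_j and (a, b)_ij under the side conditions 5.17. With R's default treatment contrasts they are differences of class means relative to the first class. The following sketch, with object names (cell, by_hand) chosen here for illustration, makes this relation explicit for the data of Table 5.11.

```r
# Data of Table 5.11 in the order used in the example above
loss <- c(8.39, 7.68, 9.46, 8.12, 5.42, 6.21, 4.98, 6.04,
          9.44, 10.12, 8.79, 8.89, 5.56, 4.78, 6.18, 5.91)
STORAGE <- factor(rep(rep(1:2, each = 4), times = 2))  # 1 = glass, 2 = sack
CROP    <- factor(rep(1:2, each = 8))                  # 1 = green rye, 2 = lucerne

# 2 x 2 table of class means (rows = storage, columns = crop)
cell <- tapply(loss, list(STORAGE, CROP), mean)

fit <- lm(loss ~ STORAGE * CROP)

# Treatment contrasts expressed through the class means
by_hand <- c(cell[1, 1],                                         # (Intercept)
             cell[2, 1] - cell[1, 1],                            # STORAGE2
             cell[1, 2] - cell[1, 1],                            # CROP2
             cell[2, 2] - cell[2, 1] - cell[1, 2] + cell[1, 1])  # STORAGE2:CROP2

print(cbind(lm = coef(fit), by_hand = by_hand))
```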

Problem 5.7

Calculate the sample size for testing H_A × B0 : (a, b)₁₁ = (a, b)₁₂ = ⋯ = (a, b)_ab = 0. Note that the argument delta in the program below stands for the standardised value τ = δ/σ.

Solution

Use the OPDOE command

 > size.anova(model = "axb", hypothesis = "axb", a=, b=, alpha=, beta=,
 delta=, cases=)

Example

How many replications in the four subclasses are needed to test the hypothesis H_A × B0 with the precision requirements α = 0.05, β = 0.1 and δ = 2σ?

 > size.anova(model="axb", hypothesis="axb", a=2, b=2,  alpha=0.05,beta=0.1, delta=2, cases="minimin")
n
4

> size.anova(model="axb", hypothesis="axb", a=2, b=2,  alpha=0.05,beta=0.1, delta=2, cases="maximin")
n
6

Therefore, the sub‐class number has to be between 4 and 6.
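
The principle behind such a sample size determination can be sketched with the non‐central F‐distribution of base R: the subclass number n is increased until the power of the F‐test reaches 1 − β. The sketch below is not the OPDOE algorithm and does not reproduce its minimin and maximin cases; the functions power_f and min_n and the argument lambda_per_rep (the non‐centrality contributed by one replication, so that the non‐centrality parameter is n times this value) are hypothetical names introduced for this illustration.

```r
# Power of an F-test with df1 and df2 degrees of freedom when the
# non-centrality parameter is lambda (lambda = 0 gives the level alpha)
power_f <- function(lambda, df1, df2, alpha = 0.05) {
  f_crit <- qf(1 - alpha, df1, df2)
  pf(f_crit, df1, df2, ncp = lambda, lower.tail = FALSE)
}

# Smallest subclass number n >= 2 for which the interaction test in a
# balanced a x b cross-classification reaches the power 1 - beta.
# lambda_per_rep is a hypothetical input (see the text above).
min_n <- function(lambda_per_rep, a, b, alpha = 0.05, beta = 0.1, n_max = 1000) {
  df1 <- (a - 1) * (b - 1)
  for (n in 2:n_max) {
    df2 <- a * b * (n - 1)
    if (power_f(n * lambda_per_rep, df1, df2, alpha) >= 1 - beta) return(n)
  }
  NA_integer_
}

n_needed <- min_n(lambda_per_rep = 1, a = 2, b = 2)
print(n_needed)
```

How δ translates into the non‐centrality parameter depends on the least and most favourable configurations of the effects, which is what the cases "minimin" and "maximin" of size.anova() distinguish.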

5.4.2 Nested Classification (A≻B)

A nested classification is a classification with super‐ and sub‐ordinated factors, where the levels of a sub‐ordinated or nested factor are considered as further subdivision of the levels of the super‐ordinated factor. Each level of the nested factor occurs in just one level of the super‐ordinated factor. An example is the subdivision of the United States into states (super‐ordinated factor A) and counties (nested factor B).

As for the cross‐classification we assume that the random variables y_ijk vary randomly from the expectations η_ij, i.e.

and that e_ijk are, independently of each other, N(0, σ²)‐distributed. With

the total mean of the experiment is defined.

In nested classification, interactions cannot occur.

The difference a_i = η_i. − μ is called the effect of the ith level of factor A, and the difference b_ij = η_ij − η_i. is called the effect of the jth level of B within the ith level of A.

The model equation for y_ijk in the balanced case then reads

y_ijk = μ + a_i + b_ij + e_ijk

5.27

(interactions do not exist).

Usual side conditions are

5.28

Minimising

under the side conditions 5.28, we obtain the BLUE

5.29

The total sum of squares is again split into components

where SS_A is the SS between the A levels, SS_{B in A} is the SS between the B levels within the A levels and SS_res is the SS within the classes (B levels).

The SS are written in the form

Here and in the sequel, we assume the side conditions 5.28.

The expectations of the MS are given in Table 5.13.

MS_A, MS_{B in A} and MS_res in Table 5.13 are independently of each other distributed as CS(a − 1, λ_a), CS(B. − a, λ_{b in a}) and CS(N − B.), respectively, where

Therefore

and

Using this, the null hypothesis H_A0: a₁ = … = a_a can be tested by F_A = MS_A/MS_res, which, under H_A0, is distributed as F(a − 1, N − B_.).

The null hypothesis H_B0: b₁₁ = … = b_ab can be tested by F_B = MS_{B in A}/MS_res, which, under H_B0, is distributed as F(B_. − a, N − B_.).

Problem 5.8

Calculate the ANOVA table with the realised sum of squares of Table 5.14.

Table 5.14 Observations of the example.

	Levels of A
	A₁				A₂				A₃
	Levels of B
Observation	B₁₁	B₁₂	B₁₃	B₁₄	B₂₁	B₂₂	B₂₃	B₂₄	B₃₁	B₃₂	B₃₃	B₃₄
1	30	0	7	28	24	14	20	20	14	14	18	−25
2	−19	20	5	15	16	11	18	−12	−18	8	16	13
3	−31	32	3	20	−18	27	8	0	33	−19	7	36
4	−14	11	−5	20	11	11	32	−5	−9	6	−4	18
5	−14	13	8	−48	−10	8	−25	44	−7	−38	21	−7

Solution

We use in R the command >lm().

Example

We consider an example of the balanced case with a = 3 levels of factor A, b = 4 levels of factor B within each level of factor A. The data are shown in Table 5.14.

 > y1 <- c(30,-19,-31,-14,-14,0,20,32,11,13,7,5,3,-5,8,28,15,20,20,-48)
> y2 <- c(24,16,-18,11,-10,14,11,27,11,8,20,18,8,32,-25,20,-12,0,-5,44)
> y3 <- c(14,-18,33,-9,-7,14,8,-19,6,-38,18,16,7,-4,21,-25,13,36,18,-7)
> a1 <- rep(1,20)
> a2 <- rep(2,20)
> a3 <- rep(3,20)
> b1 <- c(11,11,11,11,11,12,12,12,12,12,13,13,13,13,13,14,14,14,14,14)
> b2 <- c(21,21,21,21,21,22,22,22,22,22,23,23,23,23,23,24,24,24,24,24)
> b3 <- c(31,31,31,31,31,32,32,32,32,32,33,33,33,33,33,34,34,34,34,34)
> y <- c(y1,y2,y3)
> a <- c(a1,a2,a3)
> b <- c(b1,b2,b3)
> T_5_14 <- data.frame(cbind(y,a,b))
> A <- factor(a)
> B <- factor(b)
> Anova <- lm(y ~ A + A/B, T_5_14)
> anova(Anova)
Analysis of Variance Table
Response: y
          Df  Sum Sq Mean Sq F value Pr(>F)
A          2   441.2  220.62  0.5759 0.5661
A:B        9  2656.9  295.21  0.7706 0.6437
Residuals 48 18388.8  383.10
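
The three sums of squares of this table can be verified directly from the level means. The following sketch uses the base R function ave(), which returns for every observation the mean of its group; the object names (mean_A, mean_B, ss_BinA) are chosen here for illustration. Because every level of B belongs to exactly one level of A, the 12 levels of B are simply numbered 1 to 12.

```r
# Data of Table 5.14: 3 levels of A, 4 levels of B within each A, n = 5
y <- c(30, -19, -31, -14, -14, 0, 20, 32, 11, 13,
       7, 5, 3, -5, 8, 28, 15, 20, 20, -48,
       24, 16, -18, 11, -10, 14, 11, 27, 11, 8,
       20, 18, 8, 32, -25, 20, -12, 0, -5, 44,
       14, -18, 33, -9, -7, 14, 8, -19, 6, -38,
       18, 16, 7, -4, 21, -25, 13, 36, 18, -7)
A <- factor(rep(1:3, each = 20))
B <- factor(rep(1:12, each = 5))

grand  <- mean(y)
mean_A <- ave(y, A)   # mean of the A level of each observation
mean_B <- ave(y, B)   # mean of the B level (class) of each observation

ss_A    <- sum((mean_A - grand)^2)    # between the A levels
ss_BinA <- sum((mean_B - mean_A)^2)   # between the B levels within the A levels
ss_res  <- sum((y - mean_B)^2)        # within the classes

f_A <- (ss_A / 2) / (ss_res / 48)
f_B <- (ss_BinA / 9) / (ss_res / 48)
print(round(c(SS_A = ss_A, SS_BinA = ss_BinA, SS_res = ss_res,
              F_A = f_A, F_B = f_B), 4))
```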

We now show how the minimal experimental size can be determined for the nested classification, first for testing the effects of A and then for testing the effects of B within A.

Problem 5.9

Determine the subclass number for fixed precision to test H_A0 : a₁ = a₂ = ⋯ = a_a = 0 and H_B0 : b₁₁ = b₁₂ = ⋯ = b_ab = 0. Note that the argument delta in the program below stands for the standardised value τ = δ/σ.

Solution

Choose first for testing H_A0: a₁ = … = a_a, the OPDOE command

 > size.anova(model="a>b",hypothesis="a",a=,b=,alpha=,beta=, delta=,cases="minimin")

and repeat it with cases="maximin".

Choose for testing H_B0: b₁₁ = … = b_ab, the OPDOE command

 > size.anova(model="a>b",hypothesis="b",a=,b=,alpha=,beta=, delta=,cases="minimin")

and repeat it with cases="maximin".

Example

We use α = 0.05, a = 5, b = 8, β = 0.05, and δ = 1.

 > size.anova(model="a>b",hypothesis="a",a=5,b=8,alpha=0.05,beta=0.05,
     delta=1,case="minimin")
n 
3 
> size.anova(model="a>b",hypothesis="a",a=5,b=8,alpha=0.05, beta=0.05,delta=1,case="maximin")
n 
5

We have to choose between three and five observations per level of factor B. For testing the effects of the factor B we use α = 0.01, a = 5, b = 8, β = 0.1, and δ = 1.

 > size.anova(model="a>b",hypothesis="b",a=5,b=8,alpha=0.01, beta=0.1,delta=1,case="minimin")
n 
5
> size.anova(model="a>b",hypothesis="b",a=5,b=8,alpha=0.01, beta=0.1,delta=1,case="maximin")
n 
84

5.5 Three‐Way Classification

The principle underlying the two‐way ANOVA (two‐way classification) is also useful if more than two factors occur in an experiment. In this section, we only give a short overview of the cases with three factors without proving all statements because the principles of proof are similar to those in the case with two factors.

We consider the case with three factors because it often occurs in applications, which can be handled with a justifiable number of pages, and because besides the cross‐classification and the nested classification a mixed classification occurs. At this point, we make some remarks about the numerical analysis of experiments using ANOVA. Certainly, a general computer program for arbitrary classifications and numbers of factors with unequal class numbers can be elaborated. However, such a program, even with modern computers, is not easy to apply because the data matrices easily obtain several tens of thousands of rows. Therefore, we give for some special cases of the three‐way analysis of variance numerical solutions for which easy‐to‐use programs are written using R.

Problems with more than three factors are described in Hartung et al. (1997) and in Rasch et al. (2008).

5.5.1 Complete Cross‐Classification (A×B × C)

We assume that the observations of an experiment are influenced by three factors A, B, and C with a, b, and c levels A₁, … , A_a, B₁, … , B_b, and C₁, … , C_c, respectively. For each possible combination (A_i, B_j, C_k) let n ≥ 1 observations y_ijkl (l = 1, ⋯, n) be present. If the subclass numbers n_ijk are not all equal, or if some of them are zero but the classification is connected, we must use different model formulas in >lm(), because lm() adjusts each term only for the terms preceding it in the formula. The interaction to be tested must be the last of the two‐factor interactions, placed directly before A × B × C: for testing A × B, the formula ends with A × B and A × B × C; for testing A × C, it ends with A × C and A × B × C; for testing B × C, it ends with B × C and A × B × C. An example of the analysis of an unbalanced three‐way cross‐classification with R is described at the end of this section.

Each combination (A_i, B_j, C_k) (i = 1, ⋯, a; j = 1, ⋯, b; k = 1, ⋯, c) of factor levels is called a class and is characterised by (i, j, k). The expectation in the population associated with the class (i, j, k) is η_ijk.

We define

and

The overall expectation is

The main effects of the factors A, B, and C we define by

Assuming that the experiment is performed at a particular level C_k of the factor C we have a two‐way classification with the factors A and B, and the conditional interactions between the levels of the factors A and B for fixed k are given by

The interactions (a, b)_ij between the ith A level and the jth B level are the means over all C levels, i.e. (a, b)_ij is defined as

The interactions between A levels and C levels (a, c)_ik and between B levels and C levels (b, c)_jk are defined by

and

respectively.

The difference between the conditional interactions between the levels of two of the three factors for the given level of the third factor and the (unconditional) interaction of these two factors depends only on the indices of the levels of the factors, and not on the factor for which the interaction of two factors is calculated. We call it the second order interaction (a, b, c)_ijk (between the levels of three factors). Without loss of generality we write

The interactions between the levels of two factors are called first‐order interactions. From the definition of the main effect and the interactions we write for η_ijk

Under the definitions above, the side conditions for all values of the indices not occurring in the summation at any time are

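The n observations y_ijkl in each class are assumed to be independent of each other and N(η_ijk, σ²)‐distributed. The variable e_ijkl (called the error term) is the difference between y_ijkl and the expectation η_ijk of the class and is therefore N(0, σ²)‐distributed, i.e. we have

y_ijkl = η_ijk + e_ijkl = μ + a_i + b_j + c_k + (a, b)_ij + (a, c)_ik + (b, c)_jk + (a, b, c)_ijk + e_ijkl

5.30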

By the least squares method we obtain the following estimators:

as well as

If any of the interaction effects in 5.30 are zero, the corresponding estimator above is dropped; the others remain unchanged. The following model equations lead to different SS_res, as shown in Table 5.15.

5.31

We may split the overall sum of squares into eight components: three corresponding with the main effects, three with the first order interactions, one with the second order interaction, and one with the error term or the residual. The corresponding SS are shown in the ANOVA table (Table 5.15). In this table N is, again, the total number of observations, N = abcn.

The following hypotheses can be tested (H_0x is one of the hypotheses H_0A, … , H_0ABC; SS_x is the corresponding SS).

Under the hypothesis H_0x, SS_x/σ² and SS_res/σ² are independent of each other and centrally χ²‐distributed with the df given in the ANOVA table. Therefore, the test statistics given in column F of the ANOVA table are, with the corresponding degrees of freedom, centrally F‐distributed. For n = 1 all hypotheses except H_0ABC can be tested under the assumption (a, b, c)_ijk = 0 for all i, j, k, because then SS_x/σ² and SS_ABC/σ², which takes the place of the residual sum of squares, are, under H_0x (x = A, B, C, etc.), independent of each other and centrally χ²‐distributed. The test statistic F_x is given by

If the null hypotheses are not true, we have non‐central distributions with non‐centrality parameters analogous to those in Section 5.4.

Example 5.16

We consider a three‐way analysis of variance with class numbers n = 2; as factors we use the forage crop (A), the kind of storage (B – barn or refrigerator) and the packaging material (C – glass or sack) (Table 5.16). We have a = b = c = 2 and n = 2. Table 5.17 is the ANOVA table of the example; the F‐tests use the residual mean square with 8 degrees of freedom. At α = 0.05, significant differences are found between the kinds of storage, between the packaging materials and between the forage crops, and the interaction kind of storage × packaging material is significant as well; the other interactions are not significant.

Table 5.16 Three‐way classification with factors kind of storage, packaging material and forage crop.

		Forage crop
Kind of storage	Packaging material	Green rye	Lucerne
Refrigerator	Glass	8.39 8.69	9.44 10.1
	Sack	5.42 6.13	5.56 4.97
Barn	Glass	11.58 10.56	12.21 11.87
	Sack	9.53 8.78	10.39 9.96

Table 5.17 ANOVA Table for data of Table 5.16.

Source of variation	SS	df	MS	F
Between kind of storage	42.837	1	42.837	208.8847
Between forage crops	1.836	1	1.836	8.9529
Between packaging material	30.526	1	30.526	148.8510
Interaction kind of storage × packaging material	3.045	1	3.045	14.8483
Interaction kind of storage × forage crops	0.403	1	0.403	1.9662
Interaction forage crops × packaging material	0.714	1	0.714	3.4818
Interaction storage × forage × material	0.801	1	0.801	3.9060
Residual	1.641	8	0.205
Total	81.80258	15

Problem 5.10

Calculate the ANOVA table of a three‐way cross‐classification for model 5.30.

Solution

Make a data‐frame of the observations and use the command >lm().

Example

We use the data of Table 5.16 and calculate the entries of Table 5.17.

 > y <- c(8.39,8.69,5.42,6.13,11.58,10.56,9.53,8.78, 9.44,10.1,5.56,4.97,12.21,11.87,10.39,9.96)
> storage <- c(rep(1,4),rep(2,4),rep(1,4),rep(2,4))
> material <-  c(rep(1,2),rep(2,2),rep(1,2),rep(2,2),rep(1,2),
     rep(2,2),rep(1,2),rep(2,2))
> crop <- c(rep(1,8),rep(2,8))
> T_5_16 <- data.frame(cbind(y, storage, material, crop))
> STOR <- factor(storage)
> MAT <- factor(material)
> CROP <- factor(crop)
> Anova <- lm(y ~ STOR*MAT*CROP, T_5_16)
> Anova
Call:
lm(formula = y ~ STOR + MAT + CROP + STOR * MAT + STOR * CROP +
    MAT * CROP + STOR * MAT * CROP)
Coefficients:
     (Intercept)             STOR2              MAT2             CROP2
           8.540             2.530            -2.765             1.230
      STOR2:MAT2       STOR2:CROP2        MAT2:CROP2  STOR2:MAT2:CROP2
           0.850            -0.260            -1.740             1.790
> summary(Anova)
Call:
lm(formula = y ~ STOR + MAT + CROP + STOR * MAT + STOR * CROP +
    MAT * CROP + STOR * MAT * CROP)
Residuals:
    Min      1Q  Median      3Q     Max
-0.5100 -0.3038  0.0000  0.3038  0.5100
Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)        8.5400     0.3202  26.670  4.2e-09 ***
STOR2              2.5300     0.4529   5.587 0.000518 ***
MAT2              -2.7650     0.4529  -6.106 0.000288 ***
CROP2              1.2300     0.4529   2.716 0.026407 *
STOR2:MAT2         0.8500     0.6404   1.327 0.221058
STOR2:CROP2       -0.2600     0.6404  -0.406 0.695401
MAT2:CROP2        -1.7400     0.6404  -2.717 0.026374 *
STOR2:MAT2:CROP2   1.7900     0.9057   1.976 0.083517 .
- - -
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 "" 1
Residual standard error: 0.4529 on 8 degrees of freedom
Multiple R-squared:  0.9799,    Adjusted R-squared:  0.9624 
F-statistic: 55.84 on 7 and 8 DF,  p-value: 3.645e-06
> anova(Anova)
Analysis of Variance Table
Response: y
              Df Sum Sq Mean Sq  F value    Pr(>F)
STOR           1 42.837  42.837 208.8847 5.138e-07 ***
MAT            1 30.526  30.526 148.8510 1.889e-06 ***
CROP           1  1.836   1.836   8.9529  0.017277 *  
STOR:MAT       1  3.045   3.045  14.8483  0.004854 ** 
STOR:CROP      1  0.403   0.403   1.9662  0.198439    
MAT:CROP       1  0.714   0.714   3.4818  0.099021 .  
STOR:MAT:CROP  1  0.801   0.801   3.9060  0.083517 .  
Residuals      8  1.641   0.205                       
- - -
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 "" 1
> N <- length(y)
> N
[1] 16
> SST <- (N-1)*var(y)
> SST
[1] 81.80258
> df <- N-1
> df
[1] 15

If the subclass numbers n_ijk are not all equal or if some of them are zero but the classification is connected, we must change the order of the terms in the formula for testing the two‐factor interactions. In the following example, we analyse an unbalanced three‐way cross‐classification.

Example 5.17

From Kuehl (1994) we change an exercise to an unbalanced one. The California brown shrimp spawn at sea and the hatched eggs undergo larval transformation while being transported towards the shore. By the time they transform to post‐larval stage they enter estuaries, where they grow rapidly into sub‐adults and migrate back offshore as they approach sexual maturity. The shrimp encounter wide temperature and salinity variations in their life cycle because of their migrations during the cycle. Thus, a knowledge of how temperature and salinity affect their growth and survival is of great importance to understanding their life history and ecology. From the standpoint of mariculture, another important factor is stocking density in the culture tanks that affects intraspecific competition. The investigators wanted to know how water temperature, water salinity, and density of shrimp populations influenced the growth rate of shrimp raised in aquaria and whether the factor acted independently on the shrimp populations. A factorial arrangement was used with three factors: T (temperature: 25 °C, 35 °C); S (salinity of the water: 10%, 25%, 40%); and D (density of shrimp in the aquarium: 80 shrimp/40 l, 160 shrimp/40 l). The levels were those considered most likely to exhibit an effect if the factor was influential on shrimp growth. The experiment design consisted of three replicate aquaria for each of the 12 treatment combinations of the 2 × 3 × 2 factorial. Each of the 12 treatment combinations was randomly assigned to three aquaria for a completely randomised design. The 36 aquaria were stocked with post‐larval shrimp at the beginning of the test. The weight gain of the shrimp in four weeks for each of the 36 aquaria is shown in Table 5.18 on a per‐shrimp basis. From the balanced experiment we have discarded at random three aquaria results to give the analysis of an unbalanced 2 × 3 × 2 factorial. The missing data are indicated by an asterisk.

Table 5.18 Water temperature (T), water salinity (S), and density of shrimp populations (D) and the weight gain (mg) of shrimp.

T	D	S	Weight gain (mg)
1 (25 °C)	1 (80)	1 (10%)	86, 52, 73
		2 (25%)	544, *, 482
		3 (40%)	390, 290, 397
	2 (160)	1	53, 73, 86
		2	393, 398, *
		3	249, 265, 243
2 (35 °C)	1	1	439, 436, 349
		2	249, 245, 330
		3	*, 277, 205
	2	1	324, 305, 364
		2	352, 267, 316
		3	188, 223, 281

In R the missing data must be indicated by NA (= not available).

   > y1 <- c(86,52,73,544,NA, 482,390,290,397)
  > y2 <- c(53,73,86,393,398,NA,249,265,243)
  > y3 <- c(439,436,349,249,245,330,NA,277,205)
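  > # The eighth value is entered as 23, whereas Table 5.18 lists 223;
  > # the output below corresponds to the value 23
  > y4 <- c(324,305,364,352,267,316,188,23,281)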
  > t1 <- c(rep(1,18))
  > t2 <- c(rep(2,18))
  > d1 <- c(rep(1,9), rep(2,9))
  > d2 <- c(rep(1,9), rep(2,9))
  > s1 <- c(rep(1,3),rep(2,3),rep(3,3))
  > s2 <- c(rep(1,3),rep(2,3),rep(3,3))
  > s3 <- c(rep(1,3),rep(2,3),rep(3,3))
  > s4 <- c(rep(1,3),rep(2,3),rep(3,3))
  > y <- c(y1,y2,y3,y4)
  > t <- c(t1,t2)
  > d <- c(d1,d2)
  > s <- c(s1,s2,s3,s4)
  > table <- data.frame(cbind(y,t,d,s))
  > T <- factor(t)
  > S <- factor(s)
  > D <- factor(d)
  > anova1 <- lm (y ~ T + D + S + T*S + T*D + S*D + T*S*D , table)
  > anova(anova1)
  Analysis of Variance Table
  Response: y
            Df Sum Sq Mean Sq F value    Pr(>F)
  T          1  11012   11012  3.6988  0.068115 .
  D          1  27697   27697  9.3031  0.006082 **
  S          2 101840   50920 17.1037 3.911e-05 ***
  T:S        2 354666  177333 59.5651 2.212e-09 ***
  T:D        1   2226    2226  0.7478  0.396945
  D:S        2   6300    3150  1.0581  0.364910
  T:D:S      2  17923    8961  3.0100  0.070891 .
  Residuals 21  62520    2977
  - - -
  Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 "" 1
  > anova2 <- lm (y ~ T + D + S + T*S + S*D + T*D + T*S*D)
  > anova(anova2)
  Analysis of Variance Table
  Response: y
            Df Sum Sq Mean Sq F value    Pr(>F)
  T          1  11012   11012  3.6988  0.068115 .
  D          1  27697   27697  9.3031  0.006082 **
  S          2 101840   50920 17.1037 3.911e-05 ***
  T:S        2 354666  177333 59.5651 2.212e-09 ***
  D:S        2   7055    3527  1.1848  0.325428
  T:D        1   1472    1472  0.4943  0.489729
  T:D:S      2  17923    8961  3.0100  0.070891 .
  Residuals 21  62520    2977
  - - - 
  Signif. codes:  0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
  > anova3 <- lm ( y ~ T + D + S + T*D + S*D + T*S + T*S*D)
  > anova(anova3)
  Analysis of Variance Table
  Response: y
            Df Sum Sq Mean Sq F value    Pr(>F)
  T          1  11012   11012  3.6988  0.068115 .
  D          1  27697   27697  9.3031  0.006082 **
  S          2 101840   50920 17.1037 3.911e-05 ***
  T:D        1   1212    1212  0.4071  0.530337
  D:S        2   9700    4850  1.6291  0.219931
  T:S        2 352281  176140 59.1644 2.349e-09 ***
  T:D:S      2  17923    8961  3.0100  0.070891 .
  Residuals 21  62520    2977
  - - -

Note that in the three ANOVA tables the SS(T:D:S) remains the same, and the test of this interaction TDS is correct.

The test for the interaction DS is given in anova1, the test for the interaction TD is given in anova2 and the test for the interaction TS is given in anova3.

This is because lm() gives a hierarchical (sequential) analysis of variance table; each term is corrected only for its predecessors. In a balanced design the order of the interactions is not important, but for an unbalanced design you must put the interaction to be tested last among the two‐factor interactions to obtain its test.
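
The same test can be written as an explicit comparison of nested models, which shows what "corrected for its predecessors" means. The sketch below is one possible way to do this in base R and is not the procedure of the example above; the column names Temp, Dens and Sal and the model names are chosen here for illustration. The data are entered as in the R code of the example (eighth value of the last group equal to 23).

```r
# Weight gains as entered in the R code of Example 5.17 (NA = discarded aquarium)
y <- c(86, 52, 73, 544, NA, 482, 390, 290, 397,
       53, 73, 86, 393, 398, NA, 249, 265, 243,
       439, 436, 349, 249, 245, 330, NA, 277, 205,
       324, 305, 364, 352, 267, 316, 188, 23, 281)
dat <- data.frame(
  y    = y,
  Temp = factor(rep(1:2, each = 18)),
  Dens = factor(rep(rep(1:2, each = 9), times = 2)),
  Sal  = factor(rep(rep(1:3, each = 3), times = 4))
)
dat <- na.omit(dat)   # keep the 33 observed aquaria

# Model with the two other two-factor interactions only
m_other <- lm(y ~ Temp + Dens + Sal + Temp:Sal + Temp:Dens, data = dat)
# Add the interaction to be tested (density x salinity) ...
m_two   <- update(m_other, . ~ . + Dens:Sal)
# ... and finally the three-factor interaction
m_full  <- update(m_two, . ~ . + Temp:Dens:Sal)

# Row 2 tests Dens:Sal adjusted for all main effects and the other
# two-factor interactions; the error variance comes from the largest model
tab <- anova(m_other, m_two, m_full)
print(tab)
```

Row 2 of this table should correspond to the line D:S of anova(anova1), and row 3 to the line T:D:S.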

Example 5.18

We try to plan an experiment with three cross‐classified factors A with a = 3 levels, B with b = 4 levels and C with c = 5 levels and assume model 5.31. We further put α = 0.01, β = 0.1, and δ = 0.5. We demonstrate the program for a hypothesis of the main effects of A and the interaction effects A × B.

 > size.anova(model = "axbxc",hypothesis = "a",a = 3,b = 4,c = 5,
 alpha = 0.01,beta = 0.1,delta = 0.5, case = "minimin")
n 
5
> size.anova(model = "axbxc",hypothesis = "a",a = 3,b = 4,c = 5,alpha = 0.01,
 beta = 0.1,delta = 0.5,case = "maximin")
n 
8
> size.anova(model = "axbxc",hypothesis = "axb",a = 3,b = 4, c = 5,
alpha = 0.01,beta = 0.1,delta = 0.5,case = "minimin")
n 
7
> size.anova(model = "axbxc",hypothesis = "axb",a = 3,b = 4, c = 5,
 alpha = 0.01,beta = 0.1,delta = 0.5, case = "maximin")
n 
38

5.5.2 Nested Classification (C ≺ B ≺ A)

We speak about a three‐way nested classification if factor C is sub‐ordinated to factor B and factor B is sub‐ordinated to factor A, i.e. if C ≺ B ≺ A. We assume that the random variable y_ijkl varies randomly with expected value η_ijk(i = 1, … , a; j = 1, … , b_i; k = 1, … , c_ij), i.e. we assume

where e_ijkl, independent of each other, are N(0, σ²)‐distributed. Using

we define the total mean of the experiment as μ.

The difference a_i = η_i.. − μ is called the effect of the ith level of A, the difference b_ij = η_ij. − η_i.. is called the effect of the jth level of B within the ith level of A and the difference c_ijk = η_ijk − η_ij. is called the effect of the kth level of C within the jth level of B and the ith level of A.

Then we model the observations using

y_ijkl = μ + a_i + b_ij + c_ijk + e_ijkl

5.32

There exist no interactions. We consider 5.32 with under the side conditions

Minimising

5.33

under the side conditions above, leads to the BLUE of the parameters as follows

In a three‐way nested classification we have

with images , and

The variables SS_A/σ², SS_{B in A}/σ² and SS_{C in B}/σ² are pairwise independently CS(a − 1, λ_a), CS(B. − a, λ_b) and CS(C.. − B., λ_c)‐distributed, respectively, and SS_res/σ² is CS(N − C..)‐distributed. The non‐centrality parameters λ_a, λ_b and λ_c vanish under the null hypotheses H_A0 : a_i = 0 (i = 1, … , a), H_B0 : b_ij = 0 (i = 1, … , a; j = 1, … , b_i), H_C0 : c_ijk = 0 (i = 1, … , a; j = 1, … , b_i; k = 1, … , c_ij), so that the usual F‐statistics can be used. Table 5.19 shows the SS and MS for calculating the F‐statistics. If H_A0 is valid, F_A is F(a − 1, N − C..)‐distributed. If H_B0 is valid, F_B is F(B. − a, N − C..)‐distributed, and if H_C0 is valid, F_C is F(C.. − B., N − C..)‐distributed.

Problem 5.12

Calculate the ANOVA table for the data of Table 5.20.

Table 5.20 Observations of a three‐way nested classification.

Factor A	A₁						A₂
B	B₁₁			B₁₂			B₂₁			B₂₂
C	C₁₁₁	C₁₁₂	C₁₁₃	C₁₂₁	C₁₂₂	C₁₂₃	C₂₁₁	C₂₁₂	C₂₁₃	C₂₂₁	C₂₂₂	C₂₂₃
1	93	109	102	89	81	87	97	88	80	83	82	81
2	89	107	101	102	83	91	93	92	84	88	89	82
3	97	94	99	104	85	82	95	94	83	87	93	80
4	105	106	98	97	91	85	91	82	81	86	81	85

Solution

After building a data frame of the observations, use lm().

Example

> y1 <- c(93,89,97,105,109,107,94,106,102,101,99,98,89,102,104,97,81,83,85,
 91,87,91,82,85)
> y2 <- c(97,93,95,91,88,92,94,82,80,84,83,81,83,88,87,86,82,89,
 93,81,81,82,80,85)
> a1 <- rep(1, 24)
> a2 <- rep(2, 24)
> b11_12 <- c(rep(11, 12), rep(12, 12))
> b21_22 <- c(rep(21, 12), rep(22, 12))
> c1 <- c(rep(111,4), rep(112,4), rep(113,4), rep(121,4),  rep(122,4), rep(123,4))
> c2 <- c(rep(211,4), rep(212,4), rep(213,4), rep(221,4),  rep(222,4), rep(223,4))
> y <- c(y1,y2)
> a <- c(a1,a2)
> b <- c(b11_12, b21_22)
> c <- c(c1,c2)
> T_5_19 <- data.frame(cbind(y,a,b,c))
> A <- factor(a)
> B <- factor(b)
> C <- factor(c)
> Anova <- lm(y ~ A + A/B + A/B/C, T_5_19)
> anova(Anova)
Analysis of Variance Table
Response: y
             Df   Sum Sq  Mean Sq  F value   Pr(>F) 
A            1    833.33  833.33   39.3959   2.972e-07 ***
A : B        2    707.42  353.71   16.7216   7.313e-06 ***
A : B : C    8    875.67  109.46   5.1747    0.000243 ***
Residuals    36   761.50  21.15
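The sums of squares in this table can be cross‐checked without lm(). The following is a minimal sketch, assuming the data of Table 5.20 are entered cell by cell in the same order as above. It uses ave() to attach the mean of the corresponding A, B and C level to every observation, so that the subclass numbers enter as weights automatically. The object names (m_A, SS_BinA and so on) are our own choice and do not come from the text.

```r
# Data of Table 5.20, four observations per level of C
y <- c(93, 89, 97, 105, 109, 107, 94, 106, 102, 101, 99, 98,
       89, 102, 104, 97, 81, 83, 85, 91, 87, 91, 82, 85,
       97, 93, 95, 91, 88, 92, 94, 82, 80, 84, 83, 81,
       83, 88, 87, 86, 82, 89, 93, 81, 81, 82, 80, 85)
A <- factor(rep(1:2, each = 24))
B <- factor(rep(c(11, 12, 21, 22), each = 12))
C <- factor(rep(c(111, 112, 113, 121, 122, 123,
                  211, 212, 213, 221, 222, 223), each = 4))

m_tot <- mean(y)    # total mean
m_A   <- ave(y, A)  # mean of the A level of each observation
m_B   <- ave(y, B)  # mean of the B level (B labels are unique across A)
m_C   <- ave(y, C)  # mean of the C level (C labels are unique across B)

SS_A    <- sum((m_A - m_tot)^2)  # between the levels of A
SS_BinA <- sum((m_B - m_A)^2)    # between B levels within A
SS_CinB <- sum((m_C - m_B)^2)    # between C levels within B
SS_res  <- sum((y - m_C)^2)      # within the classes

# Degrees of freedom: a - 1, B. - a, C.. - B., N - C..
df <- c(1, 2, 8, 36)
MS <- c(SS_A, SS_BinA, SS_CinB, SS_res) / df
F_values <- MS[1:3] / MS[4]

round(c(SS_A, SS_BinA, SS_CinB, SS_res), 2)
round(F_values, 4)
```

The four sums of squares add up to the total sum of squared deviations sum((y - m_tot)^2), which is a quick consistency check on the data entry.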

Now we show how to calculate the minimal subclass numbers for the three F‐tests in the nested classification using R.

Problem 5.13

Determine the minimal subclass numbers for the three tests of the main effects.

Unfortunately delta in the program below stands for τ = δ/σ.

Solution

Use in OPDOE the command size.anova(model="a>b>c", hypothesis=, a=, b=, c=, alpha=, beta=, delta=, case=) with case="minimin" or case="maximin".

Example

We plan an experiment in a three‐way nested classification with factor A at a = 3 levels, B within A at b = 3 levels and C within B within A at c = 4 levels, and assume model 5.32. We further put α = 0.01, β = 0.1 and δ = 0.5σ, i.e. delta = 0.5.

> size.anova(model="a>b>c",hypothesis="a",a=3,b=3,c=4, alpha=0.01,beta=0.1,delta=0.5,case="minimin")
n 
8  
> size.anova(model="a>b>c",hypothesis="a",a=3,b=3,c=4, alpha=0.01,beta=0.1,delta=0.5,case="maximin")
n 
12  
> size.anova(model="a>b>c",hypothesis="b",a=3,b=3,c=4, alpha=0.01,beta=0.1,delta=0.5,case="minimin")
n 
1 
> size.anova(model="a>b>c",hypothesis="c",a=3,b=3,c=4, alpha=0.01,beta=0.1,delta=0.5,case="minimin")
n 
18
> size.anova(model="a>b>c",hypothesis="c",a=3,b=3,c=4,  alpha=0.01,beta=0.1,delta=0.5,case="maximin")
n 
302

As we can see, the minimin and maximin sizes differ far more for the nested factor C (18 versus 302) than for factor A (8 versus 12).
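Because delta in size.anova() is the relative difference τ = δ/σ, an absolute difference must be divided by the standard deviation before it is passed on. The sketch below illustrates this with hypothetical planning values (δ = 5 units and σ = 10 are assumptions made for illustration and do not come from the text). The call to OPDOE is only executed if the package is installed, and it repeats the first command of the example above.

```r
# Hypothetical planning values, chosen only for illustration
delta_abs <- 5    # smallest difference of practical interest, in units of y
sigma     <- 10   # assumed residual standard deviation

# size.anova() expects the relative difference tau = delta / sigma
tau <- delta_abs / sigma

if (requireNamespace("OPDOE", quietly = TRUE)) {
  # Same design as in the example: a = 3, b = 3, c = 4, test of factor A
  n_min <- OPDOE::size.anova(model = "a>b>c", hypothesis = "a",
                             a = 3, b = 3, c = 4,
                             alpha = 0.01, beta = 0.1,
                             delta = tau, case = "minimin")
  print(n_min)
}
```

With these assumed values τ equals 0.5, so the call matches the one shown in the example.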

5.5.3 Mixed Classifications

In experiments with three or more factors besides a cross‐classification or a nested classification, we often find a further type of classification, the so‐called mixed (partially nested) classifications. In the three‐way ANOVA, two mixed classifications occur (Rasch 1971). We consider the case that the birth weight of piglets is observed in a three‐way classification with factors boar, sow within boar and gender of the piglet. The latter is cross‐classified with the nested factors boar and sow.

5.5.3.1 Cross‐Classification between Two Factors where One of Them Is Sub‐Ordinated to a Third Factor ((B ≺ A) × C)

If in a balanced experiment a factor B is sub‐ordinated to a factor A and both are cross‐classified with a factor C then the corresponding model equation is given by

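y_ijkl = μ + a_i + b_ij + c_k + (a, c)_ik + (b, c)_jk(i) + e_ijkl (i = 1, … , a; j = 1, … , b; k = 1, … , c; l = 1, … , n).   (5.34)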

In 5.34 μ is the general experimental mean, a_i is the effect of the ith level of factor A; b_ij is the effect of the jth level of factor B within the ith level of factor A; c_k is the effect of the kth level of factor C. Further (a, c)_ik and (b, c)_jk(i) are the corresponding interaction effects and e_ijkl are the random error terms.

As usual, the error terms e_ijkl are independently distributed with expectation zero and the same variance σ²; for testing and confidence estimation normality is assumed in addition.

Model 5.34 is considered under the side conditions for all indices not occurring in the summation

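Σ_i a_i = Σ_j b_ij = Σ_k c_k = Σ_i (a, c)_ik = Σ_k (a, c)_ik = Σ_j (b, c)_jk(i) = Σ_k (b, c)_jk(i) = 0   (5.35)

and

E(e_ijkl) = 0, var(e_ijkl) = σ², with the e_ijkl mutually independent   (5.36)

(for all i, j, k, l).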

The observations y_ijkl are allocated as shown in Table 5.21 (we restrict ourselves to the so‐called balanced case where the number of B levels is equal for all A levels and the subclass numbers are equal).

Table 5.21 Observations of a mixed classification type (A≻B) × C with a = 2, b = 3, c = 2, n = 2.

	A₁			A₂
	B₁₁	B₁₂	B₁₃	B₂₁	B₂₂	B₂₃
C₁	288 295	355 369	329 343	310 282	303 321	299 328
C₂	278 272	336 342	320 315	288 287	302 297	289 284

Example 5.20

In Table 5.21 the arrangement of observations in a mixed classification of type (A≻B) × C is shown. How to analyse such data is demonstrated in Problem 5.14. As an example, we consider testing pig fattening for male and female (factor C) offspring of sows (factor B) nested in boars (factor A). The observed character is the number of fattening days an animal needed to grow up from 40 kg to 110 kg.

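For the sum of squared deviations of the random variables

y_ijkl (i = 1, … , a; j = 1, … , b; k = 1, … , c; l = 1, … , n)

from their arithmetic mean

ȳ.... = (1/N) Σ_i Σ_j Σ_k Σ_l y_ijkl, N = abcn,

we have

SS_T = Σ_i Σ_j Σ_k Σ_l (y_ijkl − ȳ....)² = SS_A + SS_B in A + SS_C + SS_A×C + SS_B×C in A + SS_res,

where

SS_A = bcn Σ_i (ȳ_i... − ȳ....)²

is the SS between the levels of A,

SS_B in A = cn Σ_i Σ_j (ȳ_ij.. − ȳ_i...)²

is the SS between the levels of B within the levels of A,

SS_C = abn Σ_k (ȳ_..k. − ȳ....)²

is the SS between the levels of C,

SS_A×C = bn Σ_i Σ_k (ȳ_i.k. − ȳ_i... − ȳ_..k. + ȳ....)²

is the SS for the interactions A × C,

SS_B×C in A = n Σ_i Σ_j Σ_k (ȳ_ijk. − ȳ_ij.. − ȳ_i.k. + ȳ_i...)²

is the SS for the interactions B × C within the levels of A and

SS_res = Σ_i Σ_j Σ_k Σ_l (y_ijkl − ȳ_ijk.)²

is the SS within the classes. The N − 1 degrees of freedom of SS_T can be split into six components corresponding to the components of SS_T. These components are given in Table 5.22. In the third column of Table 5.22 we find the MS obtained from the SS by dividing by the degrees of freedom.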

The hypothesis H_0A : a_i = 0 can be tested with the help of the statistic F_A = MS_A/MS_res which, under H_0A, is F‐distributed with a − 1 and N − abc degrees of freedom. In Table 5.22 we see that in our model we test the hypotheses about all effects (a_i, b_ij, … , (b, c)_jk(i)) by using the ratios of the corresponding MS and MS_res as test statistics.
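The split of the N − 1 degrees of freedom can be checked numerically. The sketch below uses the standard degrees of freedom of a balanced (B ≺ A) × C layout with the dimensions of Table 5.21; the variable names (n_A, n_B, n_C, n_rep) are our own and are not taken from the text.

```r
# Dimensions of Table 5.21: a = 2, b = 3, c = 2, n = 2
n_A <- 2; n_B <- 3; n_C <- 2; n_rep <- 2
N <- n_A * n_B * n_C * n_rep

# Degrees of freedom of the six components of SS_T
df <- c("A"          = n_A - 1,
        "B in A"     = n_A * (n_B - 1),
        "C"          = n_C - 1,
        "A x C"      = (n_A - 1) * (n_C - 1),
        "B x C in A" = n_A * (n_B - 1) * (n_C - 1),
        "residual"   = N - n_A * n_B * n_C)
df

# The components must add up to N - 1
sum(df) == N - 1

# Critical value for the test of H_0A at alpha = 0.05
qf(0.95, df[["A"]], df[["residual"]])
```

The residual degrees of freedom N − abc are the denominator degrees of freedom of every F‐test in this fixed effects model.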

Problem 5.14

Calculate the empirical ANOVA table and perform all possible F‐tests for Example 5.20.

Solution

Build in R a data frame for the observations of Table 5.21 and use lm().

Example

 > y1 <- c(288,295,278,272,355,369,336,342,329,343,320,315)
> y2 <- c(310,282,288,287,303,321,302,297,299,328,289,284)
> a <- c(rep(1,12), rep(2,12))
> b1 <- c(rep(11,4), rep(12,4), rep(13,4))
> b2 <- c(rep(21,4), rep(22,4), rep(23,4))
> c1 <- c(rep(1,2),rep(2,2),rep(1,2),rep(2,2),rep(1,2),rep(2,2))
> c2 <- c(rep(1,2),rep(2,2),rep(1,2),rep(2,2),rep(1,2),rep(2,2))
> y <- c(y1,y2)
> b <- c(b1,b2)
> c <- c(c1,c2)
> T_5_20 <- data.frame(cbind(y,a,b,c))
> A <- factor(a)
> B <- factor(b)
> C <- factor(c)
> Anova <- lm(y ~ A + A/B + C + A*C + (A/B)*C)
> summary(Anova)
Call:
lm(formula = y ~ A + A/B + C + A * C + (A/B) * C)
Residuals:
    Min      1Q  Median      3Q     Max
-14.500  -3.125   0.000   3.125  14.500
Coefficients: (12 not defined because of singularities)
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  291.500      7.272  40.088 3.75e-14 ***
A2            22.000     10.283   2.139 0.053653 .
C2           -16.500     10.283  -1.605 0.134580
A1:B12        70.500     10.283   6.856 1.76e-05 ***
A2:B12            NA         NA      NA       NA
A1:B13        44.500     10.283   4.327 0.000983 ***
A2:B13            NA         NA      NA       NA
A1:B21            NA         NA      NA       NA
A2:B21       -17.500     10.283  -1.702 0.114542
A1:B22            NA         NA      NA       NA
A2:B22        -1.500     10.283  -0.146 0.886450
A1:B23            NA         NA      NA       NA
A2:B23            NA         NA      NA       NA
A2:C2        -10.500     14.543  -0.722 0.484130
A1:B12:C2     -6.500     14.543  -0.447 0.662872
A2:B12:C2         NA         NA      NA       NA
A1:B13:C2     -2.000     14.543  -0.138 0.892898
A2:B13:C2         NA         NA      NA       NA
A1:B21:C2         NA         NA      NA       NA
A2:B21:C2     18.500     14.543   1.272 0.227441
A1:B22:C2         NA         NA      NA       NA
A2:B22:C2     14.500     14.543   0.997 0.338426
A1:B23:C2         NA         NA      NA       NA
A2:B23:C2         NA         NA      NA       NA
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 10.28 on 12 degrees of freedom
Multiple R-squared:  0.9193,    Adjusted R-squared:  0.8453
F-statistic: 12.42 on 11 and 12 DF,  p-value: 6.34e-05
> anova(Anova)
Analysis of Variance Table
Response: y
          Df Sum Sq Mean Sq F value    Pr(>F)
A          1 2646.0 2646.00 25.0213 0.0003082 ***
C          1 1872.7 1872.67 17.7084 0.0012142 **
A:B        4 9701.3 2425.33 22.9346 1.511e-05 ***
A:C        1   16.7   16.67  0.1576 0.6983418
A:B:C      4  211.7   52.92  0.5004 0.7362183
Residuals 12 1269.0  105.75
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
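Each F value and p‐value in this table is the ratio of the corresponding MS to MS_res, referred to an F distribution with the degrees of freedom of the effect and of the residual. As a minimal sketch, the following lines recompute them from the rounded mean squares printed above; the vector names are our own, and small deviations in the last digit come from the rounding of the printed MS.

```r
# Mean squares and degrees of freedom copied from the ANOVA table above
ms <- c("A" = 2646.00, "C" = 1872.67, "A:B" = 2425.33,
        "A:C" = 16.67, "A:B:C" = 52.92)
df <- c("A" = 1, "C" = 1, "A:B" = 4, "A:C" = 1, "A:B:C" = 4)
ms_res <- 105.75
df_res <- 12

# F statistics: every effect is tested against the residual mean square
F_val <- ms / ms_res

# Upper tail probabilities of the central F distribution
p_val <- pf(F_val, df, df_res, lower.tail = FALSE)

round(cbind(Df = df, F = F_val, p = p_val), 4)
```

The main effects A and C and the nested factor B within A are significant at the 1% level, while both interaction terms are far from significance.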

How to determine the sample sizes is shown in Problem 5.15.

Problem 5.15

Determine the minimin and maximin sample sizes for testing H_A0 : a_i = 0 (for all i).

Unfortunately delta in the program below stands for τ = δ/σ.

Solution

Use the OPDOE package command size.anova(model = "(a>b)xc", hypothesis = "a", a=, b=, c=, alpha=, beta=, delta=, case = "minimin")

or case = "maximin".

Example

Here and in the following problems we use a = 5, b = 4, c = 2, α = 0.05, β = 0.2, δ = 0.5.

> size.anova(model="(a>b)xc", hypothesis="a",a=5, b=4, c=2, alpha=0.05, beta=0.2, delta=0.5, case="minimin")
n 
5
> size.anova(model="(a>b)xc", hypothesis="a",a=5, b=4, c=2, alpha=0.05, beta=0.2, delta=0.5, case="maximin")
n 
13

Problem 5.16

Determine the minimin and maximin sample sizes for testing H_C0 : c_k = 0 (for all k).

Unfortunately delta in the program below stands for τ = δ/σ.

Solution

Use the OPDOE package command size.anova(model = "(a>b)xc", hypothesis = "c", a=, b=, c=, alpha=, beta=, delta=, case = "minimin")

or case = "maximin".

Example

 > size.anova(model="(a>b)xc", hypothesis="c",a=5, b=4, c=2, alpha=0.05, beta=0.2, delta=0.5, case="minimin")
n 
4
> size.anova(model="(a>b)xc", hypothesis="c",a=5, b=4, c=2, alpha=0.05, beta=0.2, delta=0.5, case="maximin")
n 
4

Problem 5.17

Determine the minimin and maximin sample sizes for testing H_AC0 : (a, c)_ik = 0 (for all i and k).

Unfortunately delta in the program below stands for τ = δ/σ.

Solution

Use the OPDOE package

  > size.anova(model="(a>b)xc", hypothesis="axc",a=, b=, c=, alpha=, beta=, delta=, case="minimin")

or case = "maximin".

Example

 > size.anova(model="(a>b)xc", hypothesis="axc",a=5, b=4, c=2, alpha=0.05, beta=0.2, delta=0.5, case="minimin")
n 
8
> size.anova(model="(axb)>c", hypothesis="axb",a=5, b=4, c=2, alpha=0.05, beta=0.2, delta=0.5, case="maximin")
n 
70

5.5.3.2 Cross‐Classification of Two Factors, in which a Third Factor is Nested (C ≺ (A × B))

If two cross‐classified factors (A × B) are super‐ordered to a third factor (C) we have another mixed classification. The model equation for the random observations in a balanced design is given by

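y_ijkl = μ + a_i + b_j + (a, b)_ij + c_ijk + e_ijkl (i = 1, … , a; j = 1, … , b; k = 1, … , c; l = 1, … , n).   (5.37)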

This is again the situation of model I; the error terms e_ijkl again fulfil condition 5.36.

We assume that for all values of the indices not occurring in the summation, we have the side conditions

Σ_i a_i = Σ_j b_j = Σ_i (a, b)_ij = Σ_j (a, b)_ij = Σ_k c_ijk = 0.   (5.38)

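The total sum of squared deviations can be split into components

SS_T = SS_A + SS_B + SS_C in AB + SS_A×B + SS_res

with

SS_A = bcn Σ_i (ȳ_i... − ȳ....)²,

the SS between the A levels,

SS_B = acn Σ_j (ȳ_.j.. − ȳ....)²,

the SS between the B levels,

SS_C in AB = n Σ_i Σ_j Σ_k (ȳ_ijk. − ȳ_ij..)²,

the SS between the C levels within the A × B combinations,

SS_A×B = cn Σ_i Σ_j (ȳ_ij.. − ȳ_i... − ȳ_.j.. + ȳ....)²,

the SS for the interactions between factor A and factor B, and

SS_res = Σ_i Σ_j Σ_k Σ_l (y_ijkl − ȳ_ijk.)²,

the SS within the classes.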

The expectations of the MS in this model are shown in Table 5.23 and the hypotheses

H_A0 : a_i = 0, H_B0 : b_j = 0, H_C0 : c_ijk = 0, H_AB0 : (a, b)_ij = 0,

where the zero values are assumed to hold for all indices used in the hypotheses, can be tested by using the corresponding F‐statistic as the ratios of MS_A, MS_B, MS_C, and MS_A × B, respectively, (as numerator) and MS_res (as denominator).

Problem 5.18

Calculate the ANOVA table and the F‐tests for Example 5.21.

Solution

Build a data frame of the data of Table 5.24 and use lm() in R.

Example

 > y1 <- c(58,60,65,62,55,57,68,65)
> y2 <- c(62,61,71,73,63,62,68,72)
> y3 <- c(58,61,59,70,59,63,65,67)
> a1 <- c(rep(1,2),rep(2,2),rep(1,2),rep(2,2),rep(1,2),rep(2,2))
> a2 <- c(rep(1,2),rep(2,2),rep(1,2),rep(2,2),rep(1,2),rep(2,2))
> b <- c(rep(1,8), rep(2,8),rep(3,8))
> c1 <-c(rep(111,4),rep(112,4))
> c2 <-c(rep(121,4),rep(122,4))
> c3 <-c(rep(131,4),rep(132,4))
> y <- c(y1,y2,y3)
> a <- c(a1,a2)
> c <- c(c1,c2,c3)
> A <- factor(a)
> B <- factor(b)
> C <- factor(c)
> Anova <- lm(y ~ A + B + (A*B)/C + A*B)
> anova(Anova)
Analysis of Variance Table
Response: y
          Df  Sum Sq Mean Sq F value    Pr(>F)
A          1 308.167 308.167 37.3535 5.242e-05 ***
B          2 117.000  58.500  7.0909   0.00927 **
A:B        2  16.333   8.167  0.9899   0.40002
A:B:C      6  27.500   4.583  0.5556   0.75746
Residuals 12  99.000   8.250
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
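The entries of this table follow directly from the decomposition of the total sum of squared deviations into the parts for A, B, A × B, C within A × B and the residual. The sketch below reproduces them from level and cell means using ave(), assuming the data are entered in the same order as in the example; the object names are our own. Here C is coded as 1 or 2 within each A × B combination, which is equivalent to the labels 111, 112, … used above.

```r
# Observations in the order of the example (pairs A1, A2 within each C level)
y <- c(58, 60, 65, 62, 55, 57, 68, 65,
       62, 61, 71, 73, 63, 62, 68, 72,
       58, 61, 59, 70, 59, 63, 65, 67)
A <- factor(rep(rep(1:2, each = 2), times = 6))
B <- factor(rep(1:3, each = 8))
C <- factor(rep(rep(1:2, each = 4), times = 3))  # C level within each A x B cell

m_tot <- mean(y)
m_A   <- ave(y, A)        # means of the A levels
m_B   <- ave(y, B)        # means of the B levels
m_AB  <- ave(y, A, B)     # means of the A x B combinations
m_ABC <- ave(y, A, B, C)  # class means

SS_A     <- sum((m_A - m_tot)^2)
SS_B     <- sum((m_B - m_tot)^2)
SS_AB    <- sum((m_AB - m_A - m_B + m_tot)^2)
SS_CinAB <- sum((m_ABC - m_AB)^2)
SS_res   <- sum((y - m_ABC)^2)

round(c(A = SS_A, B = SS_B, AB = SS_AB, CinAB = SS_CinAB, res = SS_res), 3)
```

Summing over all observations rather than over levels supplies the multipliers bcn, acn, cn and n of the balanced formulas without writing them out.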

Now let us determine sample sizes.

Problem 5.19

Determine the minimin and maximin sample sizes for testing H_A0 : a_i = 0 (for all i).

Unfortunately delta in the program below stands for τ=δ/σ.

Solution

Use the OPDOE package command size.anova(model = "(axb)>c", hypothesis = "a", a=, b=, c=, alpha=, beta=, delta=, case = "minimin")

or case = "maximin".

Example

 > size.anova(model="(axb)>c", hypothesis="a",a=6, b=5, c=4, alpha=0.05, beta=0.1, delta=0.5, case="minimin")
n 
3
> size.anova(model="(axb)>c", hypothesis="a",a=6, b=5, c=4,  alpha=0.05, beta=0.1, delta=0.5, case="maximin")
n 
7

Problem 5.20

Determine the minimin and maximin sample sizes for testing H_AB0 : (a, b)_ij = 0 (for all i and j).

Unfortunately delta in the program below stands for τ = δ/σ.

Solution

Use the OPDOE package with command

 > size.anova(model="(axb)>c", hypothesis="axb",a=, b=, c=,  alpha=, beta=, delta=, case="minimin")

or case = "maximin".

Example

> size.anova(model="(axb)>c", hypothesis="axb",a=5, b=4, c=2, alpha=0.05, beta=0.2, delta=0.5, case="minimin")
n 
5
> size.anova(model="(axb)>c", hypothesis="axb",a=5, b=4, c=2, alpha=0.05, beta=0.2, delta=0.5, case="maximin")
n 
70

References

Fisher, R.A. and Mackenzie, W.A. (1923). Studies in crop variation. II. The manurial response of different potato varieties. Journal of Agricultural Science 13: 311–320.
Hartung, J., Elpelt, B., and Voet, B. (1997). Modellkatalog Varianzanalyse. München: Oldenbourg Verlag.
Kuehl, R.O. (1994). Statistical Principles of Research Design and Analysis. Belmont, California: Duxbury Press.
Lenth, R.V. (1986). Computing non‐central Beta probabilities. Appl. Statistics 36: 241–243.
Rasch, D. (1971). Mixed classification the three‐way analysis of variance. Biom. Z. 13: 1–20.
Rasch, D. and Schott, D. (2018). Mathematical Statistics. Oxford: Wiley.
Rasch, D., Wang, M., and Herrendörfer, G. (1997). Determination of the size of an experiment for the F‐test in the analysis of variance. Model I. In: Advances in Statistical Software 6. The 9th Conference on the Scientific Use of Statistical Software. Heidelberg: Springer.
Rasch, D., Herrendörfer, G., Bock, J., Victor, N., and Guiard, V. Hrsg. (2008). Verfahrensbibliothek Versuchsplanung und ‐ auswertung, 2. verbesserte Auflage in einem Band mit CD. R. Oldenbourg Verlag München Wien.
Rasch, D., Pilz, J., Verdooren, R., and Gebhardt, A. (2011). Optimal Experimental Design with R. Boca Raton: Chapman and Hall.
Scheffé, H. (1959). The Analysis of Variance. New York, Hoboken: Wiley.


		Sex
		Male	Female
Test periods	1	91 84 86	99
	2	94 92 90 96	97 89
	3	82 86	‐

Source of variation	SS	df
Between A levels		a − 1
Between B levels		b − 1
Between C levels		c − 1
Interaction A × B		(a − 1)(b − 1)
Interaction A × C		(a − 1)(c − 1)
Interaction B × C		(b − 1)(c − 1)
Interaction A × B × C		(a − 1)(b − 1)(c − 1)
Within the classes (residual), (5.30)		abc(n − 1)
Within the classes (residual), (5.31)		(a − 1)(b − 1) (c − 1) + abc(n − 1)

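Table 5.24 Observations of a mixed classification of type (A × B) ≻ C with a = 2, b = 3, c = 2, n = 2.

	B₁		B₂		B₃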
	C₁₁₁	C₁₁₂	C₁₂₁	C₁₂₂	C₁₃₁	C₁₃₂
A₁	58 60	55 57	62 61	63 62	58 61	59 63
A₂	65 62	68 65	71 73	68 72	59 70	65 67
