Analysis of covariance combines elements of the analysis of variance and of regression analysis. Because the analysis of variance can be seen as a multiple linear regression analysis, the analysis of covariance (ANCOVA) can be defined as a multiple linear regression analysis in which there is at least one categorical explanatory variable and one quantitative explanatory variable. Usually the categorical variable is a treatment of primary interest, and the response y is measured at the experimental units. The quantitative variable x is also measured at the experimental units, in the anticipation that it is linearly associated with the response to the treatment. This quantitative variable x is called a covariate, covariable, or concomitant variable.
Before we explain this in general, we give some examples performed in a completely randomised design.
One could perhaps analyse the yield per plant (y/x) in kilograms as a means of removing differences in plant numbers. This is satisfactory only if the relation between y and x is a straight line through the origin. However, the regression line of y on x is a straight line that does not pass through the origin, and the estimated regression coefficient b is often substantially less than the mean yield per plant, because when plant numbers are high, competition between plants reduces the yield per plant. If this happens, the use of y/x overcorrects for the stand of plants. Of course, the yield per plant should be analysed if there is direct interest in this quantity.
When one wants to use ANCOVA it is a good idea to check whether the linear regression lines are parallel for the different treatments. If the regression lines are not parallel, then there is an interaction between the treatments and the covariate. Hence it is a good idea to first run an ANCOVA model with the interaction of treatments and covariate; if the slopes are not statistically different (no significant interaction), then we can use an ANCOVA model with parallel lines, i.e. a separate-intercepts regression model. The main use of ANCOVA is to test a treatment effect while using a quantitative control variable as covariate to gain power.
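This check can be traced numerically. The following pure-Python sketch (synthetic data, not from the book; the book's own analyses use R) fits a separate slope per treatment group and a common pooled slope, and forms the F-statistic for the hypothesis of parallel (equal) slopes:

```python
# Test of parallel slopes (treatment x covariate interaction), pure Python.
# Synthetic data: a = 2 groups, 3 observations each.
x = {1: [1.0, 2.0, 3.0], 2: [1.0, 2.0, 3.0]}
y = {1: [1.0, 2.0, 3.0], 2: [3.0, 5.0, 8.0]}
a = len(x)
N = sum(len(v) for v in x.values())

def sums(xs, ys):
    """Corrected sums of squares and products for one group."""
    n = len(xs)
    xb, yb = sum(xs) / n, sum(ys) / n
    sxx = sum((u - xb) ** 2 for u in xs)
    sxy = sum((u - xb) * (v - yb) for u, v in zip(xs, ys))
    syy = sum((v - yb) ** 2 for v in ys)
    return sxx, sxy, syy

# Residual SS with a separate slope in every group
sse_sep = 0.0
Exx = Exy = Eyy = 0.0
for i in x:
    sxx, sxy, syy = sums(x[i], y[i])
    sse_sep += syy - sxy ** 2 / sxx
    Exx, Exy, Eyy = Exx + sxx, Exy + sxy, Eyy + syy

# Residual SS with one pooled (parallel) slope b = Exy/Exx
sse_par = Eyy - Exy ** 2 / Exx

# F-test of H0: beta_1 = ... = beta_a (parallel lines)
df1, df2 = a - 1, N - 2 * a
F = ((sse_par - sse_sep) / df1) / (sse_sep / df2)
print(F)  # a large F speaks against parallel lines
```

A large F (small p-value) means the interaction is significant and the parallel-lines ANCOVA model should not be used.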
Note that, if needed, the ANCOVA model can be extended with more covariates; R handles this easily. If the regression of y on the covariate x is quadratic, we have an ANCOVA model with two covariates, x1 = x and x2 = x².
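As an illustration of this extension (in practice one would simply add the covariates to the R model formula), here is a minimal pure-Python sketch with made-up data that fits the separate-intercepts model with covariates x1 = x and x2 = x² by solving the least squares normal equations directly:

```python
# ANCOVA with a quadratic covariate: design columns are intercept,
# dummy for treatment group 2, x1 = x and x2 = x^2 (synthetic data).
xs = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
grp = [1, 1, 1, 2, 2, 2]
# y generated exactly from 1 + 2*[group == 2] + 3*x - 0.5*x^2
ys = [3.5, 5.0, 5.5, 5.5, 7.0, 7.5]

X = [[1.0, 1.0 if g == 2 else 0.0, v, v * v] for g, v in zip(grp, xs)]
p = 4

# Normal equations: (X'X) beta = X'y
XtX = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
Xty = [sum(r[i] * yv for r, yv in zip(X, ys)) for i in range(p)]

# Solve by Gaussian elimination with partial pivoting
A = [row[:] + [rhs] for row, rhs in zip(XtX, Xty)]
for c in range(p):
    piv = max(range(c, p), key=lambda r: abs(A[r][c]))
    A[c], A[piv] = A[piv], A[c]
    for r in range(c + 1, p):
        f = A[r][c] / A[c][c]
        for k in range(c, p + 1):
            A[r][k] -= f * A[c][k]
beta = [0.0] * p
for c in range(p - 1, -1, -1):
    beta[c] = (A[c][p] - sum(A[c][k] * beta[k] for k in range(c + 1, p))) / A[c][c]
print(beta)  # recovers [1, 2, 3, -0.5] up to rounding
```

Since the synthetic y was generated without noise, the fit recovers the generating coefficients exactly.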
We first discuss the balanced design.
We assume that we have a balanced completely randomised design for a treatment A with a classes. Assuming further that there is a linear relationship between the response y and the covariate x, we find that an appropriate statistical model is

yij = μ + ai + βxij + eij, i = 1, …, a; j = 1, …, n, (9.1)

where μ is a constant, ai is the treatment effect with the side condition ∑i ai = 0, β is the coefficient for the linear regression of yij on xij, and the eij are random independent normally distributed experimental errors with expectation 0 and variance σ².
Two additional key assumptions for this model are that the regression coefficient β is the same for all treatment groups (parallel regression lines) and the treatments do not influence the covariate x.
The first objective of the covariance analysis is to determine whether the addition of the covariate has reduced the estimate of the experimental error variance; that is, whether the test of the null hypothesis Hβ0: β = 0 against the alternative hypothesis HβA: β ≠ 0 results in rejection of Hβ0. If the reduction of the estimate of the experimental error variance is significant, then we obtain estimates of the treatment group means μ + ai adjusted to the same value of the covariate x for each of the treatment groups and determine the significance of treatment differences on the basis of the adjusted treatment means. Usually the statistical packages estimate the adjusted treatment means at the overall mean x̄.. of the covariate x.
The least squares estimates of the parameters of model (9.1) are

b = Exy/Exx,  μ̂ = ȳ.. − b x̄..,  âi = ȳi. − ȳ.. − b(x̄i. − x̄..),

with Exx = ∑i ∑j (xij − x̄i.)² and Exy = ∑i ∑j (xij − x̄i.)(yij − ȳi.); see Montgomery (2013), section 15.3. For the derivation of the least squares estimates of the parameters see the general unbalanced completely randomised design in Section 9.2.2.
For the balanced case we then have ni = n for i = 1, …, a.
The nested ANOVA table (note the sequence of the source of variation) for the test of the null hypothesis Hβ0: β = 0 is given in Table 9.1.
Table 9.1 Nested ANOVA table for the test of the null hypothesis Hβ0: β = 0.
Source of variation | df | SS |
Treatments | a − 1 | Tyy = n ∑i (ȳi. − ȳ..)² |
Regression coefficient | 1 | SSb = (Exy)²/Exx |
Error | a(n − 1) − 1 | SSE = SSyy − Tyy − SSb |
Corrected total | an − 1 | SSyy = ∑i ∑j (yij − ȳ..)² |
The test of the null hypothesis Hβ0: β = 0 against the alternative hypothesis HβA: β ≠ 0 is done with the test-statistic

F = SSb / [SSE/(a(n − 1) − 1)],

which has under Hβ0: β = 0 the F-distribution with df = 1 for the numerator and df = a(n − 1) − 1 for the denominator.
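The quantities of Table 9.1 and this F-statistic can be traced on a tiny synthetic data set; the following pure-Python sketch is only illustrative (the book's analyses use R):

```python
# Nested ANOVA quantities of Table 9.1 on synthetic data:
# a = 2 treatments, n = 3 replicates per treatment.
y = {1: [10.0, 13.0, 14.0], 2: [15.0, 17.0, 19.0]}
x = {1: [1.0, 2.0, 3.0],    2: [2.0, 3.0, 4.0]}
a, n = 2, 3
N = a * n
ybar = {i: sum(v) / n for i, v in y.items()}
xbar = {i: sum(v) / n for i, v in x.items()}
ybar_all = sum(sum(v) for v in y.values()) / N

Tyy = n * sum((ybar[i] - ybar_all) ** 2 for i in y)               # treatments
SSyy = sum((v - ybar_all) ** 2 for vs in y.values() for v in vs)  # corrected total
# Within-treatment (error) sums of squares and products
Exx = sum((v - xbar[i]) ** 2 for i in x for v in x[i])
Exy = sum((x[i][j] - xbar[i]) * (y[i][j] - ybar[i]) for i in y for j in range(n))
SSb = Exy ** 2 / Exx                   # regression coefficient, df = 1
SSE = SSyy - Tyy - SSb                 # error, df = a(n - 1) - 1
F = SSb / (SSE / (a * (n - 1) - 1))    # test statistic for H_beta0: beta = 0
print(Tyy, SSb, SSE, F)
```

Note that the three sums of squares in the table add up to the corrected total SSyy by construction.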
For the test of the null hypothesis HA0: ‘all ai are equal’ against the alternative hypothesis HAA: ‘at least one ai is different from the others’ we use the nested ANOVA table given in Table 9.2.
Table 9.2 Nested ANOVA table for the test HA0: ‘all ai are equal’.
Source of variation | df | SS |
Regression coefficient | 1 | Sb = (Sxy)²/Sxx |
Treatments | a − 1 | ST = SSyy − Sb − SSE |
Error | a(n − 1) − 1 | SSE |
Corrected total | an − 1 | SSyy = ∑i ∑j (yij − ȳ..)² |

Here Sxx = ∑i ∑j (xij − x̄..)² and Sxy = ∑i ∑j (xij − x̄..)(yij − ȳ..) are the total (corrected) sums of squares and products.
The test of the null hypothesis HA0: ‘all ai are equal’ against the alternative hypothesis HAA: ‘at least one ai is different from the others’ is done with the test-statistic

F = [ST/(a − 1)] / [SSE/(a(n − 1) − 1)],

which has under HA0: ‘all ai are equal’ the F-distribution with df = a − 1 for the numerator and df = a(n − 1) − 1 for the denominator.
The estimate of the treatment mean adjusted for the covariate at x = x̄.. (the overall mean) is

ȳi. − b(x̄i. − x̄..). (9.2)
The estimate of the standard error of this estimate (9.2) is

√( s² [1/n + (x̄i. − x̄..)²/Exx] ), (9.3)

where s² = SSE/(a(n − 1) − 1) is the error mean square.
The estimate for the difference between two adjusted treatment means (which does not depend on the value x̄.. at which the means are adjusted) is

[ȳi. − b(x̄i. − x̄..)] − [ȳj. − b(x̄j. − x̄..)] = ȳi. − ȳj. − b(x̄i. − x̄j.). (9.4)
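The adjusted means and their difference can be traced numerically; this pure-Python sketch uses made-up data:

```python
# Adjusted treatment means ybar_i. - b*(xbar_i. - xbar_..),
# with b = Exy/Exx the pooled within-treatment slope (synthetic data).
y = {1: [10.0, 13.0, 14.0], 2: [15.0, 17.0, 19.0]}
x = {1: [1.0, 2.0, 3.0],    2: [2.0, 3.0, 4.0]}
n = 3
ybar = {i: sum(v) / n for i, v in y.items()}
xbar = {i: sum(v) / n for i, v in x.items()}
xbar_all = sum(sum(v) for v in x.values()) / (2 * n)

Exx = sum((v - xbar[i]) ** 2 for i in x for v in x[i])
Exy = sum((x[i][j] - xbar[i]) * (y[i][j] - ybar[i]) for i in y for j in range(n))
b = Exy / Exx

adj = {i: ybar[i] - b * (xbar[i] - xbar_all) for i in y}
# Difference of adjusted means; the term in xbar_.. cancels:
diff = adj[2] - adj[1]
same = (ybar[2] - ybar[1]) - b * (xbar[2] - xbar[1])
print(b, adj, diff)
```

The two expressions for the difference agree, as in (9.4).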
However, we first want to check whether we can use an ANCOVA model with parallel lines, i.e. a separate-intercepts regression model. Therefore, we first run an ANCOVA model with the interaction of treatments and covariate.
An appropriate statistical model is

yij = μ + ai + βi xij + eij, i = 1, …, a; j = 1, …, n,

where μ is a constant, ai is the treatment effect with the side condition ∑i ai = 0, βi is the coefficient for the linear regression of yij on xij for class Ai of treatment A, and the eij are random independent normally distributed experimental errors with mean 0 and variance σ².
In R we would make an ANOVA table with this model and look at the test of the interaction effect of treatment and covariate to see whether we must reject the null hypothesis HB0: β1 = … = βa and accept the alternative hypothesis HBA: ‘there is at least one βi different from another βj with i ≠ j’. If this is the case we cannot use the ANCOVA model with parallel lines.
From the ANOVA table with machine as the last model variable we find the p-value Pr(>F) = 0.1181 > 0.05, so we cannot reject the null hypothesis HA0: ‘all ai are equal’.
Using y for strength and x for diameter we have the regression lines for the machines:
the regression line for M1 is y = 17.3592 + 0.954x;
the regression line for M2 is y = 18.3960 + 0.954x;
the regression line for M3 is y = 15.7752 + 0.954x.
Note that in the output of Problem 9.3, in the output of summary(machine0), we find the estimate (Intercept) 17.360, which is the intercept I1; further we find machine2 1.037 and machine3 −1.584. The intercept I2 is (Intercept) + machine2 = 17.360 + 1.037 = 18.397, and the intercept I3 is (Intercept) + machine3 = 17.360 + (−1.584) = 15.776.
In the rationale in Problem 9.4 we have found that the difference in the adjusted means of M2 − M1 is 41.4192 − 40.3824 = 1.0368, and this is given in Problem 9.3 in the summary(machine0) output as the Estimate machine2 1.037. But the difference of the adjusted means of M2 − M1 is, according to (9.4), [ȳ2. − b(x̄2. − x̄..)] − [ȳ1. − b(x̄1. − x̄..)] = [ȳ2. − b x̄2.] − [ȳ1. − b x̄1.] = I2 − I1.
Analogously, the difference of the adjusted means of M3 − M1 is 38.7984 − 40.3824 = −1.584, and this is given in Problem 9.3 in the summary(machine0) output as the Estimate machine3 −1.584. However, the difference in the adjusted means of M3 − M1 is, according to (9.4), I3 − I1.
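That the difference of adjusted means equals the difference of the fitted intercepts under a common slope is easy to verify numerically; the numbers below are illustrative, not the machine data:

```python
# With a common slope b, the separate-intercepts fit has intercepts
# I_i = ybar_i. - b * xbar_i., so for any evaluation point xbar_..:
# [ybar_2. - b(xbar_2. - xbar_..)] - [ybar_1. - b(xbar_1. - xbar_..)] = I2 - I1.
b = 0.954                      # common slope (illustrative value)
ybar = {1: 12.0, 2: 14.5}      # group means of y (made up)
xbar = {1: 3.0, 2: 4.2}        # group means of x (made up)
xbar_all = 3.6                 # overall mean of x (made up)

I = {i: ybar[i] - b * xbar[i] for i in ybar}
adj = {i: ybar[i] - b * (xbar[i] - xbar_all) for i in ybar}
print(adj[2] - adj[1], I[2] - I[1])  # equal up to rounding
```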
Now we will give the ANCOVA for an unbalanced completely randomised design for a treatment A with a classes. Assuming further that there is a linear relationship between the response y and the covariate x, we find that an appropriate statistical model is

yij = μ + ai + βxij + eij, i = 1, …, a; j = 1, …, ni, (9.8)

where μ is a constant, the ai are the treatment effects with the side condition ∑i ai = 0, β is the coefficient for the linear regression of yij on xij, and the eij are random independent normally distributed experimental errors with mean 0 and variance σ².
The least squares estimators of the parameters in (9.8) are derived as follows. Setting the first partial derivatives of

S = ∑i ∑j (yij − μ − ai − βxij)²

with respect to the parameters equal to zero, and replacing the parameter values in the resulting equations by their estimates, leads to the normal equations (9.9). From these, we obtain the estimates explicitly as

b = Exy/Exx and μ̂ + âi = ȳi. − b x̄i., i = 1, …, a, (9.10)

with Exx = ∑i ∑j (xij − x̄i.)² and Exy = ∑i ∑j (xij − x̄i.)(yij − ȳi.).
Because S is a convex function of the parameters, the solution of the equations obtained by setting the partial derivatives equal to zero gives a minimum of S.
Replacing the realisations of the random variables in (9.8) by the corresponding random variables results in the least squares estimators
The estimators in (9.11) are best linear unbiased estimators (BLUE) and normally distributed.
The analysis for the unbalanced ANCOVA uses the same R commands; the differences are in the estimator of β, the standard errors of the estimates of the adjusted treatment means, and the standard errors of the differences between the estimates of the adjusted treatment means.
The estimate of β is b = Exy/Exx with

Exy = ∑i ∑j (xij − x̄i.)(yij − ȳi.) and Exx = ∑i ∑j (xij − x̄i.)²,

as is also given in (9.10).
The estimate of the treatment mean adjusted for the covariate at x = x̄.. (the overall mean) is

ȳi. − b(x̄i. − x̄..).
The estimate of the standard error of this adjusted mean is

√( s² [1/ni + (x̄i. − x̄..)²/Exx] ),

where s² is the error mean square.
The estimate for the difference between two adjusted treatment means is

ȳi. − ȳj. − b(x̄i. − x̄j.),

with estimated standard error √( s² [1/ni + 1/nj + (x̄i. − x̄j.)²/Exx] ).
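These unbalanced-case formulas can be traced with a small pure-Python sketch (invented data with n1 = 2 and n2 = 3):

```python
# Unbalanced ANCOVA: pooled slope b = Exy/Exx over groups of unequal size,
# adjusted means ybar_i. - b*(xbar_i. - xbar_..), and the standard error of
# a difference, sqrt(s2 * (1/n_i + 1/n_j + (xbar_i. - xbar_j.)^2 / Exx)).
y = {1: [5.0, 9.0], 2: [4.0, 9.0, 12.0]}
x = {1: [1.0, 3.0], 2: [2.0, 4.0, 6.0]}
n = {i: len(v) for i, v in y.items()}
N = sum(n.values())
ybar = {i: sum(v) / n[i] for i, v in y.items()}
xbar = {i: sum(v) / n[i] for i, v in x.items()}
xbar_all = sum(sum(v) for v in x.values()) / N

Exx = sum((v - xbar[i]) ** 2 for i in x for v in x[i])
Exy = sum((u - xbar[i]) * (v - ybar[i])
          for i in y for u, v in zip(x[i], y[i]))
Eyy = sum((v - ybar[i]) ** 2 for i in y for v in y[i])
b = Exy / Exx

adj = {i: ybar[i] - b * (xbar[i] - xbar_all) for i in y}
s2 = (Eyy - Exy ** 2 / Exx) / (N - len(y) - 1)   # error mean square
se_diff = (s2 * (1 / n[1] + 1 / n[2]
                 + (xbar[1] - xbar[2]) ** 2 / Exx)) ** 0.5
print(b, adj, se_diff)
```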
Because the interaction Group:x has the p-value Pr(>F) = 0.42794 > 0.05, we cannot reject the null hypothesis HB0: β1 = … = βa, and we can use the ANCOVA model (9.8) with parallel regression lines of y on x for the groups.
Because in the ANOVA table with x as the last model variable we find the p-value Pr(>F) = 5.063e-09 < 0.05, we reject the null hypothesis Hβ0: β = 0. Of course this can also be concluded from the t-test for x, where we find the same p-value Pr(>|t|) = 5.06e-09 < 0.05.
> summary(testbeta0)
Call:
lm(formula = y ∼ Group + x, data = example9_2)
Residuals:
Min 1Q Median 3Q Max
-19.0353 -6.8607 0.1951 6.1214 18.7915
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 74.814 10.163 7.361 2.75e-08 ***
Group2 -10.222 3.363 -3.040 0.00478 **
x -11.268 1.410 -7.991 5.06e-09 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 9.678 on 31 degrees of freedom
Multiple R-squared: 0.6848, Adjusted R-squared: 0.6644
F-statistic: 33.67 on 2 and 31 DF, p-value: 1.692e-08
We see that the estimate of the standard error of the difference of the adjusted means (at the overall mean of x) of group 1 and group 2 is SEG1G2 = 3.363, which we have already found as 3.363 in summary(Group0) in Problem 9.9.
The 0.95-confidence interval for the difference of the expected adjusted means (at the overall mean of x) of group 1 and group 2 is [3.363233; 17.08077].
Hence the regression line for Fibralo = group 1 is y = 74.8135 − 11.268x and the regression line for Gemfibrozil = group 2 is y = 64.59181 − 11.268x.
The scatter plot with the regression lines is done as follows.
> x1 <- c(7.0,6.0,7.1,8.6,6.3,7.5,6.6,7.4,5.3,6.5,6.2,7.8,8.5,9.2,5.0,7.0)
> y1 <- c(5,10,-5,-20,0,-15,10,-10,20,-15,5,0,-40,-25,25,-10)
> x2 <- c(5.1,6.0,7.2,6.4,5.5,6.0,5.6,5.5,6.7,8.6,6.4,6.0,9.3,8.5, 7.9,7.4,5.0,6.5)
> y2 <- c(10,15,-15,5,10,-15,-5,-10,-20,-40,-5,-10,-40,-20,-35,0,0,-10)
> Group <- rep(1:2, times= c(16,18))
> x <- c(x1,x2)
> y <- c(y1,y2)
> plot(x, y, main = "Group 1 ---, Group 2 ----", pch = as.character(Group))
> # intercepts and common slope from the fit above
> I1 <- 74.8135; I2 <- 64.59181; estimate.beta <- -11.268
> abline(I1, estimate.beta, lty = 1, lwd = 2)
> abline(I2, estimate.beta, lty = 2, lwd = 3)
See Figure 9.2.
We assume that we have a balanced randomised complete block design for a treatment A with a classes and b blocks. Assuming further that there is a linear relationship between the response y and the covariate x, we find that an appropriate statistical model is

yijk = μ + ai + cj + βxijk + eijk, i = 1, …, a; j = 1, …, b; k = 1, …, n,

where μ is a constant, ai is the treatment effect with the side condition ∑i ai = 0, cj is the block effect with the side condition ∑j cj = 0, β is the coefficient for the linear regression of yijk on xijk, and the eijk are random independent normally distributed experimental errors with mean 0 and variance σ².
The analysis is analogous to Section 9.2.1 for the balanced completely randomised design; only the ANOVA table additionally includes the effect of blocks.
We first want to check whether we can use an ANCOVA model with parallel lines, i.e. a separate-intercepts regression model. Therefore, we first run an ANCOVA model with the interaction of treatments and covariate.
An appropriate statistical model is

yijk = μ + ai + cj + βi xijk + eijk, i = 1, …, a; j = 1, …, b; k = 1, …, n,

where μ is a constant, ai is the treatment effect with the side condition ∑i ai = 0, cj is the block effect with the side condition ∑j cj = 0, βi is the coefficient for the linear regression of yijk on xijk for class Ai of treatment A, and the eijk are random independent normally distributed experimental errors with mean 0 and variance σ².
In R we make an ANOVA table for this model and look at the test of the interaction effect of treatment and covariate to see whether we must reject the null hypothesis HB0: β1 = … = βa and accept the alternative hypothesis HBA: ‘there is at least one βi different from another βj with i ≠ j’. If this is the case we cannot use the ANCOVA model with parallel lines.
If the covariate x as well as the primary response variable y is affected by the treatments, the resultant response is multivariate and the covariance adjustment of treatment means is inappropriate. In these cases an analysis of the bivariate response (x, y) utilising multivariate methods is in order. Multivariate methods are not covered in this book.
Adjustment for the covariate is appropriate if it is measured prior to treatment administration since the treatments have not yet had the opportunity to affect its value. If the covariate is measured concurrently with the response variable, then it must be decided whether it could be affected by the treatments before the covariance adjustments are considered. See the rationale given in Example 9.3.
Practical application of the analysis of covariance has been demonstrated only with completely randomised designs and randomised complete block designs; however, the use of covariates can be extended to any treatment and experiment design, as well as to comparative observational studies of complex structure and to studies requiring multiple covariates for adjustment. Using R, the analysis with multiple covariates is easy to do: simply add the covariates to the model formula.
For further information about topics in analysis of covariance, see Snedecor and Cochran (1989) sections 18.5–18.9.
Extensive discussions on the use and misuses of covariates in research studies were provided in two special issues of Biometrics (1957), Volume 13, No. 3; and Biometrics (1982), Volume 38, No. 3. Of particular interest are articles by Cochran (1957), Smith (1957), and Cox and McCullagh (1982). A number of issues arise relevant to the use of covariates. Amongst those concerns are the applicability in certain situations and the relationship between blocking and covariates.
Analysis of covariance for general random and mixed‐effects models is considerably more difficult. Henderson and Henderson (1979) and Henderson (1982) discuss the problem and possible approaches.