Chapter 17

Tests in variance analysis

Analysis of variance (ANOVA) in its simplest form tests whether the mean of a Gaussian random variable differs among several groups. Often the factor that defines the groups arises from applying different treatments to subjects, for example in designed experiments in technical applications or in clinical studies. The problem can thus be seen as a comparison of group means, which extends the t-test to more than two groups. The underlying statistical model can also be presented as a special case of a linear model. Section 17.1 covers the one- and two-way cases of ANOVA. The two-way case extends the problem to groups characterized by two factors. Here it is also of interest whether the two factors influence each other in their effect on the measured variable, that is, whether they show an interaction effect. A crucial assumption of ANOVA is homogeneity of variance across all groups. Section 17.2 deals with tests to check this assumption.

17.1 Analysis of variance

17.1.1 One-way ANOVA

Description: Tests if the mean of a Gaussian random variable is the same in c17-math-0001 groups.
Assumptions:
  • Let c17-math-0002, c17-math-0003, be c17-math-0004 independent samples of independent Gaussian random variables with the same variance but possibly different group means.
  • The sample sizes of the c17-math-0005 samples are c17-math-0006 with c17-math-0007.
  • The random variables c17-math-0008 can be modeled as c17-math-0009 with c17-math-0010, c17-math-0011.
Hypotheses: c17-math-0012 vs c17-math-0013 for at least one c17-math-0014.
Test statistic:
c17-math-0015
c17-math-0016
Test decision: Reject c17-math-0017 if for the observed value c17-math-0018 of c17-math-0019
c17-math-0020
p-values: c17-math-0021
Annotations:
  • The test statistic c17-math-0022 is F-distributed (Rencher 1998, chapter 4).
  • c17-math-0024 is the c17-math-0025-quantile of the F-distribution with c17-math-0026 and c17-math-0027 degrees of freedom.
  • The numerator of the test statistic is also called MST (mean sum of squares for treatment) and the denominator MSE (mean sum of squares of errors).
  • Note that we have presented the one-way model and test for the more general case of an unbalanced design where the sample sizes in the different groups may vary. A balanced design is characterized by an equal number of observations in each group.
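
The ratio MST/MSE can be sketched by hand in R for an unbalanced design. This is only an illustration under stated assumptions: the data and the names y, g, n_i, ybar_i, mst and mse are made up for this sketch and are not taken from the crop dataset or from the notation above.

```r
# Hypothetical unbalanced data: three groups with 4, 5 and 3 observations
y <- c(5.1, 4.8, 5.6, 5.0,
       6.2, 5.9, 6.4, 6.0, 6.3,
       5.5, 5.2, 5.8)
g <- factor(rep(c("A", "B", "C"), times = c(4, 5, 3)))

n_i    <- tapply(y, g, length)  # group sizes
ybar_i <- tapply(y, g, mean)    # group means
ybar   <- mean(y)               # overall mean
k <- nlevels(g)                 # number of groups
n <- length(y)                  # total sample size

# Mean sum of squares for treatment (numerator)
mst <- sum(n_i * (ybar_i - ybar)^2) / (k - 1)
# Mean sum of squares of errors (denominator)
mse <- sum((y - ybar_i[as.character(g)])^2) / (n - k)

f_stat  <- mst / mse
p_value <- pf(f_stat, k - 1, n - k, lower.tail = FALSE)

# Cross-check against the table returned by aov()
tab <- summary(aov(y ~ g))[[1]]
f_stat
tab[["F value"]][1]
```

The two printed values should agree, which shows that the group sizes enter the numerator only as weights of the squared deviations of the group means.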

Example
To test if the means of the harvest in kilograms of tomatoes in three different greenhouses differ. The dataset contains observations from five fields in each greenhouse (dataset in Table A.12).


SAS code
proc anova data = crop;
 class house;
 model kg = house;
run;
quit;
SAS output
Source    DF     Anova SS   Mean Square  F Value   Pr> F
house      2   0.16329333    0.08164667     0.33   0.7262
Remarks:
  • The SAS procedure proc anova is the standard procedure for the analysis of variance with a balanced design as given in this example. For an unbalanced design the procedure proc glm should be used (see below).
  • By using the class statement, SAS treats the variable house as a categorical variable.
  • The code model dependent variable= independent variable defines the model.
  • The quit; statement is used to terminate the procedure; proc anova is an interactive procedure and SAS then knows not to expect any further input.
  • The program code for proc glm is similar:
    proc glm data = crop;
     class house;
     model kg = house;
    run;
    quit;


R code
summary(aov(crop$kg~factor(crop$house)))
R output
                   Df Sum Sq Mean Sq F value Pr(>F)
factor(crop$house)  2 0.1633 0.08165   0.329  0.726
Residuals          12 2.9815 0.24846
Remarks:
  • The function aov() performs an analysis of variance in R. The response variable is placed on the left-hand side of the ~ symbol and the independent variables which define the groups on the right-hand side.
  • We use the R function factor() to tell R that house is a categorical variable.
  • The summary function gets R to return the sum of squares, degrees of freedom, p-values, etc.
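
Why factor() matters can be seen in a small sketch. The data frame crop_demo below is a hypothetical stand-in with made-up values, not the data of Table A.12. The data argument of aov() is used here only to keep the formula short.

```r
# Hypothetical stand-in for the crop data (values are made up)
crop_demo <- data.frame(
  house = rep(1:3, each = 5),
  kg = c(2.1, 2.6, 1.9, 2.4, 2.2,
         2.8, 2.3, 2.5, 2.0, 2.7,
         2.2, 2.9, 2.4, 2.6, 2.1)
)

# house as a categorical variable: compares three group means
fit_factor <- aov(kg ~ factor(house), data = crop_demo)
# house as a numeric variable: fits a single linear trend instead
fit_numeric <- aov(kg ~ house, data = crop_demo)

# Degrees of freedom of the first model term in each fit
df_factor  <- summary(fit_factor)[[1]][["Df"]][1]
df_numeric <- summary(fit_numeric)[[1]][["Df"]][1]
df_factor
df_numeric
```

With three greenhouses the categorical term has two degrees of freedom, whereas the numeric version uses only one, so omitting factor() silently tests a different hypothesis.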

17.1.2 Two-way ANOVA

Description: Tests if the mean of a Gaussian random variable is the same in groups defined by two factors of interest.
Assumptions:
  • Let c17-math-0029, c17-math-0030, c17-math-0031, c17-math-0032 describe a sample of size c17-math-0033 of independent Gaussian random variables.
  • In each of the c17-math-0034 groups defined by the two factors, we have an equal number of c17-math-0035 observations (balanced design).
  • Each of the variables c17-math-0036 can be modeled as c17-math-0037 with c17-math-0038, where c17-math-0039 is the overall mean and c17-math-0040 and c17-math-0041 are the deviations from it for the first and the second factor and c17-math-0042 describes the interaction between them.
Hypotheses: (A) c17-math-0043
vs c17-math-0044 for at least one pair c17-math-0045
(B) c17-math-0046
vs c17-math-0047 for at least one c17-math-0048
(C) c17-math-0049
vs c17-math-0050 for at least one c17-math-0051
Test statistic:
(A) c17-math-0052
(B) c17-math-0053
(C) c17-math-0054
with
c17-math-0055
c17-math-0056
Test decision: Reject c17-math-0057 if for the observed value c17-math-0058 of c17-math-0059, c17-math-0060 or c17-math-0061
(A) c17-math-0062
(B) c17-math-0063
(C) c17-math-0064
p-values: c17-math-0065
Annotations:
  • The test statistic c17-math-0066 is F-distributed with c17-math-0067 (A), c17-math-0068 (B) or c17-math-0069 (C) degrees of freedom for the numerator and c17-math-0070 degrees of freedom for the denominator (Montgomery and Runger 2007, chapter 14).
  • c17-math-0071 is the c17-math-0072-quantile of the F-distribution with c17-math-0073 and c17-math-0074 degrees of freedom.
  • Hypothesis (A) tests if there is an interaction between the two factors. Hypotheses (B) and (C) test the main effects of the two factors.
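
For orientation, the balanced two-way decomposition behind the three tests can be written in common textbook notation. The symbols are assumptions of this sketch and need not match those used above: I and J are the numbers of levels of the first and second factor, K is the number of observations per cell, a dot in place of an index denotes the average over that index, and (B) is taken to refer to the first factor and (C) to the second.

```latex
\begin{align*}
SS_{1}  &= JK \sum_{i=1}^{I} \left(\bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot\cdot\cdot}\right)^2, \qquad
SS_{2}   = IK \sum_{j=1}^{J} \left(\bar{Y}_{\cdot j\cdot} - \bar{Y}_{\cdot\cdot\cdot}\right)^2, \\
SS_{12} &= K \sum_{i=1}^{I}\sum_{j=1}^{J}
           \left(\bar{Y}_{ij\cdot} - \bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot j\cdot} + \bar{Y}_{\cdot\cdot\cdot}\right)^2, \\
SS_{E}  &= \sum_{i=1}^{I}\sum_{j=1}^{J}\sum_{k=1}^{K} \left(Y_{ijk} - \bar{Y}_{ij\cdot}\right)^2, \qquad
MSE = \frac{SS_{E}}{IJ(K-1)}, \\
F_{(A)} &= \frac{SS_{12}/\bigl((I-1)(J-1)\bigr)}{MSE}, \qquad
F_{(B)}  = \frac{SS_{1}/(I-1)}{MSE}, \qquad
F_{(C)}  = \frac{SS_{2}/(J-1)}{MSE}.
\end{align*}
```

The error degrees of freedom IJ(K-1) are zero for K = 1, so in this form the interaction test (A) requires at least two observations per cell.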

Example
To test if the means of the harvest in kilograms of tomatoes in three different greenhouses and using five different fertilizers differ. The dataset contains observations from five fields in each greenhouse, each treated with a different fertilizer, so there is one observation for each combination of greenhouse and fertilizer (dataset in Table A.12).


SAS code
proc anova data= crop;
      class house fertilizer;
      model kg =  house fertilizer;
run;
quit;
SAS output
                    The ANOVA Procedure
Dependent Variable: kg
Source       DF    Anova SS  Mean Square  F Value  Pr> F
house         2  0.16329333   0.08164667     0.50  0.6268
fertilizer    4  1.66337333   0.41584333     2.52  0.1235
Remarks:
  • The SAS procedure proc anova is the standard procedure for an ANOVA with a balanced design. For an unbalanced design the procedure proc glm should be used.
  • By using the class statement, SAS treats the variables house and fertilizer as categorical variables.
  • The code model dependent variable= independent variables defines the model. To incorporate an interaction term a star is used, for example, variable1*variable2.
  • The quit; statement is used to terminate the procedure; proc anova is an interactive procedure and SAS then knows not to expect any further input.
  • The program code for proc glm is similar:
    proc glm data = crop;
     class house fertilizer;
     model kg = house fertilizer;
    run;
    quit;


R code
kg<-crop$kg
house<-crop$house
fertilizer<-crop$fertilizer
summary(aov(kg~factor(house)+factor(fertilizer)))
R output
                   Df Sum Sq Mean Sq F value Pr(>F)
factor(house)       2 0.1633  0.0816   0.496  0.627
factor(fertilizer)  4 1.6634  0.4158   2.524  0.123
Residuals           8 1.3181  0.1648
Remarks:
  • The function aov() performs an ANOVA in R. The response variable is placed on the left-hand side of the ~ symbol and the independent variables which define the groups on the right-hand side, separated by a plus (+). To incorporate an interaction term a star is used, for example, variable1*variable2.
  • We use the R function factor() to tell R that house and fertilizer are categorical variables.
  • The summary function gets R to return the sum of squares, degrees of freedom, p-values, etc.
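
The star notation can be tried on a small design with replication. The following is a sketch on made-up data: the data frame demo and its columns a, b and y are hypothetical and unrelated to the crop dataset.

```r
# Hypothetical balanced design: 2 x 3 factor levels, 2 replicates per cell
demo <- expand.grid(rep = 1:2,
                    a = c("a1", "a2"),
                    b = c("b1", "b2", "b3"))
demo$y <- c(4.1, 4.3, 5.0, 5.2, 4.6, 4.4,
            6.1, 5.9, 4.9, 5.1, 5.8, 6.0)

# a * b expands to a + b + a:b (main effects plus interaction)
fit_full <- aov(y ~ factor(a) * factor(b), data = demo)
# Main effects only, as in the example above
fit_main <- aov(y ~ factor(a) + factor(b), data = demo)

tab_full <- summary(fit_full)[[1]]
tab_main <- summary(fit_main)[[1]]
tab_full
tab_main
```

In the full model the interaction takes (2-1)(3-1) = 2 degrees of freedom away from the residuals. With only one observation per cell, as the 8 residual degrees of freedom in the example output indicate, an interaction term would leave no residual degrees of freedom and no F tests could be computed, which is presumably why the example fits main effects only.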

17.2 Tests for homogeneity of variances

17.2.1 Bartlett test

Description: Tests if the variances of c17-math-0076 Gaussian populations differ from each other.
Assumptions:
  • Data are measured on an interval or ratio scale.
  • Data are randomly sampled from c17-math-0077 independent Gaussian distributions.
  • The c17-math-0078 random variables c17-math-0079 from which the samples are drawn have variances c17-math-0080.
  • Further c17-math-0081 is the c17-math-0082 sample with c17-math-0083 observations, c17-math-0084.
Hypotheses: c17-math-0085 vs c17-math-0086 for at least one c17-math-0087
Test statistic:
c17-math-0088
with c17-math-0089, c17-math-0090
and c17-math-0091
Test decision: Reject c17-math-0092 if for the observed value c17-math-0093 of c17-math-0094
c17-math-0095
p-values: c17-math-0096
Annotations:
  • The test statistic c17-math-0097 is χ²-distributed (Glaser 1976).
  • c17-math-0099 is the c17-math-0100-quantile of the χ²-distribution with c17-math-0102 degrees of freedom.
  • This test was introduced by Maurice Bartlett (1937).
  • The Bartlett test is very sensitive to the violation of the Gaussian assumption. If the samples are not Gaussian distributed an alternative is Levene's test (Test 17.2.2).
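
The Bartlett statistic can be sketched by hand and compared with bartlett.test(). The data and all variable names below (y, g, s2_i, s2_pooled, b_stat) are assumptions of this illustration, not the crop data or the notation used above.

```r
# Hypothetical data: three groups of unequal size (values are made up)
y <- c(5.1, 4.8, 5.6, 5.0,
       6.2, 5.1, 6.9, 6.0, 7.3,
       5.5, 5.2, 5.8)
g <- factor(rep(c("A", "B", "C"), times = c(4, 5, 3)))

n_i  <- tapply(y, g, length)  # group sizes
s2_i <- tapply(y, g, var)     # group sample variances
k <- nlevels(g)
n <- length(y)

# Pooled variance, weighted by the degrees of freedom of each group
s2_pooled <- sum((n_i - 1) * s2_i) / (n - k)

# Log-ratio term and small-sample correction factor
num  <- (n - k) * log(s2_pooled) - sum((n_i - 1) * log(s2_i))
corr <- 1 + (sum(1 / (n_i - 1)) - 1 / (n - k)) / (3 * (k - 1))

b_stat  <- num / corr
p_value <- pchisq(b_stat, df = k - 1, lower.tail = FALSE)

# Cross-check against the built-in test
bt <- bartlett.test(y ~ g)
b_stat
unname(bt$statistic)
```

The statistic compares the logarithm of the pooled variance with the weighted logarithms of the group variances, so it is zero when all sample variances coincide and grows as they diverge.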

Example
To test if the variances of the harvest in kilograms of tomatoes in three different greenhouses are the same (dataset in Table A.12).


SAS code
proc glm data = crop;
 class house;
 model kg = house;
 means house /hovtest=BARTLETT ;
run;
quit;
SAS output
                The GLM Procedure
 Bartlett's Test for Homogeneity of kg Variance
  Source        DF    Chi-Square    Pr > ChiSq
  house          2        2.1346        0.3439
Remarks:
  • The SAS procedure proc glm provides the Bartlett test.
  • The first lines of code set up an ANOVA (see Test 17.1.1).
  • The code means house /hovtest=BARTLETT lets SAS conduct the Bartlett test.


R code
bartlett.test(crop$kg~crop$house)
R output
 Bartlett test of homogeneity of variances
data:  crop$kg by crop$house
Bartlett's K-squared = 2.1346, df = 2, p-value = 0.3439
Remarks:
  • The function bartlett.test() conducts the Bartlett test.
  • The analysis variable is coded on the left-hand side of the ~ symbol and the group variable on the right-hand side.

17.2.2 Levene test

Description: Tests if the variances of k populations differ from each other.
Assumptions:
  • Data are measured on an interval or ratio scale.
  • Data are randomly sampled from c17-math-0104 independent random variables c17-math-0105 with variances c17-math-0106.
  • Further c17-math-0107 is the c17-math-0108 sample with c17-math-0109 observations, c17-math-0110.
Hypotheses: c17-math-0111 vs c17-math-0112 for at least one c17-math-0113.
Test statistic:
c17-math-0114
c17-math-0115
Test decision: Reject c17-math-0116 if for the observed value c17-math-0117 of c17-math-0118
c17-math-0119
p-values: c17-math-0120
Annotations:
  • The test statistic c17-math-0121 is F-distributed.
  • c17-math-0123 is the c17-math-0124-quantile of the F-distribution with c17-math-0125 and c17-math-0126 degrees of freedom.
  • This test was introduced by Howard Levene (1960). In 1974 Morton Brown and Alan Forsythe proposed using the median or a trimmed mean instead of the mean for calculating the c17-math-0127 (Brown and Forsythe 1974). This variant is called the Brown–Forsythe test.
  • This test does not need the assumption of underlying Gaussian distributions and should be used if that assumption is doubtful. If the data are Gaussian distributed Bartlett's test can be used (see Test 17.2.1).

Example
To test if the variances of the harvest in kilograms of tomatoes in three different greenhouses are the same (dataset in Table A.12).


SAS code
proc glm data = crop;
 class house;
 model kg = house;
 means house /hovtest=levene (TYPE=ABS) ;
run;
quit;
SAS output
      Levene's Test for Homogeneity of kg Variance
      ANOVA of Absolute Deviations from Group Means
                  Sum of        Mean
Source     DF     Squares      Square    F Value    Pr> F
house       2      0.2675      0.1337       2.79    0.1012
Error      12      0.5753      0.0479
Remarks:
  • The SAS procedure proc glm provides the Levene test.
  • The first lines of code set up an ANOVA (see Test 17.1.1).
  • The code means house /hovtest=levene (TYPE=ABS) lets SAS do the Levene test. In SAS it is also possible to choose the option (TYPE=SQUARE) which uses the squared differences.
  • The Brown–Forsythe test can be conducted with the option /hovtest=BF.


R code
# Calculate group means for each greenhouse
m<-tapply(crop$kg,crop$house,mean)
# Calculate the Z's
z<-abs(crop$kg-m[crop$house])
# Overall mean of the Z's
z_mean<-mean(z)
# Group mean of the Z's
z_gm<-tapply(z,crop$house,mean)
# Make a matrix of the Z's (group in the rows)
z_matrix<-rbind(z[crop$house==1],z[crop$house==2],
                z[crop$house==3])
# Calculate the numerator
nu<-0
for (i in 1:3)
 {
  u<-5*(z_gm[i]-z_mean)^2
  nu<-nu+u
 }
# Calculate the denominator
de<-0
for (j in 1:3)
{
 for (i in 1:5)
  {
   e<-(z_matrix[j,i]-z_gm[j])^2
   de<-de+e
  }
}
# Calculate test statistic and p-value
l<-(12*nu)/(2*de)
p_value<-1-pf(l,2,12)
# Output results
"Levene Test"
l
p_value
R output
[1] "Levene Test"
> l
       1
2.789499
> p_value
        1
0.1011865
>
Remarks:
  • There is no basic R function to calculate Levene's test directly.
  • In this example we have c17-math-0128 and c17-math-0129. The respective parts must be adapted if other data are used.
  • To apply the Brown–Forsythe test just change the first line of code to m<-tapply(crop$kg,crop$house,median).
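
The hard-coded group count and group size in the step-by-step code can be avoided, because the Levene statistic is the one-way ANOVA F statistic computed on the absolute deviations. The function levene_sketch() below is a hypothetical helper written for this illustration, not a function from R or from any package, and the data are made up.

```r
# Hypothetical helper: Levene-type test for any number of groups and group sizes
levene_sketch <- function(y, group, center = mean) {
  group <- factor(group)
  # Absolute deviations from the group center
  # (center = mean: Levene test; center = median: Brown-Forsythe test)
  z <- abs(y - ave(y, group, FUN = center))
  # One-way ANOVA F test on the absolute deviations
  tab <- anova(lm(z ~ group))
  c(statistic = tab[["F value"]][1], p.value = tab[["Pr(>F)"]][1])
}

# Made-up unbalanced data for demonstration
y_demo <- c(5.1, 4.8, 5.6, 5.0,
            6.2, 5.1, 6.9, 6.0, 7.3,
            5.5, 5.2, 5.8)
g_demo <- rep(c("A", "B", "C"), times = c(4, 5, 3))

res_mean   <- levene_sketch(y_demo, g_demo)                   # Levene
res_median <- levene_sketch(y_demo, g_demo, center = median)  # Brown-Forsythe
res_mean
res_median
```

Applied to the crop data, levene_sketch(crop$kg, crop$house) should reproduce the values of the loop-based code. The add-on package car also provides a function leveneTest(), whose default center is the median.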

References

Bartlett M.S. 1937 Properties of sufficiency and statistical tests. Proceedings of the Royal Society of London, Series A 160, 268–282.

Brown M.B. and Forsythe A.B. 1974 Robust tests for the equality of variances. Journal of the American Statistical Association 69, 364–367.

Glaser R.E. 1976 Exact critical values for Bartlett's test for homogeneity of variances. Journal of the American Statistical Association 71, 488–490.

Levene H. 1960 Robust tests for equality of variances. In Contributions to Probability and Statistics: Essays in Honor of Harold Hotelling (eds Olkin I et al.), pp. 278–292. Stanford University Press.

Montgomery D.C. and Runger G.C. 2007 Applied Statistics and Probability for Engineers, 4th edn. John Wiley & Sons, Ltd.

Rencher A.C. 1998 Multivariate Statistical Inference and Applications. John Wiley & Sons, Ltd.
