Chapter 17
Tests in variance analysis
Analysis of variance (ANOVA) in its simplest form analyzes whether the mean of a Gaussian random variable differs between a number of groups. Often the factor which determines each group is given by applying different treatments to subjects, for example, in designed experiments in technical applications or in clinical studies. The problem can thereby be seen as comparing group means, which extends the t-test to more than two groups. The underlying statistical model may also be presented as a special case of a linear model. In Section 17.1 we handle the one- and two-way cases of ANOVA. The two-way case extends the problem to groups characterized by two factors. In this case it is also of interest whether the two factors influence each other in their effect on the measured variable, and hence show an interaction effect. One of the crucial assumptions of an ANOVA is the homogeneity of variance within all groups. Section 17.2 deals with tests to check this assumption.
17.1 Analysis of variance
17.1.1 One-way ANOVA
Description:
Tests if the mean of a Gaussian random variable is the same in $k$ groups.
Assumptions:
- Let $Y_{i1},\dots,Y_{in_i}$, $i=1,\dots,k$, be independent samples of independent Gaussian random variables with the same variance $\sigma^2$ but possibly different group means $\mu_1,\dots,\mu_k$.
- The sample sizes of the $k$ samples are $n_1,\dots,n_k$ with $N=n_1+\dots+n_k$.
- The random variables can be modeled as $Y_{ij}=\mu_i+\epsilon_{ij}$ with $\epsilon_{ij}\sim N(0,\sigma^2)$, $i=1,\dots,k$, $j=1,\dots,n_i$.
Hypotheses:
$H_0:\ \mu_1=\dots=\mu_k$ vs $H_1:\ \mu_{i_1}\neq\mu_{i_2}$ for at least one pair $i_1\neq i_2$.
Test statistic:
$$F=\frac{\sum_{i=1}^{k} n_i(\bar{Y}_{i\cdot}-\bar{Y}_{\cdot\cdot})^2/(k-1)}{\sum_{i=1}^{k}\sum_{j=1}^{n_i}(Y_{ij}-\bar{Y}_{i\cdot})^2/(N-k)}$$
with $\bar{Y}_{i\cdot}=\frac{1}{n_i}\sum_{j=1}^{n_i}Y_{ij}$ and $\bar{Y}_{\cdot\cdot}=\frac{1}{N}\sum_{i=1}^{k}\sum_{j=1}^{n_i}Y_{ij}$.
Test decision:
Reject $H_0$ if for the observed value $f$ of $F$:
$f>f_{1-\alpha;\,k-1,\,N-k}$
p-values:
$p=P(F\geq f)=1-F_{k-1,N-k}(f)$
Annotations:
- The test statistic $F$ is $F_{k-1,N-k}$-distributed (Rencher 1998, chapter 4).
- $f_{1-\alpha;\,k-1,\,N-k}$ is the $1-\alpha$-quantile of the F-distribution with $k-1$ and $N-k$ degrees of freedom.
- The numerator of the test statistic is also called MST (mean sum of squares for treatment) and the denominator MSE (mean sum of squares of errors).
- Note that we have presented the one-way model and test for the more general case of an unbalanced design, where the sample sizes in the different groups may vary. A balanced design is characterized by an equal number of observations in each group.
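As a numerical cross-check outside the book's SAS and R examples, the one-way F statistic can be computed directly from the MST/MSE formula and compared against scipy.stats.f_oneway. This is a minimal sketch; the three groups below are made-up illustration values, not the crop data of Table A.12.

```python
# One-way ANOVA F statistic from the MST/MSE formula, cross-checked
# against scipy.stats.f_oneway. The data are made-up illustration values.
import numpy as np
from scipy import stats

groups = [np.array([4.1, 3.8, 4.5, 4.0, 4.2]),   # group 1
          np.array([3.9, 4.4, 4.1, 4.3, 4.0]),   # group 2
          np.array([4.6, 4.2, 4.4, 4.1, 4.5])]   # group 3

k = len(groups)
N = sum(len(g) for g in groups)
grand_mean = np.concatenate(groups).mean()

# MST: between-group mean square with k-1 degrees of freedom
mst = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups) / (k - 1)
# MSE: within-group mean square with N-k degrees of freedom
mse = sum(((g - g.mean()) ** 2).sum() for g in groups) / (N - k)

f = mst / mse
p = stats.f.sf(f, k - 1, N - k)   # p-value as P(F >= f)

f_ref, p_ref = stats.f_oneway(*groups)
```

Here stats.f.sf is the survival function $1-F_{k-1,N-k}(f)$, so both routes yield the same F value and p-value.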
Example
To test if the means of the harvest in kilograms of tomatoes in three different greenhouses differ. The dataset contains observations from five fields in each greenhouse (dataset in
Table A.12).
SAS code
proc anova data = crop;
class house;
model kg = house;
run;
quit;
SAS output
Source DF Anova SS Mean Square F Value Pr > F
house 2 0.16329333 0.08164667 0.33 0.7262
Remarks:
R code
summary(aov(crop$kg~factor(crop$house)))
R output
Df Sum Sq Mean Sq F value Pr(>F)
factor(crop$house) 2 0.1633 0.08165 0.329 0.726
Residuals 12 2.9815 0.24846
Remarks:
- The function aov() performs an analysis of variance in R. The response variable is placed on the left-hand side of the ~ symbol and the independent variables which define the groups on the right-hand side.
- We use the R function factor() to tell R that house is a categorical variable.
- The summary function gets R to return the sum of squares, degrees of freedom, p-values, etc.
17.1.2 Two-way ANOVA
Description:
Tests if the mean of a Gaussian random variable is the same in groups defined by two factors of interest.
Assumptions:
- Let $Y_{ijl}$, $i=1,\dots,a$, $j=1,\dots,b$, $l=1,\dots,n$, describe a sample of size $N=abn$ of independent Gaussian random variables.
- In each of the $a\times b$ groups defined by the two factors, we have an equal number $n$ of observations (balanced design).
- Each of the variables can be modeled as $Y_{ijl}=\mu+\alpha_i+\beta_j+(\alpha\beta)_{ij}+\epsilon_{ijl}$ with $\epsilon_{ijl}\sim N(0,\sigma^2)$, where $\mu$ is the overall mean, $\alpha_i$ and $\beta_j$ are the deviations from it for the first and the second factor, and $(\alpha\beta)_{ij}$ describes the interaction between them.
Hypotheses:
(A) $H_0:\ (\alpha\beta)_{ij}=0$ for all $i,j$ vs $H_1:\ (\alpha\beta)_{ij}\neq 0$ for at least one pair $(i,j)$
(B) $H_0:\ \alpha_1=\dots=\alpha_a=0$ vs $H_1:\ \alpha_i\neq 0$ for at least one $i$
(C) $H_0:\ \beta_1=\dots=\beta_b=0$ vs $H_1:\ \beta_j\neq 0$ for at least one $j$
Test statistic:
(A) $F_{AB}=\dfrac{SS_{AB}/[(a-1)(b-1)]}{SSE/[ab(n-1)]}$
(B) $F_{A}=\dfrac{SS_{A}/(a-1)}{SSE/[ab(n-1)]}$
(C) $F_{B}=\dfrac{SS_{B}/(b-1)}{SSE/[ab(n-1)]}$
with
$SS_{A}=bn\sum_{i=1}^{a}(\bar{Y}_{i\cdot\cdot}-\bar{Y}_{\cdot\cdot\cdot})^2$,
$SS_{B}=an\sum_{j=1}^{b}(\bar{Y}_{\cdot j\cdot}-\bar{Y}_{\cdot\cdot\cdot})^2$,
$SS_{AB}=n\sum_{i=1}^{a}\sum_{j=1}^{b}(\bar{Y}_{ij\cdot}-\bar{Y}_{i\cdot\cdot}-\bar{Y}_{\cdot j\cdot}+\bar{Y}_{\cdot\cdot\cdot})^2$,
$SSE=\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{l=1}^{n}(Y_{ijl}-\bar{Y}_{ij\cdot})^2$.
Test decision:
Reject $H_0$ if for the observed value $f$ of $F_{AB}$, $F_{A}$ or $F_{B}$:
(A) $f>f_{1-\alpha;\,(a-1)(b-1),\,ab(n-1)}$
(B) $f>f_{1-\alpha;\,a-1,\,ab(n-1)}$
(C) $f>f_{1-\alpha;\,b-1,\,ab(n-1)}$
p-values:
(A) $p=1-F_{(a-1)(b-1),\,ab(n-1)}(f)$
(B) $p=1-F_{a-1,\,ab(n-1)}(f)$
(C) $p=1-F_{b-1,\,ab(n-1)}(f)$
Annotations:
- The test statistic is F-distributed with $(a-1)(b-1)$ (A), $a-1$ (B) or $b-1$ (C) degrees of freedom for the numerator and $ab(n-1)$ degrees of freedom for the denominator (Montgomery and Runger 2007, chapter 14).
- $f_{1-\alpha;\,d_1,d_2}$ is the $1-\alpha$-quantile of the F-distribution with $d_1$ and $d_2$ degrees of freedom.
- Hypothesis (A) tests if there is an interaction between the two factors. Hypotheses (B) and (C) test the main effects of the two factors.
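The sum-of-squares decomposition for a balanced two-way layout can be sketched numerically; the following Python cross-check (outside the book's SAS and R examples) computes the four sums of squares and the three F statistics for a small made-up $a\times b\times n$ array, not the data of Table A.12.

```python
# Balanced two-way layout: sums of squares SS_A, SS_B, SS_AB, SSE and the
# three F statistics. The (a, b, n) array holds made-up illustration values.
import numpy as np

y = np.array([[[5.0, 5.4], [6.1, 5.8], [5.6, 5.9]],
              [[6.3, 6.0], [6.8, 7.1], [6.2, 6.5]]])  # shape (a, b, n)
a, b, n = y.shape

m = y.mean()               # overall mean
m_i = y.mean(axis=(1, 2))  # means of the factor-A levels
m_j = y.mean(axis=(0, 2))  # means of the factor-B levels
m_ij = y.mean(axis=2)      # cell means

ss_a = b * n * ((m_i - m) ** 2).sum()
ss_b = a * n * ((m_j - m) ** 2).sum()
ss_ab = n * ((m_ij - m_i[:, None] - m_j[None, :] + m) ** 2).sum()
sse = ((y - m_ij[:, :, None]) ** 2).sum()

mse = sse / (a * b * (n - 1))
f_a = (ss_a / (a - 1)) / mse                 # hypothesis (B)
f_b = (ss_b / (b - 1)) / mse                 # hypothesis (C)
f_ab = (ss_ab / ((a - 1) * (b - 1))) / mse   # hypothesis (A)

# In a balanced design the four parts add up to the total sum of squares
ss_total = ((y - m) ** 2).sum()
```

The final identity $SS_A+SS_B+SS_{AB}+SSE=SS_{total}$ holds exactly in the balanced case and is a convenient sanity check on the computation.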
Example
To test if the means of the harvest in kilograms of tomatoes in three different greenhouses and using five different fertilizers differ. The dataset contains observations from five fields in each greenhouse, where each field is treated with a different fertilizer (dataset in Table A.12).
SAS code
proc anova data= crop;
class house fertilizer;
model kg = house fertilizer;
run;
quit;
SAS output
The ANOVA Procedure
Dependent Variable: kg
Source DF Anova SS Mean Square F Value Pr > F
house 2 0.16329333 0.08164667 0.50 0.6268
fertilizer 4 1.66337333 0.41584333 2.52 0.1235
Remarks:
R code
kg<-crop$kg
house<-crop$house
fertilizer<-crop$fertilizer
summary(aov(kg~factor(house)+factor(fertilizer)))
R output
Df Sum Sq Mean Sq F value Pr(>F)
factor(house) 2 0.1633 0.0816 0.496 0.627
factor(fertilizer) 4 1.6634 0.4158 2.524 0.123
Residuals 8 1.3181 0.1648
Remarks:
- The function aov() performs an ANOVA in R. The response variable is placed on the left-hand side of the ~ symbol and the independent variables which define the groups on the right-hand side, separated by a plus (+). To incorporate an interaction term a star (*) is used, for example, variable1*variable2.
- We use the R function factor() to tell R that house and fertilizer are categorical variables.
- The summary function gets R to return the sum of squares, degrees of freedom, p-values, etc.
17.2 Tests for homogeneity of variances
17.2.1 Bartlett test
Description:
Tests if the variances of $k$ Gaussian populations differ from each other.
Assumptions:
- Data are measured on an interval or ratio scale.
- Data are randomly sampled from $k$ independent Gaussian distributions.
- The random variables from where the samples are drawn have variances $\sigma_1^2,\dots,\sigma_k^2$.
- Further $Y_{i1},\dots,Y_{in_i}$ is the $i$th sample with $n_i$ observations, $i=1,\dots,k$, and $N=n_1+\dots+n_k$.
Hypotheses:
$H_0:\ \sigma_1^2=\dots=\sigma_k^2$ vs $H_1:\ \sigma_{i_1}^2\neq\sigma_{i_2}^2$ for at least one pair $i_1\neq i_2$
Test statistic:
$$X^2=\frac{(N-k)\ln(s_p^2)-\sum_{i=1}^{k}(n_i-1)\ln(s_i^2)}{C}$$
with $s_i^2=\frac{1}{n_i-1}\sum_{j=1}^{n_i}(Y_{ij}-\bar{Y}_{i\cdot})^2$, $s_p^2=\frac{1}{N-k}\sum_{i=1}^{k}(n_i-1)s_i^2$
and $C=1+\frac{1}{3(k-1)}\left(\sum_{i=1}^{k}\frac{1}{n_i-1}-\frac{1}{N-k}\right)$.
Test decision:
Reject $H_0$ if for the observed value $x^2$ of $X^2$:
$x^2>\chi^2_{1-\alpha;\,k-1}$
p-values:
$p=P(X^2\geq x^2)=1-\chi^2_{k-1}(x^2)$
Annotations:
- The test statistic $X^2$ is approximately $\chi^2_{k-1}$-distributed; exact critical values are tabulated in Glaser (1976).
- $\chi^2_{1-\alpha;\,k-1}$ is the $1-\alpha$-quantile of the $\chi^2$-distribution with $k-1$ degrees of freedom.
- This test was introduced by Maurice Bartlett (1937).
- The Bartlett test is very sensitive to violations of the Gaussian assumption. If the samples are not Gaussian distributed, an alternative is Levene's test (Test 17.2.2).
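The Bartlett statistic can be computed step by step from its formula and cross-checked against scipy.stats.bartlett, which implements the same statistic. A sketch in Python (outside the book's SAS and R examples), with made-up samples rather than the crop data of Table A.12:

```python
# Bartlett's X^2 from the pooled-variance formula, cross-checked against
# scipy.stats.bartlett. The samples are made-up illustration values.
import numpy as np
from scipy import stats

samples = [np.array([2.1, 2.5, 1.9, 2.3, 2.2]),
           np.array([3.0, 2.4, 2.8, 3.1, 2.6]),
           np.array([2.2, 2.9, 2.0, 2.7, 2.4])]

k = len(samples)
n_i = np.array([len(s) for s in samples])
N = n_i.sum()

s2_i = np.array([s.var(ddof=1) for s in samples])  # group variances s_i^2
s2_p = ((n_i - 1) * s2_i).sum() / (N - k)          # pooled variance s_p^2

C = 1 + (1 / (3 * (k - 1))) * ((1 / (n_i - 1)).sum() - 1 / (N - k))
x2 = ((N - k) * np.log(s2_p) - ((n_i - 1) * np.log(s2_i)).sum()) / C
p = stats.chi2.sf(x2, k - 1)                       # P(X^2 >= x2)

x2_ref, p_ref = stats.bartlett(*samples)
```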
Example
To test if the variances of the harvest in kilograms of tomatoes in three different greenhouses are the same (dataset in
Table A.12).
SAS code
proc glm data = crop;
class house;
model kg = house;
means house /hovtest=BARTLETT;
run;
quit;
SAS output
The GLM Procedure
Bartlett's Test for Homogeneity of kg Variance
Source DF Chi-Square Pr > ChiSq
house 2 2.1346 0.3439
Remarks:
- The SAS procedure proc glm provides the Bartlett test.
- The first lines of code are enabling an ANOVA (see Test 17.1.1).
- The code means house /hovtest=BARTLETT lets SAS conduct the Bartlett test.
R code
bartlett.test(crop$kg~crop$house)
R output
Bartlett test of homogeneity of variances
data: crop$kg by crop$house
Bartlett's K-squared = 2.1346, df = 2, p-value = 0.3439
Remarks:
- The function bartlett.test() conducts the Bartlett test.
- The analysis variable is coded on the left-hand side of the ~ and the group variable on the right-hand side.
17.2.2 Levene test
Description:
Tests if the variances of $k$ populations differ from each other.
Assumptions:
- Data are measured on an interval or ratio scale.
- Data are randomly sampled from $k$ independent random variables with variances $\sigma_1^2,\dots,\sigma_k^2$.
- Further $Y_{i1},\dots,Y_{in_i}$ is the $i$th sample with $n_i$ observations, $i=1,\dots,k$, and $N=n_1+\dots+n_k$.
Hypotheses:
$H_0:\ \sigma_1^2=\dots=\sigma_k^2$ vs $H_1:\ \sigma_{i_1}^2\neq\sigma_{i_2}^2$ for at least one pair $i_1\neq i_2$.
Test statistic:
$$W=\frac{N-k}{k-1}\cdot\frac{\sum_{i=1}^{k}n_i(\bar{Z}_{i\cdot}-\bar{Z}_{\cdot\cdot})^2}{\sum_{i=1}^{k}\sum_{j=1}^{n_i}(Z_{ij}-\bar{Z}_{i\cdot})^2}$$
with $Z_{ij}=|Y_{ij}-\bar{Y}_{i\cdot}|$, $\bar{Z}_{i\cdot}=\frac{1}{n_i}\sum_{j=1}^{n_i}Z_{ij}$ and $\bar{Z}_{\cdot\cdot}=\frac{1}{N}\sum_{i=1}^{k}\sum_{j=1}^{n_i}Z_{ij}$.
Test decision:
Reject $H_0$ if for the observed value $w$ of $W$:
$w>f_{1-\alpha;\,k-1,\,N-k}$
p-values:
$p=P(W\geq w)=1-F_{k-1,N-k}(w)$
Annotations:
- The test statistic $W$ is approximately $F_{k-1,N-k}$-distributed.
- $f_{1-\alpha;\,k-1,\,N-k}$ is the $1-\alpha$-quantile of the F-distribution with $k-1$ and $N-k$ degrees of freedom.
- This test was introduced by Howard Levene (1960). In 1974 Morton Brown and Alan Forsythe proposed the use of the median or trimmed mean instead of the mean for calculating the $Z_{ij}$ (Brown and Forsythe 1974). This test is called the Brown–Forsythe test.
- This test does not need the assumption of underlying Gaussian distributions and should be used if that assumption is doubtful. If the data are Gaussian distributed, Bartlett's test can be used (see Test 17.2.1).
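Levene's statistic can likewise be computed from the absolute deviations from the group means and cross-checked against scipy.stats.levene with center='mean'; passing center='median' instead gives the Brown–Forsythe variant. A Python sketch (outside the book's SAS and R examples) with made-up samples, not the crop data of Table A.12:

```python
# Levene's W from Z_ij = |Y_ij - group mean|, cross-checked against
# scipy.stats.levene(center='mean'). Made-up illustration values.
import numpy as np
from scipy import stats

samples = [np.array([2.1, 2.5, 1.9, 2.3, 2.2]),
           np.array([3.0, 2.4, 2.8, 3.1, 2.6]),
           np.array([2.2, 2.9, 2.0, 2.7, 2.4])]

k = len(samples)
N = sum(len(s) for s in samples)

z = [np.abs(s - s.mean()) for s in samples]   # Z_ij
z_bar_i = [zi.mean() for zi in z]             # group means of the Z's
z_bar = np.concatenate(z).mean()              # overall mean of the Z's

num = sum(len(zi) * (zbi - z_bar) ** 2 for zi, zbi in zip(z, z_bar_i))
den = sum(((zi - zi.mean()) ** 2).sum() for zi in z)

w = ((N - k) / (k - 1)) * num / den
p = stats.f.sf(w, k - 1, N - k)               # P(W >= w)

w_ref, p_ref = stats.levene(*samples, center='mean')
```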
Example
To test if the variances of the harvest in kilograms of tomatoes in three different greenhouses are the same (dataset in
Table A.12).
SAS code
proc glm data = crop;
class house;
model kg = house;
means house /hovtest=LEVENE(TYPE=ABS);
run;
quit;
SAS output
Levene's Test for Homogeneity of kg Variance
ANOVA of Absolute Deviations from Group Means
Sum of Mean
Source DF Squares Square F Value Pr > F
house 2 0.2675 0.1337 2.79 0.1012
Error 12 0.5753 0.0479
Remarks:
- The SAS procedure proc glm provides the Levene test.
- The first lines of code are enabling an ANOVA (see Test 17.1.1).
- The code means house /hovtest=levene (TYPE=ABS) lets SAS do the Levene test. In SAS it is also possible to choose the option (TYPE=SQUARE) which uses the squared differences.
- The Brown–Forsythe test can be conducted with the option /hovtest=BF.
R code
# Calculate group means for each greenhouse
m<-tapply(crop$kg,crop$house,mean)
# Calculate the Z's
z<-abs(crop$kg-m[crop$house])
# Overall mean of the Z's
z_mean<-mean(z)
# Group means of the Z's
z_gm<-tapply(z,crop$house,mean)
# Make a matrix of the Z's (groups in the rows)
z_matrix<-rbind(z[crop$house==1],z[crop$house==2],
z[crop$house==3])
# Calculate the numerator
nu<-0
for (i in 1:3)
{
u<-5*(z_gm[i]-z_mean)^2
nu<-nu+u
}
# Calculate the denominator
de<-0
for (j in 1:3)
{
for (i in 1:5)
{
e<-(z_matrix[j,i]-z_gm[j])^2
de<-de+e
}
}
# Calculate test statistic and p-value
l<-(12*nu)/(2*de)
p_value<-1-pf(l,2,12)
# Output results
"Levene Test"
l
p_value
R output
[1] "Levene Test"
> l
1
2.789499
> p_value
1
0.1011865
>
Remarks:
- There is no basic R function to calculate Levene's test directly.
- In this example we have $k=3$ groups and $n_i=5$ observations per group. The respective parts must be adapted if other data are used.
- To apply the Brown–Forsythe test just change the first line of code to m<-tapply(crop$kg,crop$house,median).
References
Bartlett M.S. 1937 Properties of sufficiency and statistical tests. Proceedings of the Royal Statistical Society Series A 160, 268–282.
Brown M.B. and Forsythe A.B. 1974 Robust tests for the equality of variances. Journal of the American Statistical Association 69, 364–367.
Glaser R.E. 1976 Exact critical values for Bartlett's test for homogeneity of variances. Journal of the American Statistical Association 71, 488–490.
Levene H. 1960 Robust tests for equality of variances. In Contributions to Probability and Statistics: Essays in Honor of Harold Hotelling (eds Olkin I. et al.), pp. 278–292. Stanford University Press.
Montgomery D.C. and Runger G.C. 2007 Applied Statistics and Probability for Engineers, 4th edn. John Wiley & Sons, Ltd.
Rencher A.C. 1998 Multivariate Statistical Inference and Applications. John Wiley & Sons, Ltd.