Another really common situation requires testing whether three or more means are significantly discrepant from each other. We would find ourselves in this situation if we had three experimental conditions in the blood pressure trial: one group gets a placebo, one group gets a low dose of the real medication, and one group gets a high dose of the real medication.
Hmm, for cases like these, why don't we just do a series of t-tests? For example, we could test the three directional alternative hypotheses: that the low dose lowers blood pressure relative to the placebo, that the high dose lowers blood pressure relative to the placebo, and that the high dose lowers blood pressure relative to the low dose.
Well, it turns out that doing this is pretty dangerous business, and the logic goes like this: if our alpha level is 0.05, then the chance of making a Type I error on one test is 0.05; if we perform two tests, the chance of making at least one Type I error jumps to .0975 (near 10%), because the probability of getting through both tests without a false positive is only 0.95 × 0.95. By the time we perform 10 tests at that alpha level, the chance of having made at least one Type I error is about 40%. This is called the multiple testing problem or multiple comparisons problem.
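If you want to check this arithmetic yourself, the chance of making at least one Type I error across m tests is 1 - (1 - alpha)^m (this assumes the tests are independent), which is a one-liner in R:

> # family-wise Type I error rate for 1, 2, and 10 independent tests
> alpha <- 0.05
> 1 - (1 - alpha)^c(1, 2, 10)
[1] 0.0500000 0.0975000 0.4012631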
To circumvent this problem, in the case of testing three or more means, we use a technique called Analysis of Variance, or ANOVA. A significant result from an ANOVA leads to the inference that at least one of the means is significantly discrepant from one of the other means; it does not lend itself to the inference that all the means are significantly different. This is an example of an omnibus test, because it is a global test that doesn't tell you exactly where the differences are, just that there are differences.
You might be wondering why a test of equality of means has a name like Analysis of Variance; it's because it works by comparing the variance between the groups to the variance within the groups. The general intuition behind an ANOVA is that the higher the ratio of the variance between the different groups to the variance within the different groups, the less likely it is that the different groups were sampled from the same population. This ratio is called an F ratio.
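Concretely, for k groups and N total observations, the F ratio is the between-group mean square divided by the within-group mean square:

F = \frac{MS_{between}}{MS_{within}} = \frac{SS_{between}/(k-1)}{SS_{within}/(N-k)}

Hold on to those two degrees-of-freedom terms, k-1 and N-k; you'll see them again in the Df column of the ANOVA summary below.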
For our demonstration of the simplest species of ANOVA (the one-way ANOVA), we are going to be using the WeightLoss dataset from the car package. If you don't have the car package, install it.
> library(car)
> head(WeightLoss)
    group wl1 wl2 wl3 se1 se2 se3
1 Control   4   3   3  14  13  15
2 Control   4   4   3  13  14  17
3 Control   4   3   1  17  12  16
4 Control   3   2   1  11  11  12
5 Control   5   3   2  16  15  14
6 Control   6   5   4  17  18  18
>
> table(WeightLoss$group)

Control    Diet  DietEx
     12      12      10
The WeightLoss dataset contains pounds lost and self-esteem measurements over three weeks for three different groups: a control group, one group just on a diet, and one group that dieted and exercised. We will be testing the alternative hypothesis that the means of the weight loss at week 2 are not all equal; the null hypothesis is that all three group means are the same.
Before the test, let's check out a box plot of the week 2 weight loss for each group:

> library(ggplot2)   # qplot is from ggplot2, not base R
> qplot(group, wl2, data=WeightLoss, geom="boxplot", fill=group)
Now for the ANOVA…
> the.anova <- aov(wl2 ~ group, data=WeightLoss)
> summary(the.anova)
            Df Sum Sq Mean Sq F value   Pr(>F)
group        2  45.28  22.641   13.37 6.49e-05 ***
Residuals   31  52.48   1.693
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
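Tying this back to the F ratio formula: the Df column shows k-1 = 2 and N-k = 31 (there are 34 participants across the three groups), and the F value is just the group mean square divided by the residual mean square. You can verify it from the printed output (the tiny discrepancy from 13.37 is rounding in the printed mean squares):

> 22.641 / 1.693   # MS between / MS within
[1] 13.3733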
Oh, snap! The p-value (Pr(>F)) is 6.49e-05, which is .0000649 if you aren't used to reading scientific notation yet.
As I said before, this just means that at least one of the comparisons between means was significant; there are four ways that this could occur: the control mean differs from the diet mean, the control mean differs from the diet-and-exercise mean, the diet mean differs from the diet-and-exercise mean, or all three means differ from each other.
In order to investigate further, we perform a post-hoc test. Quite often, the post-hoc test that analysts perform is a suite of t-tests comparing each pair of means (pairwise t-tests).
But wait, didn't I say that was dangerous business? I did, but it's different now: we are only following up on an ANOVA that was already significant, and, crucially, we adjust the p-values of the pairwise tests to correct for the number of comparisons being made.
The easiest multiple comparison correcting procedure to understand is Bonferroni correction. In its simplest version, it simply changes the alpha value by dividing it by the number of tests being performed. It is considered the most conservative of all the multiple comparison correction methods. In fact, many consider it too conservative and I'm inclined to agree. Instead, I suggest using a correcting procedure called Holm-Bonferroni correction, which R's pairwise.t.test function uses by default.
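To get a feel for the difference between the two procedures, here's a quick sketch comparing them with R's built-in p.adjust function on a made-up vector of p-values (these numbers are purely illustrative, not from the WeightLoss data):

> p.values <- c(0.01, 0.02, 0.04)   # hypothetical unadjusted p-values
> p.adjust(p.values, method="bonferroni")
[1] 0.03 0.06 0.12
> p.adjust(p.values, method="holm")
[1] 0.03 0.04 0.04

Notice that Holm's adjusted p-values are never larger than Bonferroni's; that's why Holm-Bonferroni rejects at least as many null hypotheses while still keeping the family-wise error rate under control.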
> pairwise.t.test(WeightLoss$wl2, as.vector(WeightLoss$group))

	Pairwise comparisons using t tests with pooled SD

data:  WeightLoss$wl2 and as.vector(WeightLoss$group)

       Control Diet
Diet   0.28059 -
DietEx 7.1e-05 0.00091

P value adjustment method: holm
This output indicates that the difference in means between the Diet group and the DietEx (diet and exercise) group is significant at p < .001. Additionally, it indicates that the difference between the DietEx group and the Control group is significant at p < .0001 (look at the cell where it says 7.1e-05). The p-value for the comparison between just the Diet group and the Control group is .28, so we fail to reject the hypothesis that they have the same mean.
The standard one-way ANOVA makes three main assumptions: the observations are independent, the residuals are (approximately) normally distributed, and the groups have roughly equal variances (homogeneity of variance).
If the homogeneity of variance assumption is violated, you can use the oneway.test function instead, which by default does not assume equal variances (it applies a Welch-style correction).
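In case it helps, oneway.test takes the same formula interface as aov; a minimal sketch of the call (output not shown here) would look like this:

> # Welch-style one-way test; var.equal=FALSE is the default
> oneway.test(wl2 ~ group, data=WeightLoss)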