Testing more than two means

Another really common situation requires testing whether three or more means are significantly discrepant. We would find ourselves in this situation if we had three experimental conditions in the blood pressure trial: one group gets a placebo, one group gets a low dose of the real medication, and one group gets a high dose of the real medication.

Hmm, for cases like these, why don't we just do a series of t-tests? For example, we can test the directional alternative hypotheses:

  • The low dose of blood pressure medication lowers BP significantly more than the placebo
  • The high dose of blood pressure medication lowers BP significantly more than the low dose

Well, it turns out that doing this is pretty dangerous business, and the logic goes like this: if our alpha level is 0.05, then the chance of making a Type I error on one test is 0.05; if we perform two tests, the chance of making at least one Type I error rises to 1 - .95^2 = .0975 (near 10%). By the time we perform 10 tests at that alpha level, the chance of our having made at least one Type I error is about 40%. This is called the multiple testing problem or multiple comparisons problem.
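You can verify that arithmetic yourself; here's a quick sketch of the familywise error rate for 1, 2, and 10 independent tests at alpha = 0.05:

```r
# Chance of at least one Type I error across k independent tests,
# each performed at alpha = 0.05
alpha <- 0.05
k <- c(1, 2, 10)
1 - (1 - alpha)^k
# 0.0500000 0.0975000 0.4012631
```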

To circumvent this problem, in the case of testing three or more means, we use a technique called Analysis of Variance, or ANOVA. A significant result from an ANOVA leads to the inference that at least one of the means is significantly discrepant from one of the other means; it does not lend itself to the inference that all the means are significantly different. This is an example of an omnibus test, because it is a global test that doesn't tell you exactly where the differences are, just that there are differences.

You might be wondering why a test of equality of means goes by the name Analysis of Variance; it's because it works by comparing the variance between groups to the variance within groups. The general intuition behind an ANOVA is that the higher the ratio of between-group variance to within-group variance, the less likely it is that the different groups were sampled from the same population. This ratio is called an F ratio.
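To make that ratio concrete, here is a minimal sketch of computing an F ratio by hand (the function name f_ratio is my own; aov, which we'll meet shortly, does this and much more for you):

```r
# Sketch: the F ratio for a one-way design, computed by hand
f_ratio <- function(values, groups) {
  groups      <- as.factor(groups)
  grand_mean  <- mean(values)
  group_means <- tapply(values, groups, mean)
  n_per_group <- tapply(values, groups, length)
  k <- nlevels(groups)    # number of groups
  n <- length(values)     # total number of observations
  # between-group mean square: spread of group means around the grand mean
  ms_between <- sum(n_per_group * (group_means - grand_mean)^2) / (k - 1)
  # within-group mean square: spread of values around their own group mean
  ms_within  <- sum((values - group_means[groups])^2) / (n - k)
  ms_between / ms_within
}
```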

For our demonstration of the simplest species of ANOVA (the one-way ANOVA), we are going to be using the WeightLoss dataset from the car package. If you don't have the car package, install it.

  > library(car)
  > head(WeightLoss)
      group wl1 wl2 wl3 se1 se2 se3
  1 Control   4   3   3  14  13  15
  2 Control   4   4   3  13  14  17
  3 Control   4   3   1  17  12  16
  4 Control   3   2   1  11  11  12
  5 Control   5   3   2  16  15  14
  6 Control   6   5   4  17  18  18
  >
  > table(WeightLoss$group)
  
  Control    Diet  DietEx
       12      12      10

The WeightLoss dataset contains pounds lost and self-esteem measurements over three weeks for three different groups: a control group, a group just on a diet, and a group that dieted and exercised. We will be testing the hypothesis that the means of the weight loss at week 2 are not all equal:

  • H0: the mean weight loss at week 2 is the same for the control, diet, and diet-and-exercise groups
  • H1: at least two of the week 2 mean weight losses among the control, diet, and diet-and-exercise groups are not equal

Before the test, let's check out a box plot of the means:

  > library(ggplot2)
  > qplot(group, wl2, data=WeightLoss, geom="boxplot", fill=group)

Figure 6.8: Boxplot of weight lost in week 2 of trial for three groups: control, diet, and diet & exercise

Now for the ANOVA…

  > the.anova <- aov(wl2 ~ group, data=WeightLoss)
  > summary(the.anova)
              Df Sum Sq Mean Sq F value   Pr(>F)
  group        2  45.28  22.641   13.37 6.49e-05 ***
  Residuals   31  52.48   1.693
  ---
  Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Oh, snap! The p-value (Pr(>F)) is 6.49e-05, which is .000065 if you haven't read scientific notation yet.

As I said before, this just means that at least one of the comparisons between means was significant—there are four ways that this could occur:

  • The means of diet and diet and exercise are different
  • The means of diet and control are different
  • The means of control and diet and exercise are different
  • The means of control, diet, and diet and exercise are all different

In order to investigate further, we perform a post-hoc test. Quite often, the post-hoc test that analysts perform is a suite of t-tests comparing each pair of means (pairwise t-tests).

But wait, didn't I say that was dangerous business? I did, but it's different now:

  • We have already performed an honest-to-goodness omnibus test at the alpha level of our choosing. Only after we achieve significance do we perform pairwise t-tests.
  • We correct for the problem of multiple comparisons

The easiest multiple comparison correction procedure to understand is Bonferroni correction. In its simplest version, it changes the alpha value by dividing it by the number of tests being performed. It is considered the most conservative of all the multiple comparison correction methods. In fact, many consider it too conservative, and I'm inclined to agree. Instead, I suggest a correction procedure called Holm-Bonferroni correction. R uses this by default.
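You can see the difference between the two methods with R's p.adjust function on a made-up set of raw p-values (the values here are chosen purely for illustration):

```r
# Compare Bonferroni and Holm-Bonferroni adjustments on raw p-values
p_raw <- c(0.010, 0.020, 0.030)
p.adjust(p_raw, method = "bonferroni")  # every p-value multiplied by 3
p.adjust(p_raw, method = "holm")        # step-down: smallest p gets the
                                        # harshest penalty, the rest less so
```

Because Holm's step-down procedure penalizes only the smallest p-value as harshly as Bonferroni does, it is uniformly more powerful while still controlling the familywise error rate.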

  > pairwise.t.test(WeightLoss$wl2, as.vector(WeightLoss$group))
  
          Pairwise comparisons using t tests with pooled SD
  
  data:  WeightLoss$wl2 and as.vector(WeightLoss$group)
  
         Control Diet
  Diet   0.28059 -
  DietEx 7.1e-05 0.00091
  
  P value adjustment method: holm

This output indicates that the difference in means between the Diet and DietEx (diet and exercise) groups is significant at p < .001. Additionally, it indicates that the difference between DietEx and Control is significant at p < .0001 (look at the cell that says 7.1e-05). The p-value of the comparison between Diet alone and the control is .28, so we fail to reject the hypothesis that they have the same mean.

Assumptions of ANOVA

The standard one-way ANOVA makes three main assumptions:

  • The observations are independent
  • The distribution of the residuals (the differences between each value and its group's mean) is approximately normal
  • Homogeneity of variance: If you suspect that this assumption is violated, you can use R's oneway.test instead
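Here is a minimal sketch of how you might check the last two assumptions for our model (leveneTest comes from the car package we already loaded):

```r
# Sketch: checking one-way ANOVA assumptions for the WeightLoss model
library(car)
the.anova <- aov(wl2 ~ group, data = WeightLoss)

# Normality of residuals (a significant p-value suggests non-normality)
shapiro.test(residuals(the.anova))

# Homogeneity of variance (Levene's test, from the car package)
leveneTest(wl2 ~ group, data = WeightLoss)

# If the variances look unequal, Welch's ANOVA doesn't assume
# homogeneity of variance:
oneway.test(wl2 ~ group, data = WeightLoss)
```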