Introduction: The Basics of One-Way ANOVA, Between-Subjects Design

One-way analysis of variance (ANOVA) is appropriate when an analysis involves:

  • a single predictor variable that is measured on a nominal scale and can assume two or more values;

  • a single criterion variable that is measured on an interval or ratio scale.

In Chapter 8, “t Tests: Independent Samples and Paired Samples,” you learned about the independent samples t test, which you can use to determine whether there is a significant difference between two groups with regard to their respective scores on an interval- or ratio-scale criterion variable. But what if you are conducting a study in which you must compare more than just two groups? In those situations, it is often appropriate to analyze your data using a one-way ANOVA.

The analysis that this chapter describes is called one-way ANOVA because you use it to analyze data from studies in which there is only one predictor variable (or independent variable). In contrast, Chapter 10, “Factorial ANOVA with Two Between-Subjects Factors,” presents a statistical procedure that is appropriate for studies with two predictor variables.

The Aggression Study

To illustrate a situation for which a one-way ANOVA might be appropriate, imagine that you are conducting research on aggression in children. Assume that a review of prior research has led you to believe that consuming sugar causes children to behave more aggressively. You therefore want to conduct a study to test the following hypothesis:

The amount of sugar consumed by eight-year-old children has a positive effect on the levels of aggression that they subsequently display.

To test your hypothesis, you conduct an investigation in which each child in a group of 60 children is assigned to one of three experimental conditions. Assignments are made in the following way:

  • 20 children are assigned to the “0 grams of sugar” control condition;

  • 20 children are assigned to the “20 grams of sugar” treatment condition;

  • 20 children are assigned to the “40 grams of sugar” treatment condition.

The independent variable in the study is “the amount of sugar consumed.” You manipulate this variable by controlling the amount of sugar that is contained in the lunch that each child receives. In this way, you ensure that the children in the “0 grams of sugar” group are actually consuming 0 grams of sugar, that the children in the “20 grams of sugar” group are actually consuming 20 grams, and so forth.

The dependent variable in the study is “level of aggression” displayed by each child. To measure this variable, a pair of observers watches each child for a set period of time each day after lunch. These observers tabulate the number of aggressive acts performed by each child during this time. The total number of aggressive acts performed over a two-week period serves as each child’s score on the dependent variable.

You can see that the data from this investigation are appropriate for a one-way ANOVA because:

  • the study involves a single predictor variable that is measured on a nominal scale (i.e., “amount of sugar consumed”);

  • the predictor variable assumes more than two values (i.e., the “0-gram,” the “20-gram,” and the “40-gram” groups);

  • the study involves a single criterion variable (number of aggressive acts) that is measured on an interval or ratio scale.

Between-Subjects Designs versus Repeated-Measures Designs

The research design that this chapter discusses is referred to as a between-subjects design because each participant appears in only one group, and comparisons are made between different groups of participants. For example, in the experiment just described, a given participant is assigned to just one treatment condition (e.g., the 20-gram group), and provides data on the dependent variable for only that specific experimental condition.

A distinction, therefore, is made between a between-subjects design and a repeated-measures design. With a repeated-measures design, a given participant provides data under each treatment condition in the study. (It is called a “repeated-measures” design because each participant provides repeated measurements on the dependent variable.)

In short, a one-way ANOVA with one between-subjects factor is directly comparable to the independent-samples t test from Chapter 8. The main difference is that you can use a t test to compare just two groups, while you can use a one-way ANOVA to compare two or more groups. In the same way, a one-way ANOVA with one repeated-measures factor is very similar to the paired-samples t test from Chapter 8. Again, the main difference is that you can use a t test to analyze data from just two treatment conditions, but you can use a repeated-measures ANOVA with data from two or more treatment conditions. (The repeated-measures ANOVA is covered in Chapter 12, “One-Way ANOVA with One Repeated-Measures Factor.”)
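The equivalence described above is exact in the two-group case: a one-way between-subjects ANOVA on two groups yields an F statistic equal to the square of the independent-samples t statistic (F = t²). The sketch below verifies this by hand with small hypothetical scores (these numbers are invented for illustration; the chapter itself performs such analyses with SAS).

```python
import math

# With two groups, the independent-samples t test and the one-way
# between-subjects ANOVA are equivalent: F = t^2.  Hypothetical
# miniature scores (4 participants per group) illustrate this.

a = [2, 3, 4, 3]
b = [6, 7, 8, 7]
n_a, n_b = len(a), len(b)
mean_a, mean_b = sum(a) / n_a, sum(b) / n_b

# Independent-samples t statistic (pooled variance)
ss_a = sum((x - mean_a) ** 2 for x in a)
ss_b = sum((x - mean_b) ** 2 for x in b)
pooled_var = (ss_a + ss_b) / (n_a + n_b - 2)
t = (mean_b - mean_a) / math.sqrt(pooled_var * (1 / n_a + 1 / n_b))

# One-way ANOVA F statistic for the same two groups
grand = (sum(a) + sum(b)) / (n_a + n_b)
ss_between = n_a * (mean_a - grand) ** 2 + n_b * (mean_b - grand) ** 2
ms_within = (ss_a + ss_b) / (n_a + n_b - 2)
f_stat = ss_between / ms_within        # df_between = 1, so MS = SS

print(f"t^2 = {t**2:.2f}, F = {f_stat:.2f}")   # both equal 48.00
```

Because the two tests are equivalent for two groups, the ANOVA can be viewed as the natural extension of the t test to three or more groups.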

Multiple Comparison Procedures

When you analyze data from an experiment with a between-subjects ANOVA, you can state the null hypothesis as follows:

In the population, there is no difference between the various conditions with respect to their mean scores on the dependent variable.

For example, with the preceding study on aggression, you might state a null hypothesis that, in the population, there is no difference between the 0-gram, the 20-gram, and the 40-gram groups with respect to the mean number of aggressive acts performed. This null hypothesis could be represented symbolically in this way:

H0: μ1 = μ2 = μ3

where μ1 represents the mean level of aggression shown by the 0-gram group in the population, μ2 represents the population mean for the 20-gram group, and μ3 represents the population mean for the 40-gram group. (Because the null hypothesis refers to the population, the symbol μ is used for the means rather than M, which denotes a sample mean.)

When you analyze your data, SAS’s PROC GLM tests this null hypothesis by computing an F statistic. If the F statistic is sufficiently large (and the p value associated with the F statistic is sufficiently small), you can reject the null hypothesis. In rejecting the null, you tentatively conclude that, in the population, at least one of the three conditions differs from at least one other condition on the measure of aggression.
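The F statistic that PROC GLM reports is the ratio of the between-groups mean square to the within-groups mean square. To make the arithmetic concrete, the sketch below computes F by hand for a scaled-down hypothetical data set (4 children per condition rather than 20; the scores are invented for illustration and are not output from the aggression study).

```python
# Hand computation of the one-way between-subjects ANOVA F statistic.
# The chapter obtains this value from SAS's PROC GLM; this sketch uses
# tiny hypothetical scores (4 children per condition) so that the
# sums of squares are easy to follow.

groups = {
    "0 g sugar":  [2, 3, 4, 3],
    "20 g sugar": [4, 5, 6, 5],
    "40 g sugar": [6, 7, 8, 7],
}

all_scores = [x for g in groups.values() for x in g]
grand_mean = sum(all_scores) / len(all_scores)

# Between-groups sum of squares: n_j * (group mean - grand mean)^2
ss_between = sum(
    len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
)
# Within-groups sum of squares: squared deviations from each group mean
ss_within = sum(
    (x - sum(g) / len(g)) ** 2 for g in groups.values() for x in g
)

df_between = len(groups) - 1                  # k - 1 = 2
df_within = len(all_scores) - len(groups)     # N - k = 9

f_stat = (ss_between / df_between) / (ss_within / df_within)
print(f"F({df_between}, {df_within}) = {f_stat:.2f}")   # F(2, 9) = 24.00
```

A large F indicates that the variability between the group means is large relative to the variability of scores within the groups, which is evidence against the null hypothesis of equal population means.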

However, this leads to a problem: which pairs of treatment groups are significantly different from one another? There are various possibilities. Perhaps the 0-gram group is different from the 40-gram group but is not different from the 20-gram group. Perhaps the 20-gram group is different from the 40-gram group but is not different from the 0-gram group. Perhaps all three groups are significantly different from each other.

Faced with this problem, researchers routinely rely on multiple comparison procedures. These are statistical tests used in studies with more than two groups to help determine which pairs of groups are significantly different from one another. Several different multiple comparison procedures are available with SAS's PROC GLM, including Duncan's multiple-range test, the Scheffé test, and the Student-Newman-Keuls test. This chapter shows how to request and interpret Tukey's studentized range test, sometimes called Tukey's HSD test ("honestly significant difference"). The Tukey test is especially useful when the various groups in the study have unequal numbers of participants (which is often the case). For a description of the various multiple comparison tests that are available with PROC GLM, see the SAS/STAT User's Guide.
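In SAS you request the Tukey test through an option on PROC GLM; to show what the test actually computes, the sketch below derives the studentized-range statistic q for each pair of groups from small hypothetical scores (4 children per condition, invented for illustration). The critical value mentioned in the comment is an approximate tabled value, not SAS output.

```python
import math

# Pairwise Tukey HSD comparisons, computed by hand from hypothetical
# miniature scores (4 children per condition).  Each pair's statistic
# is q = |mean difference| / sqrt(MS_within / n).

groups = {
    "0 g":  [2, 3, 4, 3],
    "20 g": [4, 5, 6, 5],
    "40 g": [6, 7, 8, 7],
}
n = 4                          # participants per group (equal n here)
means = {name: sum(g) / n for name, g in groups.items()}

# Mean square within (the ANOVA error term)
ms_within = sum(
    (x - means[name]) ** 2 for name, g in groups.items() for x in g
) / (len(groups) * (n - 1))

se = math.sqrt(ms_within / n)  # standard error for the range statistic
names = list(groups)
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        q = abs(means[names[i]] - means[names[j]]) / se
        # Each q is compared with the studentized-range critical value
        # for k = 3 groups and df = 9 (roughly 3.95 at alpha = .05,
        # from published tables); larger q means the pair differs.
        print(f"{names[i]} vs {names[j]}: q = {q:.2f}")
```

In these hypothetical data every pairwise q exceeds the tabled critical value, so every pair of groups would be declared significantly different. With unequal group sizes, PROC GLM applies the Tukey-Kramer adjustment to the standard error automatically.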

Statistical Significance versus the Magnitude of the Treatment Effect

This chapter also shows how to calculate the R2 statistic from the output provided in an analysis of variance. In an ANOVA, R2 represents the proportion of variance in the criterion that is accounted for by the predictor variable. In a true experiment, you can view R2 as an index of the magnitude of the treatment effect: a measure of the strength of the relationship between the predictor and the criterion. Values of R2 can range from 0 through 1, with values closer to 0 indicating a weak relationship between the predictor and criterion, and values closer to 1 indicating a stronger relationship.

For example, assume that you conduct the preceding study on aggression in children. If your independent variable (amount of sugar consumed by the children) has a very weak effect on the level of aggression displayed by the children, R2 is a small value, perhaps .02 or .04. On the other hand, if your independent variable has a very strong effect on their level of aggression, R2 will be a larger value, perhaps .20 or .40 (exactly how large R2 must be to be considered “large” depends on a number of factors that are beyond the scope of this chapter).
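In an ANOVA, R2 is simply the between-groups (model) sum of squares divided by the total sum of squares. The sketch below performs that division with small hypothetical sums of squares (not values from the aggression study); the same division can be carried out on the Model and Corrected Total sums of squares that PROC GLM prints.

```python
# R-squared for a one-way ANOVA: the between-groups (model) sum of
# squares divided by the total sum of squares.  The SS values below
# are hypothetical, chosen only to illustrate the arithmetic.

ss_between = 32.0   # between-groups (model) sum of squares
ss_within = 6.0     # within-groups (error) sum of squares
ss_total = ss_between + ss_within

r_squared = ss_between / ss_total
print(f"R-squared = {r_squared:.3f}")   # proportion of variance explained
```

Here R2 is about .842, meaning the predictor would account for roughly 84% of the variance in the criterion, which would be an unusually strong treatment effect.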

It is good practice to report R2 or some other measure of the magnitude of the effect in research papers because researchers like to draw a distinction between results that are merely statistically significant and those that are truly meaningful. The problem is that researchers frequently obtain results that are statistically significant but not meaningful in terms of the magnitude of the treatment effect. This is especially likely to happen when conducting research with very large samples. When the sample is very large (say, several hundred participants), you can obtain results that are statistically significant even though your independent variable has a very weak effect on the dependent variable. This occurs because many statistical tests become very sensitive to minor group differences when samples are large.

For example, imagine that you conduct the preceding aggression study with 500 children in the 0-gram group, 500 children in the 20-gram group, and 500 children in the 40-gram group. It is possible that you would analyze your data with a one-way ANOVA, and obtain an F value that is significant at p < .05. Normally, this might lead you to rejoice. But imagine that you then calculate R2 for this effect, and learn that R2 is only .03. This means that only 3% of the variance in aggression is accounted for by the amount of sugar consumed. Obviously, your manipulation has had a very weak effect. Even though your independent variable is statistically significant, most researchers would argue that it does not account for a meaningful amount of variance in children’s aggression.

This is why it is helpful to always provide a measure of the magnitude of the treatment effect (such as R2) along with your test of statistical significance. In this way, your readers can assess whether your results are truly meaningful in terms of the strength of the relationship between the predictor and criterion variables.
