11

Between-Subjects Experimental Design and Analysis

Kim Bissell

ABSTRACT

Media effects and media studies researchers have recently been forced to determine the most effective way to measure the media's influence on individuals or groups. While thousands of media effects studies exist, the study design employed has varied. Between-subjects experimental design is increasingly becoming one of the most statistically valid ways to determine the effect of stimuli on a target population because every participant is exposed to the treatment or stimulus only once. This design allows researchers to make comparisons between groups based on each group's response to the stimulus material or treatment. This chapter provides a case study of a between-subjects experimental design using a sample of 601 children who participated in an experiment designed to measure weight stigmatization and bias following exposure to one image of an overweight or thin individual. Finally, the chapter discusses the advantages and disadvantages of the design in this case and proposes directions for future research.

An experiment is a mode of observation that enables researchers to probe causal relationships. Many experiments in social research are conducted under the controlled conditions of a laboratory, but experimenters can also take advantage of natural occurrences to study the effects of events in the social world.

Babbie, 2010, p. 230

Experimental design is one of several quantitative research methods used in mass communication to study the relationship between mediated content and subjects' knowledge, beliefs, perceptions, and behavior. Wimmer and Dominick (2006) argue that experimental design is the oldest methodological approach used in mass communication research and certainly the most appropriate method for studying and understanding the effect of media content on audiences; however, the method itself is used relatively infrequently in mass media research (Wimmer & Dominick, 2006). Experiments have been used in media research to examine the following types of phenomena, along with many other topics: the effects of exposure to violent media on aggressive thoughts and behavior; the effects of exposure to highly sexualized content on attitudes about women as objects rather than subjects; the effects of negative political attack ads on voter attitudes toward candidates; the effects of health messages in advertisements on viewers' dieting behavior; the effects of video game playing on cognitive overload and secondary task behavior.

The beauty of an experiment is the ability to demonstrate causality; however, when examining social artifacts and human behavior, researchers' ability to state with certainty that variable X caused a change in variable Y or that the independent variable (IV) is what caused a change in the dependent variable (DV) is low.

The following chapter explores one form of experimental design – between-subjects experiments – and uses two projects as a case study of the advantages and disadvantages of this specific design. Through a discussion of the two experiments, the study design employed, the stimulus material used, and the sample selected, the strengths and weaknesses of this form of experiment will be examined.

In designing an experiment to determine the effect an independent variable has on some dependent variable, the researcher is faced with the choice of manipulating the stimulus (usually the independent variable) between or within subjects (Erlebacher, 1977). Some researchers have tested the two designs to see if similar results can be found. The ultimate question in these earlier studies of the methodological soundness of each approach was whether the design type interacted with the independent variable (Schrier, 1958). Erlebacher (1977) addressed the use of between- and within-subjects designs using the ANOVA (analysis of variance), a statistical test often used in experiments, and reported the following: “The design type (between or within-subjects) is treated as an explicit independent variable in a factorially designed experiment. The outcome of the experiments is that there is an interaction between the design type and the substantive independent variable” (p. 212). More simply, Erlebacher is suggesting that there are differences between the two designs, and for researchers, it is important to know which design would be most appropriate given the purported interaction between the design type and the independent variable. Most experiments in personality or social psychology tend to be conducted using a between-subjects design, and because so many of the studies within mass communication are derivatives or have origins within social psychology, experiments in mass communication and media studies have often followed suit.

Both within-subjects and between-subjects designs have their relative strengths and weaknesses. A between-subjects design allows for comparisons across at least two or more groups, whereby subjects randomly assigned into one or more experimental groups will be observed on the outcome variable or dependent variable following exposure to stimulus material. For example, if a researcher were interested in examining the physiological effects of exposure to scary content presented in 2-D and 3-D formats, the researcher could design a between-subjects experiment whereby participants assigned to one experimental group would view the scary content in 3-D format while physiological data was recorded (heart rate, skin conductance, EEG data), and participants assigned to another experimental group would view the same content in 2-D. A researcher in this case would most likely also include a control group who would not view any scary content at all so that baseline measures of the physiological data could be compared across groups. If statistical results indicated that participants in the 3-D group had a higher heart rate, increased skin conductance, and more EEG activity than participants in the 2-D or control group, the researchers could effectively assert that exposure to 3-D content caused an increase in physiological activity.

A within-subjects design is thought to be stronger in internal validity because each participant serves as his or her own control, so the design is not dependent upon random assignment. In short, internal validity is the degree to which an observed change in the dependent variable can be attributed to the manipulation rather than to confounding factors. In experiments, internal validity ensures the design closely follows the principle of cause and effect. The statistical “power” of a within-subjects design – that is, the likelihood that a study will detect an effect when there is an effect to be detected – is also often greater than it is for between-subjects designs. Finally, the design, by its very nature, is more naturally aligned with what we think about when we consider experiments – a study designed to examine how particular stimuli might be related to a specific outcome in a subject (Charness, Gneezy, & Kuhn, 2012). While the benefits of this type of experimental design are great, the design does not allow for comparisons of multiple stimuli as easily or without confounds (such as order and carryover effects), and external validity – the ability to make generalizations to a broader population – is not as high.

The choice of one type of design over another will be dependent upon a number of factors: (1) what research question or hypothesis is being tested; (2) what the concerns are over potentially spurious effects, whereby subjects act in the way they believe satisfies the researcher's expectations; and (3) what the need is for statistical power (Charness et al., 2012). In any kind of empirical research, there are four key components that influence the degree to which conclusions can be reached from a statistical test. These factors are especially important in experimental design as all four are essential in properly designing a study in a way that will yield statistical significance. Statistical inference, or the ability to draw conclusions based on the statistical tests used, is dependent upon the sample size, the effect size, the significance level, and the statistical power (discussed above). If the sample size is small overall or within a specific cell in a between-subjects design, the likelihood of statistical significance being found decreases. Effect size is a numerical way of expressing the strength of a reported relationship. The significance level refers to the criterion used to reject the null hypothesis (conventionally p < .05). All four factors are crucial in a researcher's ability to reject the null hypothesis and state that X caused Y.
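One of these components, effect size, can be made concrete with a short illustrative sketch. The Python function below computes Cohen's d, a common standardized effect size for comparing two cells of a between-subjects design; the bias scores and variable names are invented for illustration and are not data from the studies discussed here.

```python
from statistics import mean, stdev

def cohens_d(group_a, group_b):
    """Standardized mean difference between two independent groups,
    using the pooled standard deviation (Cohen's d)."""
    n_a, n_b = len(group_a), len(group_b)
    # Pool the two sample variances, weighted by degrees of freedom
    pooled_var = ((n_a - 1) * stdev(group_a) ** 2 +
                  (n_b - 1) * stdev(group_b) ** 2) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / pooled_var ** 0.5

# Hypothetical bias scores from two cells of a between-subjects design
overweight_cell = [4.0, 4.5, 5.0, 4.2, 4.8]
thin_cell = [3.0, 3.4, 3.8, 3.2, 3.6]
d = cohens_d(overweight_cell, thin_cell)
```

Holding the significance level constant, a larger d requires a smaller cell size to reach statistical significance, which is why the four components must be weighed together when planning a design.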

What follows is a description of two between-subjects experimental design studies conducted over the last few years. These studies will be used to discuss the methodological strengths and weaknesses of the between-subjects approach and to highlight the ways media studies researchers can correct or overcome some of the limitations of this specific design.

Case Study: An Examination of Anti-Fat Bias in Grade School Children

Two separate studies were conducted over the course of three years to better understand the factors influencing the development of bias against overweight and obese individuals. However, the samples were different in each study, and the measures were modified based on the age of the participants. One of the studies was conducted with children and one with adults. Procedures were similar across the two studies, and the independent and dependent measures varied only in word usage and sentence length. These decisions were based on the age and literacy rate of each sample's participants. The objective of each of the studies was to identify correlates or precursors to weight bias. Thus, in each of the studies, a between-subjects experiment was conducted. With this design, I was able to prime participants with an image of an overweight or a thin subject and then use cognitive and attitudinal measures to assess participants' weight bias. The objective of both studies was to identify the factors – individual, social, media, or ideological – that might be the strongest predictors of bias in each of the two separate samples.

The first experiment, focusing on children, was designed to be a 2 × 5 between-subjects experiment. This study used four experimental groups and one control group. The first factor of the design was participant gender, and the second factor was the experimental group to which the subject was randomly assigned. Gender was included as a factor in the study because previous literature has indicated that girls have a greater bias than boys against other overweight girls. Thus, while one study objective was to determine the correlates or precursors of weight bias in children, another objective was to understand how the participants' gender interacted with the stimulus material. The 2 × 5 experiment yielded 10 conditions whereby boys and girls were randomly assigned to one of the five groups and then participant gender was used as a factor in statistical analysis.

In both studies, participants completed an online instrument and addressed items related to self-perception, peer and family influence, household dieting behavior, eating and exercise behavior, eating and exercise attitudes, and media exposure. These variables were largely used as control variables or as co-variates (a variable that may possibly also predict the outcome under investigation) with the independent variables (experimental group and gender). Participants also completed a modified version of the implicit association test (IAT), designed to measure attitudes related to weight. The IAT was developed by social psychologists Greenwald, McGhee, and Schwartz in 1998, and was designed to detect the strength of an individual's automatic association between mental representations of objects or people in memory. The IAT has been used in many experiments to gauge adults' implicit bias related to gender, ethnicity, sexual orientation, and disability. While the IAT has been criticized by other social psychologists, it is still the most widely used measure of implicit bias against different personality and physical characteristics.

Children participating in the first study viewed the stimulus and completed the instrument at school during computer lab time. For this study, child participants were randomly assigned into one of the five groups so that, in expectation, preexisting differences would be distributed evenly across groups. The second study was an online experiment with adults, whereby participants were also randomly assigned to one of the five groups. The randomized sorting method ensured that each of the five groups had roughly the same number of participants, or a similar cell size. In the adult study, all items were viewed and completed online without any interaction with the researcher.
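A randomized sorting procedure of this kind can be sketched in a few lines of Python. The sketch below is an assumption about how such a routine might work, not the actual program used in the studies; it shuffles participants and then deals them round-robin into groups, which is why cell sizes end up differing by at most one.

```python
import random

def block_randomize(participant_ids, n_groups=5, seed=None):
    """Shuffle participants, then deal them round-robin into groups so
    that cell sizes differ by at most one (unlike simple coin-flip
    assignment, which can leave cells badly unbalanced by chance)."""
    rng = random.Random(seed)
    ids = list(participant_ids)
    rng.shuffle(ids)
    groups = [[] for _ in range(n_groups)]
    for i, pid in enumerate(ids):
        groups[i % n_groups].append(pid)
    return groups

# Hypothetical assignment of 601 participants to five groups
groups = block_randomize(range(601), n_groups=5, seed=42)
sizes = [len(g) for g in groups]
```

With 601 participants and five groups, one cell receives 121 participants and the remaining four receive 120 each, keeping the cells comparable for between-subjects tests.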

Each study included four experimental groups and one control group. In order to facilitate responses to photographs of overweight and thin children/adults – child participants viewed images of children and adult participants viewed images of adults – participants were randomly assigned one of the following groups: exposure to an image of an overweight female; exposure to an image of the same female who was thinner; exposure to an image of an overweight male; exposure to an image of the same male who was thinner; and no image exposure (control group). Thus participants were exposed to an image of an overweight man/boy or woman/girl in two of the four experimental groups. The purpose was to see if exposure to an overweight subject might trigger greater amounts of bias against overweight. By design, both male and female participants would be randomly assigned to one of the five groups, but because participant gender was a factor, a total of 10 groups were created.

Identification of the subjects/individuals used as stimuli in the two experiments was based on a pilot test of 16 images of thin and overweight males and females using a sample of other children and adults. The images selected for use in both projects were “before” pictures of males and females who had participated in weight-loss camps or weight-loss treatment programs. For the “overweight” treatment groups (one male, one female), full-body images were selected. Both photographs showed the subjects smiling so that no negative affect would be associated with facial expression. The subjects were photographed in bathing suits for both the “before” and the “after” images so that differences in clothing could be eliminated as a confound. For the “thin” treatment groups, “after” images of the same male and female were used, and the images were enhanced in Photoshop so that the “thin” treatment group was exposed to an image of a clearly thin person. The fifth group did not view an image, and this group served as the control group. Experimental group was a key independent variable for both studies, as the subject in the stimulus photograph was selected in order to prime thoughts and attitudes about weight.

Since the outcome under investigation in both studies was anti-fat bias (AFB), two measures of AFB were used to try to better understand the correlates or precursors of weight bias. The dependent measures for both studies were similar: two measures of explicit bias. For the first measure, participants' attitudes about the subject viewed in the stimulus photograph were measured. Participants randomly assigned to one of the four treatment groups were asked seven questions related to the subject viewed in the photograph. Statements used in each experiment were items such as: “This girl/boy/woman/man looks like one of my friends”; “I think this girl/boy/woman/man is pretty/attractive”; “I think this girl/boy/woman/man is athletic”; and “I think this girl/boy/woman/man is popular.” Using responses to the statements, an additive scale was created that measured explicit bias against the subjects viewed in the photographs. Reliability of this scale across the two studies ranged from α = .68 to α = .90, with the higher reliability coming in the study conducted with adults.
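The reliability of an additive scale such as this one is conventionally reported as Cronbach's alpha, which compares the summed variance of the individual items to the variance of participants' total scores. The Python sketch below illustrates the computation; the item responses are invented for illustration and are not the study data.

```python
from statistics import pvariance

def cronbach_alpha(items):
    """Cronbach's alpha for an additive scale. `items` is a list of
    per-item response lists, aligned across the same participants."""
    k = len(items)
    # Sum of the variances of each individual item
    item_var_sum = sum(pvariance(responses) for responses in items)
    # Variance of each participant's total (additive) score
    totals = [sum(vals) for vals in zip(*items)]
    return (k / (k - 1)) * (1 - item_var_sum / pvariance(totals))

# Hypothetical responses: three scale items from four participants
items = [[1, 2, 3, 4],
         [2, 2, 3, 5],
         [1, 3, 3, 4]]
alpha = cronbach_alpha(items)
```

Values of alpha near .70 (the low end reported here) are generally treated as minimally acceptable for research scales, while values near .90 indicate strong internal consistency.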

The second measure of explicit bias was a modification of Crandall's (1994) anti-fat attitudes scale, using the subscale of dislike. Items used in this study were a modification of the following statements: “I really do not like fat people much”; “I don't have many friends that are fat”; “I tend to think that people who are overweight are a little untrustworthy”; “Although some fat people are surely smart, in general, I think they tend not to be quite as bright as normal weight people”; “I have a hard time taking fat people too seriously”; “Fat people make me somewhat uncomfortable.” This measure tapped into more general attitudinal bias against individuals who were overweight, as the items were not related specifically to the subject used in the stimulus. These types of statements gauge general bias against overweight versus a bias or dislike against a specific person who was viewed in the stimulus photograph. Some children may feel uncomfortable actively indicating a dislike for a specific person but may be more inclined to indicate a bias against more generic “fat people”; thus, the subscale of dislike provided the more general measure of anti-fat bias. It is acknowledged that viewing an overweight subject in the stimulus may have primed participants to think negatively about overweight individuals. However, the two scales measuring explicit anti-fat bias were separated by several filler questions and by the cognitive processing questions in each of the studies. Filler questions used in the study described included questions such as “What is your favorite subject in school?” “What is your favorite football team?” “Who is your favorite music artist?”

In a simple between-subjects experiment, it would be predicted that the stimulus would cause a change in participants' attitudes, perceptions, or beliefs about overweight individuals. Thus, the photographs for each experimental group were carefully selected so that increases or decreases in bias could be detected. This straightforward design would have allowed me to determine if variable X – the photograph of the overweight or thin subject – caused an increase in participant bias as I could have compared mean scores across the experimental groups to see which group exhibited the greatest bias. With this design, the control group would be used as the baseline measure of the dependent variable (Y). While this design would have been appropriate, I felt I could not ignore other variables that might also be related to participant bias against overweight others. In the hard sciences, the stimulus will either cause an outcome or not. But, in the social sciences, an observable change in that outcome variable could be attributed to a multitude of factors. That is why I chose to include other measures in both studies. These measures allowed me to control for other possible predictors of anti-fat bias and to determine with greater confidence whether a trait characteristic shown in a mediated context (maybe repeatedly) would result in an effect on participants' beliefs about that trait.

Since the two studies examined the possible individual, social, media, and ideological influences on the development of bias in children and adults, the following variables were also used as either a control or independent variable: cognitive processing styles, demographics, media exposure, exposure to thin ideal media, self-perception as it relates to body image, household dieting behavior, peer influence, and sociocultural attitudes about appearance. A more detailed description of how these variables were measured is provided in the following section. For all variables listed above, measures were similar; however, in some instances, sentences were worded more simply so that child participants could understand the statements.

Study 1 Participants

The participants in Study 1 were 601 boys and girls aged between 7 and 13 (a more detailed description of the measures used and the study findings can be found in Bissell & Hays, 2011). About 21% of the sample were in 3rd grade, 19% in 4th grade, 30% in 5th grade, and 24% in 6th grade. Of the 601 participants, 45% were boys and 55% were girls, with ethnic representation closely matched to that of the counties where data was collected: 80% of the sample was White, 16% African American, and the remaining 4% Hispanic, Asian, or “other.”

Study 1 Independent Variables

While many variables were used in the first study of the correlates of weight bias, the key factors under examination were related to individual variables, including gender and media exposure. Television viewing was measured using three self-report items. Children were asked to list the television shows they watched “yesterday before school,” “yesterday after school but before dinner,” and “yesterday after dinner but before bed.” Children were instructed to type the names of television shows they had watched during those time frames in the previous day.

As a measure of self-perception, participants were asked several questions related to their fears about their own appearance. Items used in this experiment were modeled after Lundgren, Anderson, and Thompson's (2004) fear of negative appearance evaluation scale (FNAE). Several statements were rewritten to accommodate participants' reading level. Furthermore, reliability tests for the full scale were not sufficient, so items were eliminated from the scale for better reliability. Examples of items included statements such as “I care about what my friends think of the way I look,” “I worry that other people will not like the way I look,” and “When I meet new people, I wonder what they will think about the way I look.”

Study 2 Participants

For Study 2, a design similar to Study 1 was used. In order to examine the relationship between possible predictors of implicit attitudes toward weight and obesity and the relationship of those implicit attitudes with more explicit attitudes of bias, a 2 × 5 design (gender × instrument version, between subjects) was used with adults in two states in the South. A total of 276 subjects ranging in age from 18 to 24 participated in the study. Approximately 67% of the sample was female and 33% male. A convenience sample was employed to recruit the 276 participants. Study 2 participants were recruited through participant pools at two universities, and students participated in the research project for extra credit.

Study 2 Independent Variables

All participants answered a series of questions related to media use, food intake, and involvement in physical activity and exercise. Participants then completed the implicit association test, designed to tap into implicit attitudes toward weight bias. Since participants completed this study online, a randomizing program was used to randomly sort each participant into one of the five groups (four experimental groups and one control group). Once assigned to one of five groups, participants in the experimental groups viewed one of four images (see stimulus description above) and then answered questions designed to measure explicit attitudes toward weight bias. Participants assigned to the control group did not view an image but answered similar questions related to weight bias.

After completing the first round of explicit measures of weight bias, all participants returned to the same section of the online instrument and completed other items related to self-discrepancy, exercise frequency, weight stigmatization, fear of negative appearance evaluations, and cognitive processing styles. Self-discrepancy items measured participants' perceived ideal self compared to their actual self, so participants were asked to circle a body shape that most closely represented their current or actual body shape and then (using a separate scale) were asked to circle the body shape they felt most closely represented their ideal body. Exercise frequency measures included participants' involvement in sports for competition or recreation and/or general exercise and the number of minutes per day they spent participating in a physical activity. Weight stigmatization involved a single question that asked participants if they were afraid of becoming overweight. For Study 2, participants' self-reported exposure to media was used, and a body shape index was computed for each of the television programs viewed so that a thin-ideal viewing index could be created. The body shape index was calculated using an entirely different sample of participants who rated the body shape and size of the primary characters in the television shows participants indicated they had viewed. This enabled me to have a measure of participant exposure to thin-ideal media so that the role of media exposure in the development or reinforcement of bias against overweight could be better understood.

Summary of Results

For Study 1, a statistical test was conducted to see if there was any variation on the dependent variable across the experimental groups, so a between-subjects test was conducted using only five groups (four viewing a version of the stimuli and one viewing no image at all). Statistical tests indicated that participants (whether male or female) who viewed the image of the overweight girl were more likely to assess her unfavorably than participants who viewed the image of the thin girl. More specifically, exposure to the overweight girl in the stimulus photograph primed or triggered negative attitudes toward her. Participants exposed to the image of the overweight girl were less likely to indicate she was pretty, smart, or friendly, and were less likely to say they would be friends with her. The second measure of weight bias was a more general assessment of negative attitudes toward overweight and, again, participants randomly assigned to the “overweight” treatment groups in the experiment were more likely to exhibit negative attitudes toward overweight. For this study, the statistical test comparing the differences between groups indicated that, as predicted, exposure to an overweight subject in a photograph resulted in more weight bias using the specific and general measures of the dependent variable.

When statistical tests were conducted that examined all 10 groups (2 × 5), results indicated that female participants who viewed the image of the overweight female were the most critical or exhibited the greatest bias against overweight. Female participants who viewed the image of the overweight boy had the second highest bias scores followed by the male participants who viewed the image of the overweight girl. Thus, participant gender did play a role in increased bias as the mean scores for female participants viewing the image of the overweight female had the highest AFB scores of all.

When the other variables were examined statistically with experimental version, interesting patterns emerged, suggesting that in social science, factors outside of the laboratory setting must be considered. For Study 1, media exposure and fear of negative appearance evaluations interacted with experimental group. More simply, the stimulus was found to cause a change in participant attitudes toward overweight others (for two of the four experimental groups – overweight male and female), but this effect varied when other independent variables such as thin-ideal media exposure and fear of negative appearance evaluations were included in the statistical test. The statistical test indicated that exposure to the stimulus photograph did prime or cue child participants to indicate a stronger bias against overweight individuals. However, for participants who indicated viewing higher amounts of thin-ideal television and for participants with a higher score on the FNAE scale, the bias against overweight was even greater. In short, two other variables that might also predict bias against overweight, coupled with exposure to an overweight subject, resulted in the greatest bias against overweight.

Study 2 was a between-subjects experiment with an adult sample using the same two measures of weight bias. The study design mirrored that used in Study 1, with the only difference being the complexity of some of the sentences. A one-way ANOVA test was run using the instrument version as the grouping variable and the two explicit measures of bias as dependent variables. An ANOVA is a statistical test used when a researcher wants to make simultaneous comparisons between three or more groups on the same dependent variable. This test requires a nominal variable as the independent variable, and an interval or ratio variable as the dependent variable. When just the experimental group was considered statistically, participants who viewed the image of the overweight woman had significantly less favorable attitudes toward the subject than those viewing the image of the thin woman. When an ANCOVA test was run to consider gender (and other demographic variables), similar results were found. An ANCOVA test is a statistical test that combines an ANOVA with a regression analysis. The test essentially looks for changes in the dependent variable based on a nominal grouping in the independent variable with covariates considered statistically.
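The logic of the one-way ANOVA can be illustrated with a short Python sketch. The F statistic is the ratio of between-group variability to within-group variability; a large F means the group means differ more than chance variation within groups would predict. The group scores below are hypothetical and are not the study data.

```python
from statistics import mean

def one_way_anova_f(groups):
    """F statistic for a one-way between-subjects ANOVA: the ratio of
    the between-group mean square to the within-group mean square."""
    k = len(groups)                       # number of groups
    n = sum(len(g) for g in groups)       # total participants
    grand_mean = mean(x for g in groups for x in g)
    # Variability of group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variability of scores around their own group mean
    ss_within = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups)
    df_between, df_within = k - 1, n - k
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical bias scores for three cells (overweight, thin, control)
f = one_way_anova_f([[5, 6, 7], [2, 3, 4], [3, 4, 5]])
```

The resulting F is then compared against the F distribution with the corresponding degrees of freedom to obtain a p value; statistical packages report this comparison automatically.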

Analysis was also conducted on the second measure of explicit bias, the more global measure of weight bias. One-way analysis of variance tests indicated significant differences between the five groups. As expected, participants in the control group exhibited greater anti-fat bias than those exposed to the image of the thin man or woman. Significant differences were also found between the experimental groups viewing images of overweight individuals, on the one hand, and the control group and the thin experimental groups, on the other. Thus, it was found that simple exposure to overweight subjects primed individuals to think more negatively about overweight.

Again, in the development of a between-subjects experiment, the analysis could stop at this point because an effect had been demonstrated – the stimulus material was found to have an effect on participant attitudes about overweight others. However, it would be inappropriate to leave the results at that. For adults especially, a bias against a trait characteristic, whether it be weight, gender, age, ethnicity, or religion, is something that develops over time, and that bias certainly develops with the aid of other outside factors often found in the social world – media exposure, peer influence, and so on. Thus, Study 2 also looked at these other factors to see how or if they were related to increased or decreased levels of bias.

Results from a MANOVA (multivariate analysis of variance, a test used when there are two or more dependent variables) examining the relationship of television exposure with experimental version on the two measures of explicit bias suggest there was an interaction effect between the two predictors and each dependent variable. When each dependent variable was considered, greater television viewing was related to higher levels of anti-fat bias; and when the experimental version was considered, participants exposed to the image of the overweight man and woman had even higher anti-fat bias than those who were exposed to the image of the thin man or woman. In order to include other possible predictor variables in the study of bias development, all-inclusive multiple regression (where the criterion is predicted by two or more variables) was used to combine four predictor variables – television viewing, IAT scores, and the two cognitive processing scales (experiential processing and rational processing). Taken together, the four variables were statistically significant predictors of AFB. Further analysis revealed that, of those four variables, the second cognitive scale measuring experiential processing and the IAT were statistically significant predictors of AFB in the presence of the remaining three variables. The all-inclusive model explained 18% of the variance in anti-fat bias. Unexplained variance is an important consideration for mass communication researchers, and it remains one of the weaknesses of our research designs in general, and of experimental design specifically. This point will be addressed in greater detail below.
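The “variance explained” figure reported for the regression model is the R² statistic: the proportion of variability in the criterion accounted for by the predictors. The minimal Python sketch below illustrates R² for the single-predictor case; the data and variable names are hypothetical and are not drawn from the studies.

```python
from statistics import mean

def r_squared(x, y):
    """Proportion of variance in y explained by a least-squares
    regression of y on a single predictor x."""
    mx, my = mean(x), mean(y)
    # Least-squares slope and intercept
    beta = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
            / sum((xi - mx) ** 2 for xi in x))
    intercept = my - beta * mx
    # Residual and total sums of squares
    ss_res = sum((yi - (intercept + beta * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

# Hypothetical data: daily TV hours vs. an anti-fat bias score
tv_hours = [1, 2, 3, 4, 5]
bias_score = [2, 2, 4, 4, 5]
r2 = r_squared(tv_hours, bias_score)
```

An R² of .18, as in the all-inclusive model above, means 82% of the variance in anti-fat bias remains unexplained, which is why the unmeasured-factors problem discussed below matters so much for experimental designs in media research.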

Findings from both studies suggest that the correlates of anti-fat attitudes may lie in a myriad of factors including individual and social factors, ideology or cultural norms, and media exposure. While the findings across the studies were not completely consistent with regard to the stronger predictors of bias, a few of the factors under investigation were significant across the two studies. The difference in findings could be attributable to the samples themselves – one used children and the other used adults – to a design flaw, or to the likelihood of adults or older children being less willing to admit bias in an overt way.

Of importance in both studies' findings was the media exposure factor. Whether measured as time spent with the media or as exposure to specific content (thin-ideal content, animated versus live-action programming), time spent viewing entertainment television was a significant predictor of negative attitudes toward overweight. This was found when considered with the experimental version and when considered in isolation. For example, when just the experimental conditions were tested with the dependent variables, a significant relationship was found, suggesting that the manipulations did cause an effect in bias. However, when the manipulations (independent variable) were considered along with other variables such as media exposure, a causal relationship was still found, as the amount and type of media exposure also predicted greater weight bias. The social environment, which included household dieting behavior and self-perception as it relates to body image, was also a strong predictor of increased or decreased levels of bias in both studies.

While findings from both studies resulted in a desired outcome – the stimulus material had an effect on participant attitudes about overweight – neither study was without limitations. Furthermore, even with the desired outcome in each study, I cannot say with certainty that the results are a product of exposure to the stimuli, even though the procedures followed should allow me to come to that conclusion. Such is the nature of experimental design in media studies research. Even if a study has been carefully designed and implemented, statistically our results often look like nonresults to scholars outside of the discipline.

In media studies or mass communication research, the degree or amount of unexplained variance is often quite high. It would not be unusual to find a study reporting an unexplained variance of 75–85%. Unexplained variance is the part of the variance that cannot be attributed to specific causes. For example, in Study 2, I reported that, using four different factors, 18% of the variance was explained, which means that 82% was not. Quite frankly, I was pretty excited about the 18%. In a research world that deals with social factors, we simply cannot statistically control for everything. However, the goal in what follows is to use the two studies described above to identify ways in which error and study design problems can be addressed in advance.

Design Problems Specific to the Case Study

Procedures

Both studies used an online instrument containing the stimulus material and measures of independent and dependent variables. While it was convenient to use an online instrument, it meant that participants could easily skip ahead to questions or go back to other questions and change their answers. While the children were in a lab, the lab did not resemble a traditional research lab where extraneous noise and other distractions could be eliminated. Participants in this study sat next to other children and were able to view the stimulus image others were viewing. They quickly realized that they might be answering different questions, and this led to all kinds of possible corruption of the data. From a procedures perspective, it would have been much better if I could have collected data from each child participant individually. Had I been allowed to follow that procedure, however, I would probably still be collecting data, as it took roughly one hour for each participant to complete the study. To correct for corruption of the data, I deleted all responses from participants who appeared to be too distracted to participate in the study. This reduced the N from over 1,000 to 601.

In the study with adults, participants were sent a link via email and could complete the study at any time and in any place. Again, while this might have been convenient for participants and the researcher, there were many factors outside of the lab environment that could not be controlled. For example, participants could have been viewing a television program containing many thin-ideal characters while participating in a research study about weight bias. If that were the case, the viewing of thin-ideal media could have been what caused increases in weight bias rather than viewing an image of an overweight subject. Furthermore, adult participants could have clicked forward and backward through the instrument or could have hit the link to the instrument more than once, and possibly seen several different versions of the stimuli – there was no way to ensure that only one image was viewed. The procedure did allow for data collection in a short period of time, but the nature of those procedures meant that there could easily be threats to the internal and external validity of the study.

Confounding Variables

For both studies, a pilot test was conducted to select the images used for the stimulus material. Despite this, it is possible that participants in both studies had a pre-existing bias against individuals of another gender, race, or age. For both studies, White subjects were selected, and four experimental groups were used so that two groups could use a male subject and two groups a female subject. The 2 in the 2 × 5 experiment meant that gender was one of two factors considered. Statistical tests indicated no bias based on in-group/out-group selection, meaning there were no statistically significant differences for female participants viewing the female subject or male participants viewing the male subject. However, an implicit bias against overweight females could still be present, even if it was not evident statistically. Finkelstein, Frautschy Demuth, and Sweeney (2007) report that in workplace and social settings, bias against overweight women is stronger than it is for overweight men. In a mediated context, overweight male characters are often portrayed as funny whereas overweight women are portrayed as unhappy, depressed, unable to obtain a date, and constantly dieting (Ata & Thompson, 2010). This bias against overweight females might be present on a societal or even ideological level and therefore be difficult to measure or control for statistically.

A confound is something that varies along with the independent variables and inhibits the researcher's ability to say what caused the variation in the dependent variable. Other possible confounds in the present study are variables not included as independent variables, covariates, moderating variables, or mediating variables. Media exposure, especially exposure to thin-ideal media content, is a possible confound. There are other factors that may have resulted in elevated levels of bias against overweight, but these intangible variables can be difficult to measure. For example, as with other forms of bias, stigma, or prejudice, bias against overweight is often learned. A child who lives in a household with a parent who speaks negatively about people of different races may learn to think negatively about those races. The same could be said about bias against overweight. The word “fat” has a negative connotation. If a child lives in a household where a parent, sibling, or other individuals use the word “fat” in a negative context, the child could learn to see fat in negative terms. Results from the IAT (in both studies) suggest that the majority of participants did associate overweight with negative adjectives, but I am unable to assess why. However, this implicit bias can certainly be correlated with explicit bias (as evidenced in both studies), and it therefore merits mention as a study limitation, because the development of that bias is too difficult, if not impossible, to explain.

Wimmer and Dominick (2006, p. 235) state that researchers must “ensure the internal validity of their research by controlling for the effects of confounding variables (extraneous variables) that might contaminate their findings.” In theory, these confounding variables can be controlled through the environment, through experimental manipulations, through the design itself, or through the assignment of subjects. In practice, even if these procedures are implemented, it remains difficult to control for all possible extraneous variables that might have an influence on the dependent variable.

Random Selection

Participants in both studies were randomly assigned to one of five groups – four experimental groups and one control group. Random assignment means that every person in the sample has an equal chance of being placed in any of the groups. The beauty of randomization, beyond so many other factors, is that the variables to be controlled are distributed in approximately the same way in all groups. Simple random assignment is more challenging when dealing with studies outside of the lab environment. For example, child participants in Study 1 simply drew a number from a basket, and adult participants in Study 2 were sorted into one of five groups by a randomization program built into the study. Two key types of sampling are used in empirical studies – probability sampling and nonprobability sampling – and each has its own strengths and weaknesses. Because participants in the studies described were randomly assigned to experimental or control groups, one might assume the studies were probabilistic. However, the original sample in both studies was a convenience sample, which means that the entire sample was a nonprobability sample and the results are not as generalizable as they might otherwise be.
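The assignment procedure described above can be sketched in a few lines of code. In the Python sketch below, a convenience sample is shuffled and dealt into five groups so that each participant has an equal chance of landing in any group; the participant labels and fixed seed are illustrative assumptions, not details from either study.

```python
# Sketch: random assignment of a convenience sample to four experimental
# groups and one control group. Participant IDs and the seed are hypothetical.
import random

def random_assign(participants, n_groups=5, seed=42):
    """Shuffle the sample, then deal participants round-robin into groups."""
    rng = random.Random(seed)       # fixed seed so the sketch is reproducible
    shuffled = participants[:]
    rng.shuffle(shuffled)
    return [shuffled[i::n_groups] for i in range(n_groups)]

sample = [f"child_{i}" for i in range(601)]
groups = random_assign(sample)
print([len(g) for g in groups])  # → [121, 120, 120, 120, 120]
```

Because the deal is round-robin after shuffling, group sizes stay as equal as the sample allows, which is one practical advantage over having each participant draw a number independently.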

Sample/Cell Size

While random assignment has many advantages, one issue with randomization is that if the sample size is small, there is a higher risk of the groups not being equivalent. For example, Study 1 had 601 participants in the final sample, but in order to look at factors such as participant gender, grade, and ethnicity, the sample would have needed to be double the size in order for the sample size within each smaller cell to be appropriate. For example, the approximately 600 participants were divided into 10 groups (gender (2) × instrument version (5)), leaving approximately 60 participants per cell. This cell size is large enough for statistical power; however, if I wanted to look at the responses by grade, that number would be divided by four, and I would wind up with a cell size that was too small to yield statistically significant results (n = 15). The lesson to be learned from this is to carefully consider the statistics that will be run in advance, and then think through the data analysis to ensure an appropriate cell size is achieved.
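The arithmetic behind this cell-size problem can be checked before any data are collected. The Python sketch below mirrors the chapter's numbers (N = 601, gender crossed with instrument version, then grade); the helper function is an illustrative convenience, not a standard statistical routine.

```python
# Sketch: checking approximate per-cell n before running a factorial analysis.
# Factor-level counts follow the chapter's example; integer division gives
# the (approximate) participants per cell.

def cell_size(n, factor_levels):
    """Approximate participants per cell for fully crossed factors."""
    cells = 1
    for levels in factor_levels:
        cells *= levels
    return n // cells

n = 601
print(cell_size(n, [2, 5]))      # gender x version → 60 per cell
print(cell_size(n, [2, 5, 4]))   # adding grade → 15 per cell
```

Running this kind of check at the design stage makes it obvious which planned comparisons the sample can actually support.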

Effect Size and Statistical Power

The power of a statistical test is determined by the significance criterion, the precision of the sample estimates, and the sample size (Sawyer & Ball, 1981). More simply, statistical power is the probability that a statistical test will correctly reject the null hypothesis. In media studies and mass communication research (and many other types of research), most readers understand that when p < .05 is reported, it means there is less than a 5% chance that the researchers have made a Type 1 error (a false positive) in their conclusion. The higher the power, the greater the chance that the decision or assumption is correct. Achieving higher power depends on the level of significance, the sample size, and the standard deviation of the population or the sample. Study 2 included 276 participants randomly sorted into one of five groups, and that number was then cut in half for the sort by gender. If the groups had been equal, a cell size of 27 would have been large enough for statistical power; however, because I had more female participants than male participants, my cell size for men across the five groups was too small to run statistical tests.
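The link between cell size and power can be illustrated with a rough normal-approximation calculation. The Python sketch below estimates power for a two-sided, two-group comparison at α = .05; the effect size d = 0.5 is a hypothetical input, and a real power analysis would use a dedicated tool (e.g., G*Power or a statistics package) rather than this approximation.

```python
# Sketch: approximate power of a two-group comparison via the normal
# approximation. Effect size and per-group n below are hypothetical.
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def approx_power(d, n_per_group, z_crit=1.96):
    """Power ≈ Phi(d * sqrt(n/2) - z_crit), two-sided alpha = .05."""
    noncentrality = d * math.sqrt(n_per_group / 2)
    return normal_cdf(noncentrality - z_crit)

# A medium effect (d = 0.5) with 27 per cell vs. 64 per cell:
print(round(approx_power(0.5, 27), 2))  # → 0.45
print(round(approx_power(0.5, 64), 2))  # → 0.81
```

With 27 participants per cell, a medium effect would be detected less than half the time; roughly 64 per cell are needed to reach the conventional 80% power benchmark, which is why the small male cells in Study 2 were a problem.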

In addition to having to be concerned with statistical power, researchers using experiments must also consider a study's effect size. Effect size describes the magnitude or strength of a relationship between two or more variables in the population. If other design factors are controlled for properly, the greater the effect size, the greater the power (Sawyer & Ball, 1981). Effect size should be one of the first factors considered when designing a study. Effect size can also be expressed via explained (or unexplained) variance. With any data set and any type of statistical test, variance (a characteristic of the data set that describes the range of responses within it) is of key importance: a researcher looking at data patterns or trends wants there to be diversity or a range in the responses, but not so much range that no statistically significant differences can be found. The higher the explained variance relative to the total variance, the greater the effect of the variable in question. For example, if a researcher were to predict that exposure to thin-ideal media would cause greater anti-fat bias, the higher the explained variance, the more confidence the researcher would have in asserting that X caused Y. Given the nature of the discipline, our effect sizes are almost always small. Within the discipline, we know and understand that, but when our work is viewed by scholars from other disciplines, they might scoff at what we report to be a significant finding. While there is little we can do to control effect size or unexplained variance within our discipline, it is important to teach our graduate students, who often learn statistics in other disciplines, not to be disappointed with our comparatively small effect sizes.
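One common effect-size measure, Cohen's d, can be computed directly from two groups' scores. In the Python sketch below, the bias scores for the two exposure conditions are invented for illustration and are not data from either study.

```python
# Sketch: Cohen's d for two experimental groups – the difference in means
# divided by the pooled standard deviation. Scores below are hypothetical.
import math

def cohens_d(group1, group2):
    n1, n2 = len(group1), len(group2)
    m1 = sum(group1) / n1
    m2 = sum(group2) / n2
    var1 = sum((x - m1) ** 2 for x in group1) / (n1 - 1)
    var2 = sum((x - m2) ** 2 for x in group2) / (n2 - 1)
    pooled_sd = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    return (m1 - m2) / pooled_sd

overweight_image = [4.0, 3.5, 4.5, 4.2, 3.8]  # hypothetical bias scores
thin_image = [3.9, 3.7, 4.3, 4.1, 3.5]

print(round(cohens_d(overweight_image, thin_image), 2))  # → 0.29
```

A d of about 0.29 is a small-to-medium effect – the magnitude media researchers most often see, and exactly the kind of result the chapter cautions graduate students not to be disappointed by.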

Post-Test Only Data

Research methods books teach us about the variety of ways an experiment can be conducted. We hear about designs such as the Solomon four-group design, the pretest–post-test–control group design, and the post-test only design. In an ideal scientific world where humans are not involved, the Solomon four-group design or the pretest–post-test designs are the most appropriate ways to assess the effect of the independent variable on the dependent variable. These designs allow for a baseline measure on the dependent variable, therefore allowing the researcher to observe any change following exposure to the stimulus material. In a simple experiment, a researcher could be interested in determining whether exposure to violent television content would predict or cause aggressive thoughts in young children. If a very simple experiment were to be designed to test this, child participants would be randomly assigned to the experimental group, who would view violent content, or a control group, who would view humorous content or no content at all. The researcher would measure aggressive thoughts in all participants following exposure to the content and then make comparisons between the two groups. If the theory predicted that violent content would likely produce or cause aggressive thoughts, the researcher would expect to observe a greater incidence of aggressive thoughts in the experimental group than in the control group. In this case, control group participants act as the baseline measure of aggressive thoughts as they were not exposed to the content that would (hypothetically) produce aggressive thoughts.

The logistics of this design are another issue entirely. Again, when collecting data outside of a lab setting (or even in a lab setting), the likelihood of getting a participant to return to the study for a second round of data collection is low. The proportion of individuals who return to participate in a study is less than half, and if the study is long or the lab environment difficult or inconvenient to return to, the return rate is even lower. In a between-subjects design, pretest data would allow the researcher an opportunity to have a baseline measure of the dependent variable prior to exposure to the stimulus and then have a repeat measure of the dependent variable following exposure. While the objective of the between-subjects design is to compare responses on the dependent variable between groups, it is never a bad idea to be able to see if there is a change in the dependent variable via the within-subjects design – that is, by comparing a subject's pretest and post-test scores on the dependent variable. If this procedure could be followed, it would provide the researcher with a greater opportunity to say with certainty that exposure to the stimuli caused the specific outcome under investigation.

In neither study did I have pretest measures. In an ideal world, I would have collected pretest data, given participants a few days, and then had them return for the second part of the experiment. However, the logistics of my two studies did not allow for that, and thus, I can only report findings from the post-test study.

How Between-Subjects Designs Can Be Used in Media Studies Research

Media studies and mass communication researchers have a wealth of opportunity to utilize between-subjects experimental design as a means of assessing or determining the effect of specific media content on cognitive, affective, or behavioral outcomes. Researchers interested in political communication might want to better understand how or if a political ad is more or less effective with a visual. A between-subjects design could be developed in a way that used a text-only ad, a visual-only ad, a text and visual ad, and a control group. If the researcher were interested only in the effect of variable X (the independent variable) on the dependent variable – attitudes toward a specific candidate – the straightforward design described above would be appropriate.

However, it is possible the researcher might realize that an individual's political orientation could also influence attitudes toward a specific candidate, regardless of the experimental group into which the participant was sorted. In order to design this type of study, the researcher could design a 3 × 4 factorial design. The 3 would allow a researcher to group participants by political orientation (liberal, moderate, or conservative) or by political party (Republican, Democratic, or “other” – acknowledging that this factor could be much larger in size) and then by the four groups in the experiment (three experimental groups and one control group). While this type of design still would not be without its limitations, the simple design would allow for (1) comparisons across experimental groups, and (2) comparisons across experimental groups based on political orientation. The 3 × 4 factorial design allows for testing the effects of two independent variables on a single dependent variable, and the proposed design would allow a researcher to statistically test the two IVs on the DV. This is just one of many examples of the way a study could be developed using between-subjects experiments.
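The cells of the proposed design can be enumerated by crossing the two factors. The short Python sketch below lists the 12 cells of the 3 × 4 design; the factor labels follow the chapter's example, and the crossing itself is what a between-subjects factorial analysis would compare.

```python
# Sketch: enumerating the cells of the proposed 3 x 4 factorial design
# (political orientation x ad condition). Labels follow the chapter's example.
from itertools import product

orientation = ["liberal", "moderate", "conservative"]
condition = ["text ad", "visual ad", "text and visual ad", "control"]

cells = list(product(orientation, condition))
print(len(cells))  # → 12 cells
for cell in cells[:3]:
    print(cell)
```

Listing the cells this way also makes the sample-size question concrete: each of the 12 cells needs enough participants on its own, which is the same per-cell constraint discussed earlier in the chapter.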

As researchers, we are taught to spend some time reading through studies relevant to our interests, to acquire an understanding of the theoretical framework(s) appropriate for investigating our subject of interest, and from there to develop a hypothesis or research question that predicts a relationship between two variables. The research question or hypothesis will dictate the type of design needed. A between-subjects design is a way of avoiding the carry-over effects of a within-subjects design: with random assignment into groups, the researcher can state with some authority that the stimuli had a specific effect on the dependent variable. The between-subjects design, sometimes called the independent measures design, is beneficial because it limits participants to a single treatment or exposure group. This type of design prevents participants from “learning” what is being asked of them, and it further reduces the chance of participants dropping out or checking boxes out of boredom.

The underlying assumption of the between-subjects approach is that Group 1 and Group 2 are different levels of the same factor. Using the political advertising example above, the 3 × 4 factorial design meant that participants would be sorted into three groups by political orientation and into four groups associated with the experiment (text ad, visual ad, text and visual ad, or no ad). If it is determined that a between-subjects design is the most appropriate for the study questions, it is prudent for researchers to then consider the following questions:

  1. What other variables might be related to the outcome variable?
  2. Is there enough variation between experimental groups that an effect can be found?
  3. Is there enough variation between the experimental and control groups that an effect can be found?
  4. How large must the sample size be in order for statistical significance to be found?
  5. What is the most appropriate method for randomizing participants into the respective groups?
  6. Have the necessary steps been taken to ensure that participants within the groups are similar – that is, that there is not an oversampling of women in one group and an oversampling of African Americans in another group?
  7. What types of statistics are appropriate given all of the variables being measured?
  8. Does the design allow the researcher to measure what he or she intends to measure?

Experiments are the primary tool for studying and understanding causal relationships. In media studies and mass communication research, causality is an important concern, as at least a third of the research in the field is related to the media's effect on audiences (McDonald, 2004). As Babbie (2010, p. 249) explains, “the chief advantage of a controlled experiment lies in the isolation of the experimental variable's impact over time.” Though many researchers are critical of experimental designs because of the artificiality associated with them, experiments remain a staple in mass communication research and media studies.

Between-subjects experimental designs are an important tool in scholars' wheelhouse, and with knowledge about the limitations of the design within the discipline, solid, meaningful research can be conducted. Iyengar, Peters, and Kinder (1982) discussed experiments in the context of the role of television news programs in shaping viewers' attitudes and beliefs. They report that the consequences of exposure are “not so minimal” and that earlier assumptions about media effects were indeed on the mark. This is something Lippmann alluded to almost a century ago. As Iyengar, Peters, and Kinder (1982, p. 855) conclude: “Fifty years and much inconclusive empirical fussing later, our experiments decisively sustain Lippmann's suspicion that media provide compelling descriptions of a public world that people cannot directly experience.”

Experimental design remains one of the most important methods in empirical research in mass communication and media studies. Researchers interested in media effects on a broad range of topics can use experiments to test the influence the media have in shaping individuals' knowledge, attitudes, and behavior. While they may be difficult to design, properly conducted experimental studies yield results that enable us to better understand the role the media play in our lives.

REFERENCES

Ata, R. N., & Thompson, K. J. (2010). Weight bias in the media: A review of recent research. European Journal of Obesity, 3(1), 41–46.

Babbie, E. (2010). The practice of social research. Belmont, CA: Wadsworth.

Bissell, K., & Hays, H. (2011). Understanding anti-fat bias in children: The role of media and appearance anxiety in third to sixth graders' implicit and explicit attitudes toward obesity. Mass Communication & Society, 14(1), 113–140.

Charness, G., Gneezy, U., & Kuhn, M. A. (2012). Experimental methods: Between-subject and within-subject design. Journal of Economic Behavior & Organization, 81, 1–8.

Crandall, C. S. (1994). Prejudice against fat people: Ideology and self-interest. Journal of Personality and Social Psychology, 66, 882–894.

Erlebacher, A. (1977). Design and analysis of experiments contrasting the within- and between-subjects manipulation of the independent variable. Psychological Bulletin, 84(2), 212–219.

Finkelstein, L. M., Frautschy Demuth, R. L., & Sweeney, D. L. (2007). Bias against overweight job applicants: Further explorations of when and why. Human Resource Management, 46(2), 203–222.

Greenwald, A. G., McGhee, D. E., & Schwartz, J. K. L. (1998). Measuring individual differences in implicit cognition: The implicit association test. Journal of Personality and Social Psychology, 74, 1464–1480.

Iyengar, S., Peters, M. D., & Kinder, D. R. (1982). Experimental demonstrations of the “not-so-minimal” consequences of television news programs. American Political Science Review, 76(4), 848–858.

Lundgren, J. D., Anderson, D. A., & Thompson, J. K. (2004). Fear of negative appearance evaluation: Development and evaluation of a new construct for risk factor work in the field of eating disorders. Eating Behaviors, 5, 75–84.

McDonald, D. G. (2004). Twentieth-century media effects research. In J. D. H. Downing, D. McQuail, P. Schlesinger, & E. Wartella (Eds.), The Sage handbook of media studies (pp. 183–200). Thousand Oaks, CA: Sage.

Sawyer, A. G., & Ball, D. (1981). Statistical power and effect size in marketing research. Journal of Marketing Research, 18, 275–290.

Schrier, A. M. (1958). Comparison of two methods of investigating the effect of amount of reward on performance. Journal of Comparative and Physiological Psychology, 51, 725–731.

Wimmer, R. D., & Dominick, J. R. (2006). Mass media research: An introduction. Boston, MA: Wadsworth.
