Introduction: Answering Questions with Multiple Regression

Multiple regression is a highly flexible procedure that allows researchers to address many different types of research questions with many different types of data. Perhaps the most common multiple regression analysis involves a single continuous criterion variable measured on an interval or ratio scale and multiple continuous predictor variables also assessed on an interval or ratio scale.

For example, you might be interested in determining the relative importance of variables that are believed to predict adult income. To conduct your research, you obtain information for 1,000 Canadian adults. The criterion variable in your study is annual income for these participants. The predictor variables are age, years of education, and income of parents. In this study, the criterion variable as well as the predictor variables are each continuous and are all assessed on an interval or ratio scale. Because of this, multiple regression is an appropriate data analysis procedure.

Analysis with multiple regression allows you to answer a number of research questions. For example, it allows you to determine:

  • whether there is a significant relationship between the criterion variable and the multiple predictor variables, when examined as a group;

  • whether the multiple regression coefficient for a given predictor variable is statistically significant (this coefficient represents the amount of weight given to a specific predictor, while holding constant the other predictors);

  • whether a given predictor accounts for a significant amount of variance in the criterion, beyond the variance accounted for by the other predictors.

By conducting the preceding analyses, you learn about the relative importance of the predictor variables included in your multiple regression equation. Researchers conducting nonexperimental research in the social sciences are often interested in learning about the relative importance of naturally occurring predictor variables. This chapter shows how to perform such analyses.
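To make these ideas concrete, the following is a minimal sketch of how the income example could be analyzed in Python with the statsmodels package. The package, the simulated data, and the variable names are illustrative assumptions, not part of the chapter's own materials; the sketch simply shows where the omnibus F test, the tests of individual coefficients, and the increment in variance accounted for by one predictor appear.

```python
# Illustrative sketch only: hypothetical, simulated data standing in for
# the 1,000 Canadian adults described above.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "age": rng.uniform(20, 65, n),
    "education": rng.uniform(8, 20, n),            # years of education
    "parent_income": rng.normal(60_000, 15_000, n),
})
# Simulate a criterion that depends on all three predictors plus noise.
df["income"] = (400 * df["age"] + 1_500 * df["education"]
                + 0.3 * df["parent_income"] + rng.normal(0, 10_000, n))

# Full model: is the set of predictors significantly related to income?
full = smf.ols("income ~ age + education + parent_income", data=df).fit()
print(full.fvalue, full.f_pvalue)   # omnibus F test for the predictors as a group
print(full.params)                  # regression coefficients (weights)
print(full.pvalues)                 # t test for each coefficient, holding the others constant

# Does education account for variance beyond the other predictors?
reduced = smf.ols("income ~ age + parent_income", data=df).fit()
r2_change = full.rsquared - reduced.rsquared   # increment in R-squared
print(r2_change)
```

Comparing the full and reduced models in this way addresses the third question in the list above: the change in R-squared is the proportion of variance in the criterion accounted for by a predictor beyond the variance accounted for by the others.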

Because multiple regression is such a flexible procedure, there are many other types of regression analyses that are beyond the scope of this chapter. For example, in the study dealing with annual income that was discussed earlier, all predictor variables were continuous and assessed on an interval or ratio scale. Nominal (classification) variables might also be used as predictors in a multiple regression analysis, provided that they have been appropriately transformed using dummy-coding or effect-coding. Because this chapter provides only an introduction to multiple regression, it does not cover circumstances in which nominal-scale variables are included as predictors. Also, this chapter covers only those situations in which a linear relationship exists between the predictor variables and the criterion; curvilinear and interactive relationships are not discussed.

Once you learn the basics of multiple regression from this chapter, you can learn more about advanced regression topics (e.g., dummy-coding nominal variables or testing curvilinear relationships) in Cohen, Cohen, West, and Aiken (2003), or Pedhazur (1982).

Multiple Regression versus ANOVA

Chapters 9 through 13 presented analysis of variance (ANOVA) procedures that are commonly used to analyze data from experimental research: research in which one or more categorical independent variables (such as “experimental condition”) are manipulated to determine how they affect a study’s dependent variable.

For example, imagine that you are interested in studying prosocial behavior: actions intended to help others. Examples of prosocial acts might include donating money to the poor, donating blood, doing volunteer work at a hospital, and so forth. You might have developed an experimental treatment that you believe will increase the likelihood that people will engage in prosocial acts. To investigate this, you conduct an experiment in which you manipulate the independent variable (e.g., half of your participants are given the experimental treatment and the other half are given a placebo treatment). You then assess your dependent variable, the number of prosocial acts that the participants later perform. It would be appropriate to analyze data from this study using one-way ANOVA, because you have a single criterion variable assessed on an interval/ratio scale (number of prosocial acts) and a single predictor variable measured on a nominal scale (experimental group).
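As a rough illustration only (the Python code and data below are hypothetical and not drawn from the chapter), a one-way ANOVA on such a two-group experiment might look like this:

```python
# Hypothetical counts of prosocial acts for a treatment and a placebo group.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
treatment = rng.poisson(12, 50)   # prosocial acts, treatment group
placebo = rng.poisson(9, 50)      # prosocial acts, placebo group

# One-way ANOVA: does mean prosocial behavior differ across conditions?
f_stat, p_value = stats.f_oneway(treatment, placebo)
print(f_stat, p_value)
```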

Multiple regression is similar to ANOVA in at least one important respect: with both procedures, the criterion variable should be continuous and should be assessed on either an interval or ratio level of measurement. Chapter 1 indicates that a continuous variable is one that can assume a relatively large number of values. For example, the “number of prosocial acts performed over a six-month period” can be a continuous variable, provided that participants demonstrate a wide variety of scores (e.g., 0, 4, 10, 11, 20, 25, 30).

However, multiple regression also differs from ANOVA in some ways. The most important difference involves the nature of the predictor variables. When data are analyzed with ANOVA, the predictor variable is a categorical variable (i.e., a variable that simply codes group membership). In contrast, predictor variables in multiple regression are generally continuous.

As an illustration, assume that you conduct a study in which you administer a questionnaire to a group of participants to assess the number of prosocial acts each has performed. You can then proceed to obtain scores for the same participants on each of the following predictor variables:

  • age;

  • income;

  • a questionnaire-type scale that assesses level of moral development.

Perhaps you have a hypothesis that the number of prosocial acts performed is causally determined by these three predictor variables, as illustrated by the model in Figure 14.1:

Figure 14.1. A Model of the Determinants of Prosocial Behavior


You can see that each predictor variable in your study (i.e., age, income, and moral development) is continuous, and is assessed on an interval or ratio scale. This, combined with the fact that your criterion variable is also a continuous variable assessed on an interval/ratio scale, means that you can analyze your data using multiple regression.

This is the most important distinction between the two statistical procedures. With ANOVA, the predictor variables are always categorical; with multiple regression, they are generally continuous (“generally” because categorical variables are sometimes used in multiple regression provided they have been dummy-coded or effect-coded). For more information, see Cohen, Cohen, West, and Aiken (2003) or Pedhazur (1982).

Multiple Regression and Naturally Occurring Variables

Multiple regression is particularly well-suited for studying the relationship between naturally occurring predictor and criterion variables; that is, variables that are not manipulated by the researcher, but are simply measured as they naturally occur. The preceding prosocial behavior study provides good examples of naturally occurring predictor variables: age, income, and level of moral development.

This is what makes multiple regression such an important tool in the social sciences. It allows researchers to study variables that cannot be experimentally manipulated. For example, assume that you have a hypothesis that domestic violence (an act of aggression against a domestic partner) is caused by the following:

  • childhood trauma experienced by the abuser;

  • substance abuse;

  • low self-esteem.

The model illustrating this hypothesis appears in Figure 14.2. It is obvious that you would not experimentally manipulate the predictor variables of the model and later observe the participants as adults to see if the manipulation affected their propensity for domestic violence. However, it is possible to simply measure these variables as they naturally occur and determine whether they are related to one another in the predicted fashion. Multiple regression allows you to do this.

Figure 14.2. A Model of the Determinants of Domestic Violence


Does this mean that ANOVA is only for the analysis of manipulated predictor variables, while multiple regression is only for the analysis of naturally occurring variables? Not necessarily, because naturally occurring variables can be predictor variables in an ANOVA, provided they are categorical. For example, ANOVA can be used to determine whether participant sex (a naturally occurring predictor variable) is related to relationship commitment (a criterion variable). In addition, a categorical manipulated variable (such as “experimental condition”) can be used as a predictor variable in multiple regression, provided that it has been dummy-coded or effect-coded. The main distinction to remember is this: with ANOVA, the predictor variables can only be categorical variables; with multiple regression, the predictor variables can be either categorical or continuous.
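For readers who want to see what dummy-coding looks like in practice, here is a brief hypothetical Python sketch; the C() term in the statsmodels formula dummy-codes a two-level condition variable so that it can appear in a regression alongside a continuous predictor. The variable names and data are invented purely for illustration and are not the chapter's procedure.

```python
# Hypothetical example: a dummy-coded two-level condition plus a continuous predictor.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 200
df = pd.DataFrame({
    "condition": rng.choice(["treatment", "placebo"], n),
    "age": rng.uniform(18, 70, n),
})
df["commitment"] = (5 + 2 * (df["condition"] == "treatment")
                    + 0.05 * df["age"] + rng.normal(0, 1, n))

# C(condition) dummy-codes the categorical predictor automatically; its
# coefficient is the treatment-placebo difference, holding age constant.
model = smf.ols("commitment ~ C(condition) + age", data=df).fit()
print(model.params)
```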

How large must my sample be?

Multiple regression is a large-sample procedure; unreliable results might be obtained if the sample does not include at least 100 observations, preferably 200. The greater the number of predictor variables included in the multiple regression equation, the greater the number of participants that will be necessary to obtain reliable results. Most experts recommend at least 15 to 30 participants per predictor variable. See Cohen (1992).
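The rules of thumb above can be combined into a simple calculation. The small helper below is purely a hypothetical illustration: it takes the larger of the overall minimum and the per-predictor recommendation.

```python
# Rough arithmetic check of the sample-size rules of thumb mentioned above
# (an overall minimum of about 100 observations, and roughly 15-30 per predictor).
def minimum_sample_size(n_predictors, per_predictor=15, floor=100):
    """Return the larger of the overall floor and the per-predictor rule."""
    return max(floor, per_predictor * n_predictors)

# e.g., three predictors (age, income, moral development)
print(minimum_sample_size(3))                      # 100
print(minimum_sample_size(3, per_predictor=30))    # 100
print(minimum_sample_size(10, per_predictor=30))   # 300
```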


“Proving” Cause and Effect Relationships

Multiple regression can determine whether a given set of variables is useful for predicting a criterion variable. Among other things, this means that multiple regression can be used to determine the following:

  • whether or not the relationship between the criterion variable and predictor variables (taken as a group) is statistically significant;

  • how much variance in the criterion is accounted for by the predictors;

  • which predictor variables are relatively important predictors of the criterion.

Although the preceding section often refers to causal models, it is nonetheless important to remember that the procedures discussed in this chapter do not provide evidence concerning cause and effect relationships between predictor variables and the criterion. For example, consider the causal model presented in Figure 14.2. Assume that naturally occurring data for these four variables are gathered and analyzed using multiple regression. Assume further that the results are significant: the multiple regression coefficients for all three of the predictor variables are significant and in the predicted direction. Even though these findings are consistent with your theoretical model, they do not “prove” that these predictor variables have a causal effect on domestic violence. Because the data are correlational, there is probably more than one way to interpret the observed relationships among the four variables. The most that you would be justified in saying is that your findings are consistent with the causal model portrayed in Figure 14.2. It would be inappropriate to say that the results “prove” that the model is correct.

Then why analyze correlational data with multiple regression at all? There are several reasons. Very often, researchers are not really interested in testing a causal model. Perhaps the purpose of the study is simply to understand relationships among a set of variables.

However, even when the research is based on a causal model, multiple regression can still be useful. For example, if you obtained correlational data relevant to the domestic violence model of Figure 14.2, analyzed it, and found that none of the multiple regression coefficients were significant, this would still be useful information because it would show that the model failed to survive a test (i.e., an analysis that investigated the predicted relationships among the variables).

If the model does survive an analysis with multiple regression (i.e., if significant results are obtained), this is useful as well. You can prepare a research report indicating that the results were consistent with the hypothesized model. In other words, the model survived an attempt at disconfirmation. If you are dealing with predictor variables that can be manipulated ethically, you might choose next to conduct an experiment to determine whether the predictors appear to have a causal effect on the criterion variable under controlled circumstances.

In summary, it is important to remember that the multiple regression procedures discussed in this chapter provide virtually no evidence of cause and effect relationships. However, they are extremely useful for determining whether one set of variables can significantly and meaningfully predict variation in a criterion. The term prediction, as it is used in multiple regression, should therefore not be confused with causation.
