Background: Predicting a Criterion Variable from Multiple Predictors

The criterion (or predicted) variable in multiple regression is represented by the symbol Y, and is therefore often referred to as the “Y variable.” The predictor variables are represented as X1, X2, X3... Xn, and are referred to as the “X variables” (i.e., independent variables). The purpose of multiple regression is to understand the relationship between the Y variable and the X variables when taken as a group.

A Simple Predictive Equation

For example, consider again the model of the determinants of prosocial behavior presented in Figure 14.1. This model hypothesizes that the number of prosocial acts performed by an individual in a given period of time can be predicted by three variables:

  • the participant’s age;

  • the participant’s income;

  • the participant’s level of moral development.

Notice that each arrow (assumed causal path) in the figure is identified with either a “+” or a “–” sign. A plus sign indicates that you expect the relevant predictor variable to demonstrate a positive relationship with the criterion whereas a minus sign indicates that you expect the predictor to demonstrate a negative relationship. The nature of these signs in Figure 14.1 shows that you expect:

  • a positive relationship between age and prosocial behavior, meaning that older participants will perform more prosocial acts;

  • a positive relationship between income and prosocial behavior, meaning that more affluent people will perform more prosocial acts;

  • a positive relationship between moral development and prosocial behavior, meaning that participants who score higher on the paper-and-pencil measure of moral development will perform more prosocial acts.

Assume that you have administered a questionnaire to a sample of 100 participants to assess their level of moral development. (Scores on this scale can range from 10 to 100 with higher scores reflecting higher levels of development.) From the same participants, you have also obtained information regarding their age and income. You now want to use this information to predict the number of prosocial acts that the participant will perform in the next six months. More specifically, you want to create a new variable, Y′, that represents your best guess of how many prosocial acts participants will perform. Y′ represents your prediction of each participant’s standing on the criterion variable (as distinguished from Y, which is the participant’s actual standing on the criterion variable).

Assume that the three predictor variables are positively related to prosocial behavior. To predict how many prosocial behaviors participants will engage in, one of your options is simply to add together the participants’ scores on the three X variables and let the sum be your best guess of how many prosocial acts they will perform. You could do this using the following equation:

Y′ = X1 + X2 + X3

where:

Y′ = the participants’ predicted scores on “prosocial behavior”;
X1 = the participants’ actual scores on “age”;
X2 = the participants’ actual scores on “income” (in thousands);
X3 = the participants’ actual scores on the moral development scale.

To make this more concrete, consider the fictitious data presented in Table 14.1. This table presents scores for four of the study’s participants on the three predictor variables.

Table 14.1. Fictitious Data, Prosocial Behavior Study
Participant      Age    Income (in thousands)   Moral Development
                 (X1)   (X2)                    (X3)
1.   Lars        19     15                      12
2.   Hiro        24     32                      28
3.   Karim       33     45                      50
...
100. Eamonn      55     60                      95

To arrive at an estimate of the number of prosocial behaviors in which the first participant (Lars) will engage, you could insert his scores on the three X variables into the preceding equation:

Y′ = X1 + X2 + X3
Y′ = 19 + 15 + 12
Y′ = 46

So your best guess is that the first participant, Lars, will perform 46 prosocial acts in the next six months. By repeating this process for all participants, you could go on to compute their Y′ scores in the same way. Table 14.2 presents the predicted scores on the prosocial behavior variable for some of the study’s participants.
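The simple sum equation can be sketched in a few lines of Python (used here purely as an illustration; the text’s analyses are performed in SAS). The participant names and scores are taken from Table 14.1.

```python
# Illustrative sketch (not SAS) of the simple predictive equation
# Y' = X1 + X2 + X3, applied to the four participants shown in Table 14.1.

participants = {
    "Lars":   {"age": 19, "income": 15, "moral_dev": 12},
    "Hiro":   {"age": 24, "income": 32, "moral_dev": 28},
    "Karim":  {"age": 33, "income": 45, "moral_dev": 50},
    "Eamonn": {"age": 55, "income": 60, "moral_dev": 95},
}

def predict_simple(scores):
    """Y' as the unweighted sum of the three predictor scores."""
    return scores["age"] + scores["income"] + scores["moral_dev"]

y_prime = {name: predict_simple(s) for name, s in participants.items()}
print(y_prime)  # Lars -> 46, Hiro -> 84, Karim -> 128, Eamonn -> 210
```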

Table 14.2. Predicted Scores on the Prosocial Behavior Variable Using a Simple Predictive Equation
Participant      Predicted Scores on        Age    Income (in thousands)   Moral Development
                 Prosocial Behavior (Y')    (X1)   (X2)                    (X3)
1.   Lars        46                         19     15                      12
2.   Hiro        84                         24     32                      28
3.   Karim       128                        33     45                      50
...
100. Eamonn      210                        55     60                      95

Notice the general relationships between Y′ and the X variables in Table 14.2. Because of the way Y′ was created, if participants have low scores on age, income, and moral development, your equation predicts that they will engage in relatively few prosocial behaviors. However, if participants have high scores on these X variables, your equation predicts that they will engage in a relatively large number of prosocial behaviors. For example, Lars had relatively low scores on these X variables, and as a consequence, the equation predicts that he will perform only 46 prosocial acts over the next six months. In contrast, Eamonn displayed relatively high scores on age, income, and moral development; as a consequence, your equation predicts that he will engage in 210 prosocial acts.

So, in effect, you have created a new variable, Y′. Imagine that you now go out and gather data regarding the actual number of prosocial acts that these participants engage in over the following six months. This variable would be represented with the symbol Y, because it represents the participants’ actual scores on the criterion (and not their predicted scores, Y′).

Once you determine the actual number of prosocial acts performed by the participants, you can list them in a table alongside their predicted scores on prosocial behavior as in Table 14.3:

Table 14.3. Actual and Predicted Scores on the Prosocial Behavior Variable
Participant      Actual Scores on           Predicted Scores on
                 Prosocial Behavior (Y)     Prosocial Behavior (Y')
1.   Lars        10                         46
2.   Hiro        40                         84
3.   Karim       70                         128
...
100. Eamonn      130                        210

Notice that, in some respects, your predictions of the participants’ scores on Y are not terribly accurate. For example, your equation predicted that Lars would engage in 46 prosocial activities, but in reality he engaged in only 10. Similarly, it predicted that Eamonn would engage in 210 prosocial behaviors while he actually engaged in only 130.

Despite this, you should not lose sight of the fact that your new variable Y′ does appear to be correlated with the actual scores on Y. Notice that participants with low scores on Y′ (such as Lars) also tend to have low scores on Y; notice that participants with high scores on Y′ (such as Eamonn) also tend to have high scores on Y. If you compute a product-moment correlation (r) between Y and Y′, you would probably observe a moderately large correlation coefficient. This trend is supportive of your model as it suggests that there really is a relationship between Y and the three X variables when taken as a group.
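The correlation between Y and Y′ described above can be computed with the standard product-moment formula. The sketch below (Python, not SAS) uses only the four pairs of scores listed in Table 14.3; in practice you would use all 100 cases.

```python
# Hedged sketch: product-moment correlation r between actual (Y) and
# predicted (Y') prosocial-behavior scores for the four participants
# listed in Table 14.3.
from math import sqrt

def pearson_r(xs, ys):
    """Product-moment correlation between two equal-length score lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

y_actual    = [10, 40, 70, 130]   # Y  (Table 14.3)
y_predicted = [46, 84, 128, 210]  # Y' from the simple sum equation
r = pearson_r(y_actual, y_predicted)
print(round(r, 3))
```

For these four cases r is very close to 1: even the crude unit-weight sum orders the participants almost exactly as their actual Y scores do.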

The procedures (and the predictive equation) described in this section are somewhat crude in nature, and do not describe the way that multiple regression is actually performed. However, they do illustrate some important basic concepts in multiple regression analysis. For example, in multiple regression analysis, you create an artificial variable, Y′, to represent your best guess of participants’ standings on the criterion variable. The relationship between this variable and participants’ actual standing on the criterion (Y) is assessed to indicate the strength of the relationship between Y and the X variables when taken as a group.

Multiple regression as it is actually performed has many important advantages over the crude practice of simply adding together the X variables, as was illustrated in this section. With true multiple regression, the various X variables are multiplied by optimal weights before they are added together to create Y′. This generally results in a more accurate estimate of participants’ standing on the criterion variable. In addition, you can use the results of a true multiple regression procedure to determine which of the X variables are relatively important and which are relatively unimportant predictors of Y. These issues are discussed in the following section.

An Equation with Weighted Predictors

In the preceding section, each predictor variable was given equal weight (an implicit weight of 1) when computing scores on Y′. You did not, for example, give twice as much weight to income as you gave to age when computing Y′ scores. Assigning equal weights to the various predictors might make sense in some situations, especially when all of the predictors are equally predictive of the criterion.

However, what if some X variables in a model are better predictors of Y than others? For example, what if your measure of moral development displayed a strong correlation with prosocial behavior (r = .70), income demonstrated a moderate correlation with prosocial behavior (r = .40), and age demonstrated only a weak correlation (r = .20)? In a situation such as this, it would make sense to assign different weights to the different predictors. For example, you might assign a weight of 1 to age, a weight of 2 to income, and a weight of 3 to moral development. The predictive equation that reflects this weighting scheme is

Y′ = (1) X1 + (2) X2 + (3) X3

In this equation, once again X1 = age, X2 = income, and X3 = moral development. In calculating a given participant’s score on Y′, you would multiply his or her score on each X variable by the appropriate weight and sum the resulting products. For example, Table 14.2 showed that Lars had a score of 19 on X1, a score of 15 on X2, and a score of 12 on X3. His predicted prosocial behavior score, Y′, could be calculated in the following way:

Y' = (1) X1  + (2) X2  + (3) X3
Y' = (1) 19  + (2) 15  + (3) 12
Y' =     19  +     30  +     36
Y' =  85

This weighted equation predicts that Lars will engage in 85 prosocial acts over the next six months. You could use the same weights to compute the remaining participants’ scores on Y′ in the same way. Although this example is again somewhat crude in nature, it is this concept of optimal weighting that is at the heart of multiple regression analysis.
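The weighted equation can be sketched as follows (Python, for illustration only; the weights 1, 2, and 3 are the ones chosen in the text, and Lars’s scores come from Table 14.2):

```python
# Illustrative sketch (not SAS) of the weighted predictive equation
# Y' = (1)X1 + (2)X2 + (3)X3, applied to Lars's scores.
weights = {"age": 1, "income": 2, "moral_dev": 3}  # weights from the text
lars = {"age": 19, "income": 15, "moral_dev": 12}  # Lars (Table 14.2)

def predict_weighted(scores, weights):
    """Multiply each predictor score by its weight and sum the products."""
    return sum(weights[k] * scores[k] for k in scores)

print(predict_weighted(lars, weights))  # -> 85
```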

The Multiple Regression Equation

Regression Coefficients and Intercepts

In linear multiple regression as performed by the SAS PROC REG (regression) procedure, optimal weights of the sort described in the preceding section are automatically calculated in the course of the analysis. The following symbols are used to represent the various components of an actual multiple regression equation:

Y′ = b1 X1 + b2 X2 + b3 X3 + ... + bk Xk + a

where:

Y′ = the participant’s predicted score on the criterion variable;
bk = the nonstandardized multiple regression coefficient for the kth predictor variable;
Xk = the kth predictor variable;
a = the intercept constant.

Some components of this equation, such as Y′, have already been discussed. However, some new components require additional explanation.

The term “bk” represents the nonstandardized multiple regression coefficient for an X variable. A multiple regression coefficient for a given X variable represents the average change in Y that is associated with a one-unit change in that X variable, while holding constant the remaining X variables. This somewhat technical definition for a regression coefficient is explained in more detail in a later section. For the moment, however, it is useful to think of a regression coefficient as revealing the amount of weight that the X variable is given when computing Y′. For this reason, these are sometimes referred to as b weights.

The symbol “a” represents the intercept constant of the equation. The intercept is a fixed value that is either added to or subtracted from the weighted sum of X scores when computing Y′. The inclusion of this constant in the regression equation improves the accuracy of prediction.

To develop a true multiple regression equation using PROC REG, it is necessary to gather data on both the Y variable and the X variables. Assume that you do this in a sample of 100 participants. You analyze the data, and the results of your analyses indicate that the relationship between prosocial behavior and the three predictor variables can be described by the following equation:

Y′ = b1 X1 + b2 X2 + b3 X3 + a
Y′ = (.10) X1 + (.25) X2 + (1.10) X3 + (–3.25)

The preceding equation indicates that your best guess of a given participant’s score on prosocial behavior can be computed by multiplying his or her age by .10, multiplying his or her income by .25, multiplying his or her score on moral development by 1.10, summing these products, and then adding the intercept of –3.25 (that is, subtracting 3.25 from the sum). This process is illustrated by inserting Lars’ scores on the X variables in the equation:

Y′ = (.10) 19 + (.25) 15 + (1.10) 12 + (–3.25)
Y′ = 1.90 + 3.75 + 13.20 + (–3.25)
Y′ = 18.85 + (–3.25)
Y′ = 15.60

Your best guess is that Lars will perform 15.60 prosocial acts over the next six months. You can calculate the Y′ scores for the remaining participants in Table 14.2 by inserting their X scores in this same equation.
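Applying the estimated regression equation can be sketched as follows (Python for illustration; the b weights and intercept are the ones reported above, which in the text are produced by PROC REG):

```python
# Sketch of applying the estimated regression equation
# Y' = (.10)X1 + (.25)X2 + (1.10)X3 + (-3.25), coefficients from the text.
b = {"age": 0.10, "income": 0.25, "moral_dev": 1.10}
intercept = -3.25

def predict(scores):
    """Weighted sum of predictor scores plus the intercept constant."""
    return sum(b[k] * scores[k] for k in b) + intercept

lars = {"age": 19, "income": 15, "moral_dev": 12}
print(round(predict(lars), 2))  # -> 15.6
```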

The Principle of Least Squares

At this point, it is reasonable to ask, “How did PROC REG determine that the ‘optimal’ b weight for X1 was .10? How did it determine that the ‘optimal’ weight for X2 was .25? How did it determine that the ‘optimal’ intercept (“a” term) was –3.25?”

The answer is that these values are “optimal” in the sense that they minimize errors of prediction. An error of prediction refers to the difference between a participant’s actual score on the criterion (Y), and his or her predicted score on the criterion (Y′). This difference can be illustrated as follows:

Y – Y′

Remember that you must gather actual scores on Y in order to perform multiple regression so it is, in fact, possible to compute the error of prediction (the difference between Y and Y′) for each participant in the sample. For example, Table 14.4 reports the actual score for several participants on Y, their predicted scores on Y′ based on the optimally weighted regression equation above, and their errors of prediction.

Table 14.4. Errors of Prediction Based on an Optimally Weighted Multiple Regression Equation
Participant      Actual Scores on          Predicted Scores on         Errors of Prediction
                 Prosocial Behavior (Y)    Prosocial Behavior (Y')     (Y - Y')
1.   Lars        10                        15.60                       -5.60
2.   Hiro        40                        37.95                       2.05
3.   Karim       70                        66.30                       3.70
...
100. Eamonn      130                       121.75                      8.25

For Lars, the actual number of prosocial acts performed was 10, while your multiple regression equation predicted that he would perform 15.60 acts. The error of prediction for Lars is therefore 10 – 15.60 or –5.60. The errors of prediction for the remaining participants are calculated in the same manner.
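Computing each participant’s error of prediction is a simple subtraction, sketched below (Python for illustration) using the actual and predicted scores from Table 14.4:

```python
# Sketch computing each participant's error of prediction (Y - Y'),
# using the actual and predicted scores shown in Table 14.4.
actual    = {"Lars": 10, "Hiro": 40, "Karim": 70, "Eamonn": 130}
predicted = {"Lars": 15.60, "Hiro": 37.95, "Karim": 66.30, "Eamonn": 121.75}

errors = {name: round(actual[name] - predicted[name], 2) for name in actual}
print(errors)  # Lars -> -5.6, Hiro -> 2.05, Karim -> 3.7, Eamonn -> 8.25
```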

Earlier, it was stated that the b weights and intercept calculated by PROC REG are optimal in the sense that they minimize errors of prediction. More specifically, these b weights and intercept are computed according to the principle of least squares. The principle of least squares says that Y′ values should be calculated so that the sum of the squared errors of prediction is minimized. The sum of the squared errors of prediction can be calculated using this formula:

∑ (Y – Y′)2

To compute the sum of the squared errors of prediction according to this formula, it is necessary only to:

  • compute the error of prediction (Y – Y′) for a given participant;

  • square this error;

  • repeat this process for all remaining participants;

  • sum the resulting squares. (The purpose of squaring the errors before summing them is to eliminate the minus sign that some of the difference scores will display.)
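The four steps above can be sketched in Python (not SAS). As an illustration, the sum of squared errors is computed for the four participants shown in the tables, first for the simple unit-weight equation and then for the optimally weighted regression equation. Bear in mind that the regression weights were estimated from all 100 cases, so this four-case comparison is only suggestive.

```python
# The four steps above: compute each error (Y - Y'), square it,
# repeat for every participant, and sum the squares.

def sum_squared_errors(actual, predicted):
    """Sum of (Y - Y')^2 across participants."""
    return sum((y - yp) ** 2 for y, yp in zip(actual, predicted))

y = [10, 40, 70, 130]                       # actual Y (Table 14.3)
y_simple   = [46, 84, 128, 210]             # Y' = X1 + X2 + X3
y_weighted = [15.60, 37.95, 66.30, 121.75]  # Y' from the regression equation

print(sum_squared_errors(y, y_simple))      # -> 12996
print(sum_squared_errors(y, y_weighted))    # far smaller
```

For these four cases, the optimally weighted equation’s squared errors sum to roughly 117, versus 12,996 for the crude unit-weight sum, which illustrates why least-squares weights are called optimal.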

When a given dataset is analyzed using multiple regression, PROC REG applies a set of formulas that calculate the optimal b weights and the optimal intercept for that dataset. That is why we say that multiple regression calculates “optimal” weights and intercepts: they are optimal in the sense that no other set of b weights or intercept could do a better job of minimizing the squared errors of prediction for the current dataset.

With these points established, it is now possible to summarize what multiple regression actually allows you to do:

Multiple regression allows you to examine the relationship between a single criterion variable and an optimally weighted linear combination of predictor variables.

In the preceding statement, “optimally weighted linear combination of predictor variables” refers to Y′. The expression “linear combination” refers to the fact that the various X variables are combined or added together (using the formula for a straight line), to arrive at Y′. The words “optimally weighted” refer to the fact that X variables are assigned weights that satisfy the principle of least squares.

Although we normally think of multiple regression as a procedure that examines the relationship between a single criterion variable and multiple predictor variables, it is also possible to view it in simpler terms: as a procedure that examines the relationship between just two variables, Y and Y′.
