Computing Bivariate Correlations with PROC CORR

In most studies in which data are analyzed using multiple regression, it is appropriate to begin by computing all possible correlations among the study’s variables. Reviewing these correlations will help you understand the big picture concerning the simple relationships between the criterion variable and the four predictor variables as well as among the four predictor variables themselves.

Writing the Program

Following is the general form for the SAS statements that will compute all possible correlations among the variables being investigated. (Because detailed information concerning use of the CORR procedure is in Chapter 6, most of that information is not repeated here.)

PROC CORR   DATA=dataset-name;
   VAR  criterion-variable-and-predictor-variables ;
RUN;

Technically, the variables can be presented in any sequence though there are some advantages to (a) listing the criterion variable first and (b) then listing the predictor variables in the same order that they are discussed in the text. This often makes it easier to interpret the results of the correlation matrix.

Below is the entire program (including a portion of the fictitious data) that would compute all possible correlations among the five variables of interest in the current study. Output 14.1 presents the results produced by this program.

DATA D1;
   INPUT   #1    @1   COMMITMENT      2.
                 @4   SATISFACTION    2.
                 @7   REWARD          2.
                 @10  COST            2.
                 @13  INVESTMENT      2.
                 @16  ALTERNATIVES    2.   ;

   * Subsetting IF: keep only participants with no missing values
     on the five variables to be analyzed;
   IF REWARD NE . AND COST   NE . AND INVESTMENT NE . AND
      ALTERNATIVES NE . AND COMMITMENT NE . ;

DATALINES;
34 30 25 13 25 12
32 27 27 14 32 13
34 23 24 21 30 14
.
.
.
.
36 32 28 13 15 5
32 29 30 21 32 22
30 32 33 16 34 9
;
RUN;

PROC CORR DATA=D1;
   VAR COMMITMENT REWARD COST INVESTMENT ALTERNATIVES;
RUN;

Output 14.1. Results of the CORR Procedure
                                      The CORR Procedure

       5  Variables:    COMMITMENT   REWARD       COST         INVESTMENT   ALTERNATIVES


                                      Simple Statistics

Variable               N          Mean       Std Dev           Sum       Minimum       Maximum

COMMITMENT            48      27.70833      10.19587          1330       4.00000      36.00000
REWARD                48      26.64583       5.05076          1279       9.00000      33.00000
COST                  48      16.58333       5.49597     796.00000       5.00000      26.00000
INVESTMENT            48      25.33333       6.08218          1216      11.00000      34.00000
ALTERNATIVES          48      16.60417       7.50529     797.00000       4.00000      34.00000


                          Pearson Correlation Coefficients, N = 48
                                  Prob > |r| under H0: Rho=0

                    COMMITMENT        REWARD          COST      INVESTMENT      ALTERNATIVES

  COMMITMENT           1.00000       0.57597      -0.24826         0.61403          -0.72000
                                      <.0001        0.0889          <.0001            <.0001

  REWARD               0.57597       1.00000      -0.45152         0.57117          -0.46683
                        <.0001                      0.0013          <.0001            0.0008

  COST                -0.24826      -0.45152       1.00000         0.02143           0.27084
                        0.0889        0.0013                        0.8851            0.0626

  INVESTMENT           0.61403       0.57117       0.02143         1.00000          -0.44776
                        <.0001        <.0001        0.8851                            0.0014

  ALTERNATIVES        -0.72000      -0.46683       0.27084        -0.44776           1.00000
                        <.0001        0.0008        0.0626          0.0014

Interpreting the Results of PROC CORR

1. Make Sure That the Numbers Look Right

Before interpreting the meaning of the correlation coefficients, it is important to review descriptive statistics to help verify that no errors were made when entering data or writing the INPUT statement. Most of this information is presented in a table of simple statistics that appears at the top of the output page.

The simple statistics table at the top of Output 14.1 provides means, standard deviations, and other descriptive statistics for the five variables analyzed. Note that N = 48 for all five variables, meaning that there must have been missing data for two of the 50 participants in the original dataset. (The two with missing data were deleted by the subsetting IF statement discussed earlier; review the actual dataset in Appendix B to verify that two participants did, in fact, have missing data.)

The rest of the descriptive statistics should also be reviewed to verify that all of the figures are reasonable. In particular, check the “Minimum” and “Maximum” columns for evidence of problems. The lowest score that a participant could possibly receive on any variable was 4; if any variable displays a minimum score below 4, an error must have been made either when entering the data or in the program. Similarly, no score in the “Maximum” column should exceed 36.
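The range check described above can also be done record by record. The following is only a minimal sketch, assuming the dataset D1 created by the program shown earlier; the dataset name RANGE_CHECK and the variables I and BAD are hypothetical names introduced here for illustration.

```sas
* Sketch: list any participant with a score outside the valid range of 4 to 36;
DATA RANGE_CHECK;
   SET D1;
   ARRAY SCORES {*} COMMITMENT REWARD COST INVESTMENT ALTERNATIVES;
   BAD = 0;
   DO I = 1 TO DIM(SCORES);
      * Missing values are skipped so that only real out-of-range scores are flagged;
      IF SCORES{I} NE . AND (SCORES{I} < 4 OR SCORES{I} > 36) THEN BAD = 1;
   END;
   IF BAD = 1;       * keep only the flagged records;
   DROP I BAD;
RUN;

PROC PRINT DATA=RANGE_CHECK;
RUN;
```

If the data were entered correctly, RANGE_CHECK should contain no observations and PROC PRINT will have nothing to list.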

2. Determine the Size of the Sample Producing the Correlations

When all correlations are based on the same number of participants, the sample size for the correlations should appear on the line just below the table of simple statistics. In Output 14.1, this reads as

Pearson Correlation Coefficients, N = 48
        Prob > |r| under H0: Rho=0

The final entry in this line tells us that N = 48 for the analysis. If the various correlations had, instead, been based on different sample sizes, then the N associated with each coefficient would appear within the table of correlation coefficients, just below the p value for the corresponding correlation.
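By default, PROC CORR computes each correlation from all participants who have data on that particular pair of variables, which is how different sample sizes can arise. As a hedged alternative to the subsetting IF statement, the NOMISS option asks PROC CORR itself to drop any participant with a missing value on any variable in the VAR statement. The sketch below assumes the same dataset D1.

```sas
* Sketch: let PROC CORR exclude incomplete cases through the NOMISS option
  rather than through a subsetting IF in the DATA step;
PROC CORR DATA=D1 NOMISS;
   VAR COMMITMENT REWARD COST INVESTMENT ALTERNATIVES;
RUN;
```

With NOMISS, every coefficient is based on the same participants, so a single N should again appear in the heading of the correlation table.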

3. Review the Correlation Coefficients

The correlations among the five variables appear in the 5 × 5 matrix in the bottom half of Output 14.1. In this matrix, where the row for one variable intersects with the column of a second variable, you find the cell that provides information about the correlation between these variables. The top figure in the cell is the Pearson correlation coefficient between the variables; the bottom figure is the p value associated with this coefficient.

For example, consider the vertical column under the “COMMITMENT” heading. Where the column headed “COMMITMENT” intersects with the row headed “REWARD,” you find that the correlation between commitment and rewards is approximately .58. The p value associated with this correlation is reported as <.0001, which is well below .01. This means that, if commitment and rewards were actually uncorrelated in the population, there would be less than 1 chance in 100 of obtaining a coefficient this large or larger in a sample of this size.
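To see where such a p value comes from, the test of the null hypothesis that Rho = 0 can be reproduced by hand from r and N using the usual t statistic with N – 2 degrees of freedom. The DATA step below is only an illustrative sketch; the variable names R, N, T, and P are chosen here for convenience, and the values of r and N are copied from Output 14.1.

```sas
* Sketch: reproduce the significance test for the COMMITMENT-REWARD correlation;
DATA _NULL_;
   R = 0.57597;                          * correlation from Output 14.1;
   N = 48;                               * sample size from Output 14.1;
   T = R * SQRT((N - 2) / (1 - R**2));   * t statistic with N - 2 degrees of freedom;
   P = 2 * (1 - PROBT(ABS(T), N - 2));   * two-tailed p value;
   PUT T= 8.4 P= ;                       * write both values to the SAS log;
RUN;
```

For these values, t works out to roughly 4.78 with 46 degrees of freedom, and the resulting two-tailed p value falls below .0001, consistent with the entry in the output.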

Reviewing the correlations in the COMMITMENT column tells you something about the pattern of simple bivariate correlations between commitment and the four predictors. Notice that COMMITMENT demonstrates a positive correlation with REWARD and INVESTMENT. As you would expect, participants who reported higher levels of rewards and investment size also reported higher levels of commitment. It can also be seen that COMMITMENT demonstrates negative relationships with COST and ALTERNATIVES. This is also as you would expect, since participants who report higher levels of costs and alternative value report lower levels of commitment. Each correlation is in the direction predicted by the investment model (see Figure 14.7).
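If only the correlations between the criterion and the predictors are of interest, the WITH statement can be used to request just that part of the matrix. The following is a minimal sketch, again assuming dataset D1.

```sas
* Sketch: correlate the criterion (WITH) with each of the four predictors (VAR);
PROC CORR DATA=D1;
   VAR  REWARD COST INVESTMENT ALTERNATIVES;
   WITH COMMITMENT;
RUN;
```

The variables in the WITH statement form the rows of the resulting table and the variables in the VAR statement form the columns, so this request produces a single COMMITMENT row with four columns. The intercorrelations among the predictors are not printed, which is why the full matrix is still needed for the review that follows.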

However, notice that some predictors are more strongly related to COMMITMENT than others. Specifically, ALTERNATIVES displays the strongest correlation at –.72, followed by INVESTMENT (.61) and REWARD (.58).[1] Each correlation is statistically significant at the .01 level.

[1] Note that we cannot necessarily conclude that the differences between these coefficients are statistically significant without calculating Fisher’s z transformations (see Hopkins, Glass, & Hopkins, 1987).
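As a related but more limited aid, the FISHER option of PROC CORR prints Fisher’s z transformation and confidence limits for each correlation. The sketch below assumes dataset D1. Note that this option does not by itself test whether two correlations that share a variable (such as COMMITMENT) differ from one another; that comparison requires the procedures described in the reference cited in the footnote.

```sas
* Sketch: request Fisher z statistics and confidence limits for each correlation;
PROC CORR DATA=D1 FISHER;
   VAR COMMITMENT REWARD INVESTMENT ALTERNATIVES;
RUN;
```

Heavily overlapping confidence limits for two coefficients are a signal that their apparent difference in size should be interpreted with caution.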

The correlation between COMMITMENT and COST is much lower at approximately –.25. In fact, this correlation is not statistically significant; the p value associated with it is .09. Based on these results, you cannot reject the null hypothesis that commitment and costs are uncorrelated in the population.

The correlations in Output 14.1 also provide important information about the correlations among the four predictor variables. When using multiple regression, an ideal predictive situation is typically one in which each predictor variable displays a relatively strong correlation with the criterion while the predictor variables display relatively weak correlations among themselves. (The reasons for this are discussed in an earlier section.)

With this in mind, the correlations presented in Output 14.1 indicate that the pattern of associations among the variables in the current dataset is less than ideal. It is true that three of the predictors display relatively strong correlations with commitment, but it is also true that most of the predictors display relatively strong correlations with each other. Notice that r = –.45 for the correlation between REWARD and COST, that r = .57 for REWARD and INVESTMENT, and that r = –.47 for REWARD and ALTERNATIVES. These correlations are moderately large, as is the correlation between INVESTMENT and ALTERNATIVES (r = –.45). Moderate intercorrelations of this sort sometimes result in nonsignificant multiple regression coefficients for at least some predictor variables when the data are analyzed. The analyses reported in the following section should bear this out.
