Spearman’s rank-order correlation coefficient (symbolized as r_s) is appropriate in a variety of circumstances. First, you can use the Spearman correlation when both variables are assessed at an ordinal level of measurement. It is also appropriate when one variable is ordinal-level and the other is interval- or ratio-level.
However, it can also be appropriate to use the Spearman correlation when both variables are on an interval or ratio scale. This is because the Spearman coefficient is a distribution-free statistic. Among other things, a distribution-free procedure makes no assumptions concerning the shape of the distribution from which the sample data were drawn. For this reason, researchers sometimes compute Spearman correlations when one or both of the variables are interval- or ratio-level but are markedly non-normal (i.e., skewed or kurtotic), such that a Pearson correlation would be inappropriate. When both variables are normally distributed, the Pearson correlation is generally the better choice; when one or both variables are markedly non-normal, the Spearman correlation is the better choice.
SAS computes a Spearman correlation by rank-ordering both variables and then computing the correlation between the ranks. The resulting coefficient can range from –1.00 to +1.00 and is interpreted in the same way as a Pearson correlation coefficient.
Here is the general form for computing a Spearman correlation between two variables. Notice that it is identical to the form used to compute Pearson correlations, except that you specify the SPEARMAN option in the PROC CORR statement. (If you did not specify the SPEARMAN option, the program would again produce Pearson correlations, since Pearson correlations are the default output.)
PROC CORR DATA=dataset-name SPEARMAN options;
   VAR variable1 variable2;
RUN;
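The rank-then-correlate description above can be checked directly. The following is a minimal sketch, not part of the original text: it assumes the SASHELP.CLASS sample dataset that ships with SAS (numeric variables HEIGHT and WEIGHT), and the output dataset name RANKED and the rank variable names R_HEIGHT and R_WEIGHT are hypothetical. It ranks both variables with PROC RANK, computes an ordinary Pearson correlation on the ranks, and then requests the Spearman correlation on the raw values for comparison.

```sas
/* Step 1: replace each raw value with its rank.                 */
/* TIES=MEAN assigns tied values the average of their ranks.     */
PROC RANK DATA=SASHELP.CLASS OUT=RANKED TIES=MEAN;
   VAR HEIGHT WEIGHT;
   RANKS R_HEIGHT R_WEIGHT;   /* hypothetical names for the rank variables */
RUN;

/* Step 2: Pearson correlation computed on the ranks. */
PROC CORR DATA=RANKED PEARSON;
   VAR R_HEIGHT R_WEIGHT;
RUN;

/* Comparison: Spearman correlation requested on the raw values. */
PROC CORR DATA=SASHELP.CLASS SPEARMAN;
   VAR HEIGHT WEIGHT;
RUN;
```

The coefficient reported in Step 2 should match the one from the final PROC CORR step, which is the sense in which a Spearman correlation is a correlation between ranks.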
To illustrate this statistic, assume that a teacher has administered a test of creativity to 10 students at Time 1. After reviewing the results, she ranks her students from 1 to 10, with “1” representing the “most creative student,” and “10” representing the “least creative student.” Two months later, at Time 2, she repeats the process, arriving at a slightly different set of rankings. She now wants to determine the correlation between the rankings made at Time 1 and those made at Time 2. The data (rankings) are clearly ordinal-level measures, so the correct statistic is the Spearman rank-order correlation coefficient.
This is the entire program that will input the fictitious data and compute the Spearman correlation:
DATA D1;
/* Formatted input: TEST1 is read from columns 1-2, TEST2 from columns 4-5 */
INPUT #1 @1 TEST1 2.
         @4 TEST2 2. ;
DATALINES;
1  2
2  3
3  1
4  5
5  4
6  6
7  7
8  9
9  10
10 8
;
RUN;

/* Request Spearman rather than the default Pearson correlations */
PROC CORR DATA=D1 SPEARMAN;
   VAR TEST1 TEST2;
RUN;
This program provides the output presented as Output 6.6. Notice that the format is identical to that observed with the Pearson correlations in Output 6.4 except that the heading above the matrix of correlations indicates that Spearman correlations have been printed. Correlation coefficients and significance estimates (i.e., p values) are interpreted in the usual way.
The CORR Procedure

2 Variables:    TEST1    TEST2

Simple Statistics

Variable     N       Mean    Std Dev     Median    Minimum    Maximum
TEST1       10    5.50000    3.02765    5.50000    1.00000   10.00000
TEST2       10    5.50000    3.02765    5.50000    1.00000   10.00000

Spearman Correlation Coefficients, N = 10
Prob > |r| under H0: Rho=0

              TEST1       TEST2
TEST1       1.00000     0.91515
                         0.0002
TEST2       0.91515     1.00000
             0.0002
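Because the teacher's rankings contain no ties, the coefficient in the output can be verified by hand. The check below uses the standard no-ties shortcut formula for Spearman's coefficient, which the text itself does not present; the symbol d_i (the difference between a student's Time 1 and Time 2 ranks) and n (the number of students) are notation introduced here for illustration.

```latex
\[
r_s = 1 - \frac{6\sum_{i=1}^{n} d_i^{2}}{n\,(n^{2}-1)},
\qquad d_i = \mathrm{TEST1}_i - \mathrm{TEST2}_i
\]

% Rank differences for the 10 students: -1, -1, 2, -1, 1, 0, 0, -1, -1, 2
\[
\sum_{i=1}^{10} d_i^{2} = 1+1+4+1+1+0+0+1+1+4 = 14
\]

\[
r_s = 1 - \frac{6(14)}{10\,(10^{2}-1)} = 1 - \frac{84}{990} \approx 0.91515
\]
```

The result agrees with the value of 0.91515 that PROC CORR reports for TEST1 and TEST2.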
When requesting Spearman correlations, the VAR and WITH statements are used in the same way as when computing Pearson correlation coefficients. That is:
using only the VAR statement results in the printing of all possible correlations for the listed variables;
combining the VAR with the WITH statement results in the printing of correlations for subsets of variables;
leaving off these statements results in the printing of all possible correlations for all numeric variables.
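The three cases listed above can be sketched as follows. This is an illustrative example rather than code from the original text: it assumes the SASHELP.CLASS sample dataset that ships with SAS, whose numeric variables are AGE, HEIGHT, and WEIGHT.

```sas
/* VAR only: all possible Spearman correlations among the listed variables */
PROC CORR DATA=SASHELP.CLASS SPEARMAN;
   VAR AGE HEIGHT WEIGHT;
RUN;

/* VAR with WITH: only the correlations between the WITH variable (AGE) */
/* and each of the VAR variables (HEIGHT, WEIGHT)                       */
PROC CORR DATA=SASHELP.CLASS SPEARMAN;
   VAR HEIGHT WEIGHT;
   WITH AGE;
RUN;

/* Neither statement: all possible correlations among all numeric variables */
PROC CORR DATA=SASHELP.CLASS SPEARMAN;
RUN;
```

In the second form, the WITH variables appear as the rows of the printed matrix and the VAR variables as its columns.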