How to do it...

Whenever we refer to Spearman, we mean Spearman's rank statistic; and whenever we refer to Pearson, we mean the classical correlation coefficient.
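To make the two definitions concrete, the following is a minimal sketch on hypothetical toy data (the names x_toy and y_toy are ours, not part of the recipe). It illustrates that Spearman's coefficient is Pearson's coefficient computed on the ranks of the data rather than on the raw values:

```r
# Hypothetical toy data: y_toy is a monotone but nonlinear function of x_toy.
x_toy <- c(1, 2, 3, 4, 5, 6)
y_toy <- x_toy^3

# Pearson: correlation of the raw values.
pearson_toy <- cor(x_toy, y_toy, method = "pearson")

# Spearman: computed directly, and by hand as Pearson on the ranks.
spearman_toy <- cor(x_toy, y_toy, method = "spearman")
spearman_by_hand <- cor(rank(x_toy), rank(y_toy), method = "pearson")

print(c(pearson = pearson_toy,
        spearman = spearman_toy,
        by_hand = spearman_by_hand))
```

Because the ranks of a strictly increasing relationship line up perfectly, the two Spearman values agree and equal 1, while Pearson's coefficient stays below 1.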

In the following example, we will first use a nonlinear relationship with no noise. In this case, Spearman's coefficient will be almost 1 in absolute value, and Pearson's coefficient will be high but smaller in magnitude. We will then introduce some noise and explain why Pearson's coefficient ends up being larger in magnitude than Spearman's:

First, we generate some synthetic data. This is clearly a nonlinear relationship, as shown in the following example:

# x runs from 1 to 100; y is a decreasing logistic curve centered at x = 50
x = seq(1, 100)
y = 20 / (1 + exp(x - 50))
plot(x, y)

The preceding command displays the following output:

We now compute both the Spearman and Pearson correlation coefficients; both are negative since there is a clear negative relationship.

# conf.level is only used for the Pearson confidence interval
cor.test(~ x + y, method = "spearman", conf.level = 0.95)
cor.test(~ x + y, method = "pearson", conf.level = 0.95)

The preceding command displays the following output of the Spearman versus Pearson coefficients:

Because Spearman measures how monotone the relationship is (and in this case, no observation is larger than the one before it), the coefficient is essentially -1. The relationship is not linear, however, which is why Pearson's coefficient is only -0.88, as shown in the preceding output.
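As a side note on why Spearman's coefficient is essentially -1 rather than exactly -1, the following sketch recomputes both estimates with cor() and checks for tied values. The explanation offered in the comments is our reading, not something stated in the recipe: the curve is strictly decreasing mathematically, but in double-precision arithmetic the smallest values of x all produce exactly 20, and tied ranks pull the coefficient slightly away from -1.

```r
x <- seq(1, 100)
y <- 20 / (1 + exp(x - 50))

# Point estimates only, without the hypothesis-test output of cor.test().
rho_spearman <- cor(x, y, method = "spearman")
r_pearson <- cor(x, y, method = "pearson")

# For small x, exp(x - 50) is so tiny that 1 + exp(x - 50) rounds to 1,
# so several y values are tied at exactly 20.
n_tied_at_20 <- sum(y == 20)
has_ties <- any(duplicated(y))

print(c(spearman = rho_spearman, pearson = r_pearson))
print(n_tied_at_20)
```

The same ties are the likely reason cor.test() reports that it cannot compute an exact p-value for the Spearman test on this data.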

Now, let's add some noise, as shown in the following example:

x = seq(1, 100)
# add uniform noise in [-5, 5] to every point of the logistic curve
y = (runif(length(x)) - 0.5) * 10 + 20 / (1 + exp(x - 50))
plot(x, y)

The preceding command displays the following output:

Now, as the output of the following commands shows, Pearson's coefficient is almost the same as before, but Spearman's is reduced substantially. Intuitively, this happens because the straight line that best fits these points barely changes when we add the noise, so Pearson's coefficient stays roughly the same. Spearman's coefficient behaves differently because it captures how monotone the relationship is: once the values jump up and down from one observation to the next, we can no longer say that the relationship is monotone, even though the overall shape is unchanged. This is why Spearman's coefficient can be read as a measure of how far we are from the ideal case of a perfectly monotone relationship:

cor.test( ~ x + y, method = "spearman",conf.level = 0.95)
cor.test( ~ x + y, method = "pearson",conf.level = 0.95)

The preceding command displays the following output:
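To see the two cases side by side, the following sketch puts both coefficients into one small table. The seed, the helper name both_cors, and the variable names are our own assumptions added for illustration; the exact numbers depend on the random draw, so treat the printed values as indicative only.

```r
set.seed(42)  # arbitrary seed, added here only for reproducibility

x <- seq(1, 100)
y_clean <- 20 / (1 + exp(x - 50))
# Same kind of noise as in the recipe: uniform in [-5, 5].
y_noisy <- y_clean + (runif(length(x)) - 0.5) * 10

# Hypothetical helper: both coefficients for one pair of vectors.
both_cors <- function(x, y) {
  c(spearman = cor(x, y, method = "spearman"),
    pearson  = cor(x, y, method = "pearson"))
}

comparison <- rbind(clean = both_cors(x, y_clean),
                    noisy = both_cors(x, y_noisy))
print(round(comparison, 3))

# How much each coefficient shrinks in magnitude once noise is added.
magnitude_drop <- abs(comparison["clean", ]) - abs(comparison["noisy", ])
print(round(magnitude_drop, 3))
```

The noise scrambles the ordering of the points within each flat part of the curve, which costs Spearman's coefficient far more than the small amount of extra variance costs Pearson's.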
