How to do it...

In this recipe, we will generate random vectors from a bivariate Gaussian distribution. We will then contaminate the sample with outliers and compare how the standard (classical) and robust estimators behave.

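  1. First, we generate bivariate Gaussian data and estimate the correlation matrix. The covariance matrix Sigma has variances of 2 and a covariance of 1, so the true correlation is 1/sqrt(2 * 2) = 0.5. Since there are no outliers yet, the robust and non-robust methods should give roughly similar results:
library(MASS)    # mvrnorm() and cov.rob()
library(robust)  # covClassic()
# Covariance matrix: variances 2, covariance 1 (true correlation 0.5)
Sigma <- matrix(c(2,1,1,2),2,2)
# 1,000 draws centered at (5, 5)
d <- mvrnorm(n = 1000, mu = c(5,5), Sigma = Sigma)
# Classical estimate, reported as a correlation matrix (argument is corr)
covClassic(d, corr = TRUE)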

The following screenshot shows the correlation matrix estimated with the classical method:
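
As a reference point for reading these outputs, the true correlation implied by Sigma can be computed with base R's cov2cor(), and the classical (Pearson) sample correlation from cor() should land close to it. This is a minimal sketch rather than part of the original recipe; the seed value is an arbitrary assumption used only to make it reproducible.

```r
library(MASS)  # mvrnorm()

set.seed(42)  # arbitrary seed, only for reproducibility
Sigma <- matrix(c(2, 1, 1, 2), 2, 2)
d <- mvrnorm(n = 1000, mu = c(5, 5), Sigma = Sigma)

# Population correlation implied by Sigma: 1 / sqrt(2 * 2) = 0.5
true_cor <- cov2cor(Sigma)[1, 2]

# Classical (Pearson) sample correlation of the simulated data
sample_cor <- cor(d)[1, 2]

cat("true:", true_cor, " classical estimate:", round(sample_cor, 3), "\n")
```

With 1,000 observations the sampling error of the correlation is only a few hundredths, so the two numbers should agree closely.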

  2. The robust correlation matrix is also very close to the true values. This is reassuring, because it shows that the robust method works well even in the absence of any contamination:
cov.rob(d,cor = TRUE)

The output first lists the estimated robust centers of the distribution:

The following screenshot shows the estimated robust correlation matrix:
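
Because cov.rob() returns a list, the centers and correlation matrix shown in the output can also be extracted programmatically. A minimal sketch, assuming the same simulated data as step 1 (regenerated here with an arbitrary seed so the block runs on its own): $center holds the robust location, and $cor holds the robust correlation matrix, which is only present when cor = TRUE.

```r
library(MASS)  # mvrnorm(), cov.rob()

set.seed(42)  # arbitrary seed; cov.rob() also uses random subsampling
Sigma <- matrix(c(2, 1, 1, 2), 2, 2)
d <- mvrnorm(n = 1000, mu = c(5, 5), Sigma = Sigma)

rob <- cov.rob(d, cor = TRUE)  # default method is "mve"

rob$center  # robust estimate of the mean vector; true value is c(5, 5)
rob$cor     # robust correlation matrix; true off-diagonal value is 0.5
```

Since cov.rob() searches random subsets, its results can change slightly from run to run unless the seed is fixed.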

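  3. We then contaminate 5% of the data (50 observations). The replacement values are highly abnormal: they are drawn independently with mean 20 and standard deviation 10 in each coordinate, so their mean and covariance are very different from those of the clean data. The classical estimator starts to have problems, as the estimated correlation matrix now deviates substantially from the true one:
# Replace the first 50 rows with N(20, 10^2) draws (a 50 x 2 matrix)
d[1:50, 1:2] <- matrix(rnorm(100, 20, 10), nrow = 50, ncol = 2)
covClassic(d, corr = TRUE)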

The following screenshot shows the estimated correlation matrix with 5% contamination:
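
To see why the classical estimate moves, treat the contaminated data as a two-component mixture: clean points from N((5, 5), Sigma) with weight 1 - eps, and contaminants with independent N(20, 10^2) coordinates with weight eps. By the law of total covariance, the population covariance the classical estimator converges to is (1 - eps) Sigma + eps * 100 I + eps (1 - eps) delta delta^T, with delta = (15, 15). The helper mixture_cor() is a hypothetical name introduced for this sketch, and the calculation is an idealized large-sample one, not something the recipe itself computes.

```r
# Population correlation of a two-component mixture:
#   clean part:         N(mu0, Sigma0) with weight 1 - eps
#   contaminating part: N(mu1, Sigma1) with weight eps
# Law of total covariance:
#   Cov = (1 - eps) Sigma0 + eps Sigma1 + eps (1 - eps) (mu1 - mu0)(mu1 - mu0)^T
mixture_cor <- function(eps,
                        mu0 = c(5, 5),   Sigma0 = matrix(c(2, 1, 1, 2), 2, 2),
                        mu1 = c(20, 20), Sigma1 = diag(100, 2)) {
  delta <- mu1 - mu0
  total <- (1 - eps) * Sigma0 + eps * Sigma1 + eps * (1 - eps) * tcrossprod(delta)
  cov2cor(total)[1, 2]
}

# Correlation the classical estimator converges to at 0%, 5% and 20%
sapply(c(0, 0.05, 0.20), mixture_cor)  # roughly 0.500, 0.662, 0.639
```

Under this model the variance of each coordinate also grows from 2 to about 17.6 (5%) and 57.6 (20%). The 20% correlation is slightly lower than the 5% one because the contaminants add a large amount of uncorrelated variance, but both are far from 0.5.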

  4. The robust approach has no problem: both the centers and the correlation matrix remain very close to their true values, as if the contamination had not occurred at all:
cov.rob(d,cor = TRUE)

The following screenshot shows the estimated robust center of the distribution with 5% contamination:

The following screenshot shows the estimated robust correlation matrix with 5% contamination:
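
One practical payoff of robust estimates is outlier detection: Mahalanobis distances computed from the robust center and covariance stay small for clean points and become large for contaminated ones. A minimal sketch, assuming a cutoff at the 97.5% quantile of a chi-squared distribution with 2 degrees of freedom (a common convention, chosen here as an assumption) and regenerating the 5% contaminated data with an arbitrary seed; flag is a hypothetical variable name.

```r
library(MASS)  # mvrnorm(), cov.rob()

set.seed(42)  # arbitrary seed for reproducibility
Sigma <- matrix(c(2, 1, 1, 2), 2, 2)
d <- mvrnorm(n = 1000, mu = c(5, 5), Sigma = Sigma)
d[1:50, 1:2] <- matrix(rnorm(100, 20, 10), nrow = 50, ncol = 2)  # 5% contamination

rob <- cov.rob(d)  # robust center and covariance (method "mve")

# Squared Mahalanobis distances relative to the robust fit
dist2 <- mahalanobis(d, center = rob$center, cov = rob$cov)

# Flag points beyond the 97.5% quantile of a chi-squared with p = 2 df
flag <- dist2 > qchisq(0.975, df = ncol(d))

# Share of flagged points among contaminated and clean rows
c(contaminated_flagged = mean(flag[1:50]),
  clean_flagged        = mean(flag[51:1000]))
```

Nearly all contaminated rows should be flagged, while only a small fraction of clean rows exceed the cutoff by chance.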

  5. Finally, we contaminate 20% of the data (200 observations) and apply the classical approach again:
# Replace the first 200 rows; the matrix must be 200 x 2 to match
d[1:200, 1:2] <- matrix(rnorm(400, 20, 10), nrow = 200, ncol = 2)
covClassic(d, corr = TRUE)

The following screenshot shows the estimated classical correlation matrix with 20% contamination:
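
The effect on the classical estimator can be tracked across contamination levels by contaminating the same clean sample with 0, 50 and 200 rows. A minimal sketch with an arbitrary seed; contaminate() is a hypothetical helper introduced here, and it builds a k x 2 replacement matrix so the shape always matches the rows being overwritten.

```r
library(MASS)  # mvrnorm()

set.seed(42)  # arbitrary seed for reproducibility
Sigma <- matrix(c(2, 1, 1, 2), 2, 2)
clean <- mvrnorm(n = 1000, mu = c(5, 5), Sigma = Sigma)

# Hypothetical helper: overwrite the first k rows with N(20, 10^2) draws
contaminate <- function(x, k) {
  if (k > 0) x[1:k, 1:2] <- matrix(rnorm(2 * k, 20, 10), nrow = k, ncol = 2)
  x
}

ks <- c(0, 50, 200)  # 0%, 5% and 20% of 1000 rows
classical <- t(sapply(ks, function(k) {
  x <- contaminate(clean, k)
  # Classical variance of the first coordinate and classical correlation
  c(k = k, var1 = var(x[, 1]), cor12 = cor(x)[1, 2])
}))
classical
```

The variance column makes the damage obvious: the clean variance is about 2, while the contaminated samples show variances many times larger.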

  6. The robust approach is again much better, yielding estimates within 10% of their true values:
cov.rob(d,cor = TRUE)

The following screenshot shows the estimated robust centers with 20% contamination:

The following screenshot shows the estimated robust correlation matrix with 20% contamination:
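
Part of the reason cov.rob() copes with 20% contamination is its high breakdown point. By default, MASS fits the ellipsoid to a subset of quantile.used = floor((n + p + 1)/2) observations, roughly half the data, so the fit can rest entirely on clean rows as long as fewer than about half are contaminated. The sketch below (arbitrary seed; h and n_bad are hypothetical variable names) computes that subset size and compares the classical and robust correlations side by side.

```r
library(MASS)  # mvrnorm(), cov.rob()

set.seed(42)  # arbitrary seed for reproducibility
Sigma <- matrix(c(2, 1, 1, 2), 2, 2)
d <- mvrnorm(n = 1000, mu = c(5, 5), Sigma = Sigma)
d[1:200, 1:2] <- matrix(rnorm(400, 20, 10), nrow = 200, ncol = 2)  # 20% contamination

n <- nrow(d); p <- ncol(d)
h <- floor((n + p + 1) / 2)  # default subset size used by cov.rob(): 501 here
n_bad <- 200                 # number of contaminated rows
n - n_bad >= h               # TRUE: enough clean rows to fill the subset

rob <- cov.rob(d, cor = TRUE)
round(c(true = 0.5, classical = cor(d)[1, 2], robust = rob$cor[1, 2]), 3)
```

By contrast, the classical estimator has no such protection: a single sufficiently extreme observation can move it arbitrarily far.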
