How to do it... 

In this recipe, we will use the robustlmm package to estimate robust linear mixed effects models. Its biggest advantage is that it uses exactly the same syntax (and output format) as the lmer function from lme4. We will evaluate what happens when we contaminate 5% of the data with abnormal values. As we will see, the lmer function suffers enormously from even 5% contamination. The robustlmm package, on the contrary, does a great job, reporting coefficients almost as if no contamination had occurred.

  1. First, we generate data for 100 groups, containing 10 members each. Then, we generate random Gaussian deviates that are appended to the dataset. Each group shares a common random shock, which makes all the observations within that group correlated:
library(lme4)
library(robustlmm)
set.seed(10)
X = 7*runif(1000)
G = rep(1:100, each = 10)  # group labels: 100 groups of 10 members
pre_frame = data.frame(X = X, G = G, NOISE = rnorm(1000, 0, 0.03))
shocks_frame = data.frame(G = 1:100, shocks = rnorm(100, 0, 1))
merged_frame = merge(pre_frame, shocks_frame, by = "G")
merged_frame$Y = 10 + merged_frame$shocks + merged_frame$NOISE
XYG = merged_frame[, c("G", "X", "Y")]
lmer(data = XYG, Y ~ 1 + (1|G))
rlmer(data = XYG, Y ~ 1 + (1|G))
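As an aside, we can sanity-check the simulated data with base R alone, before any model fitting: the grand mean should sit near 10, and almost all of the variability should come from the group shocks (variance 1) rather than the tiny observation noise (variance 0.03^2). This is a self-contained sketch that regenerates the same quantities without lme4:

```r
set.seed(10)
X = 7*runif(1000)
G = rep(1:100, each = 10)          # 100 groups of 10 members
NOISE = rnorm(1000, 0, 0.03)       # small observation-level noise
shocks = rnorm(100, 0, 1)          # one shared shock per group
Y = 10 + shocks[G] + NOISE         # response: intercept + group shock + noise

mean(Y)  # should be close to 10
var(Y)   # should be close to 1 + 0.03^2, dominated by the group shocks
```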

As expected, both methods work well and yield similar results:

The interesting question is, what will happen when we add contamination?

  2. Then, we contaminate 5% of the data with random noise that has a large variance. This will change our data (and our lmer coefficients) greatly:
positions = sample(1:1000, 50, replace = FALSE)  # 50 distinct rows to contaminate
XYG[positions, "Y"] = rnorm(length(positions), 50, 10)  # one noisy draw per row
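A quick base-R check of the contamination step (the seed below is arbitrary, chosen only to make the check reproducible): sampling without replacement guarantees that exactly 50 distinct rows are hit, and drawing one noise value per row keeps the contamination genuinely random rather than pasting the same value 50 times:

```r
set.seed(123)  # arbitrary seed, for reproducibility of the check only
positions = sample(1:1000, 50, replace = FALSE)            # 50 distinct row indices
contamination = rnorm(length(positions), mean = 50, sd = 10)  # one draw per row

length(unique(positions))  # 50: no row is contaminated twice
length(contamination)      # 50 independent noisy values
```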
  3. lmer suffers greatly: its estimates end up very far from the true values. The rlmer function, on the other hand, does a great job, and its coefficients still look good.

The lmer function suffers especially in the variance components, which end up more than 10 times away from their true values. This greatly changes our conclusions: the group effect actually explains 1/(1+0.3) ≈ 77% of the variability of the model, which is a very different thing from the 11.95/(11.95+39.9) ≈ 23% that the contaminated lmer fit suggests:

lmer(data=XYG, Y ~ 1 + (1|G)) 
rlmer(data=XYG, Y ~ 1 + (1|G))

The following is the resultant output:

Of course, the conclusions for the fixed effects would also change (there is roughly a 50% difference between what lmer reports and what we should get).
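The variance-ratio arithmetic can be spelled out explicitly. Using the rounded variance components quoted above (group variance 1 and residual variance 0.3 for the robust fit; 11.95 and 39.9 for the contaminated lmer fit), the share of variability explained by the group effect, the intraclass correlation, is:

```r
# Intraclass correlation: share of total variance due to the group effect
icc = function(var_group, var_residual) {
  var_group / (var_group + var_residual)
}

icc(1, 0.3)       # robust fit: ~0.77, close to the truth
icc(11.95, 39.9)  # contaminated lmer fit: ~0.23, badly misleading
```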

  4. We can plot the results and look at the resulting weights. As we know, the large residuals here all come from contaminated observations, and the model does a great job of down-weighting them:
model = rlmer(data=XYG, Y ~ 1 + (1|G)) 
plot(model)

In the plot of fitted values versus residuals, the dots are shaded according to their robustness weights: the observations with low weights are the ones flagged as outliers and down-weighted:
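The robustness weights can also be inspected numerically. Assuming robustlmm's `getME(model, "w_e")` accessor for the per-observation residual weights, a small helper can list the down-weighted rows; the helper name and the 0.5 cutoff are our own illustration, not part of the package:

```r
# Hypothetical helper: flag observations whose robustness weight
# fell below a chosen cutoff (0.5 here is arbitrary).
flag_downweighted = function(weights, cutoff = 0.5) {
  which(weights < cutoff)
}

# With a fitted rlmer model, you would extract the weights first, e.g.:
#   w = getME(model, "w_e")   # per-observation residual weights
#   flag_downweighted(w)
# Demo on a toy weight vector: clean points keep weights near 1,
# contaminated points are pushed toward 0.
w_demo = c(1, 0.97, 0.12, 1, 0.05)
flag_downweighted(w_demo)  # rows 3 and 5
```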
