Now that we have briefly examined the basics of multilevel modeling equations, we can turn to how to build multilevel models in R and use them to predict unseen data.
For this purpose, we first load our dataset, produced using the same procedure as mentioned previously (except that the attributes are not scaled). Here again, there are 100 generated observations for each of the 17 hospitals:
NursesML = read.table("NursesML.dat", header = T, sep = " ")
We will examine the variation in our attributes at two levels of analysis, hospitals and observations; that is, we will check whether there is more variation at the hospital level or at the observation level. We could compute this by hand.
The following computes the mean of the attribute we want to predict (WorkSat) for each of the hospitals:
means = aggregate(NursesML[,4], by=list(NursesML[,5]), FUN=mean)[2]
We can display the variance of work satisfaction at the hospital and observation levels as follows:
var(unlist(means)) #at the hospital level
var(NursesML[,4]) #at the observation level
The output is 0.0771365 for hospitals and 0.7914461 for observations. Far more variance lies at the observation level than at the hospital level. Yet the two values are confounded: the variance at the hospital level is present at the observation level and vice versa. The results are therefore not trustworthy.
Using multilevel modeling to examine such differences is the correct way to perform the comparison. To do so, we need to fit a multilevel model that includes only a constant and the clustering variable. This is known as the null model. We will start by installing and loading the lme4 package. The lmer() function in this package allows fitting multilevel models. The version available for download at the time of writing is 1.1-8. Your output could vary if you use a different version:
install.packages("lme4"); library(lme4)
We first tell R that the hosp attribute is a factor:
NursesML$hosp = factor(NursesML$hosp)
We fit the null model as follows:
null = lmer(WorkSat ~ 1 + (1|hosp), data=NursesML)
Let's examine the summary of the null model:
summary(null)
Under Random effects, we can see that Variance for Intercept is 0.06988. This is the variance at the hospital level. The residual variance, that is, the variance at the observation level, is 0.72564. The total variance in work satisfaction is therefore 0.06988 + 0.72564 = 0.79552. We can compute the proportion of variance at the hospital level (known as the intraclass correlation) as 0.06988 / 0.79552 = 0.08784191. Approximately 9 percent of the variance lies at the hospital level. A rule of thumb is that datasets with less than 5 percent of variance at level 2 (the hospital here) can be analyzed using traditional regression without much concern about the nesting of the data. Note that this applies only if there are no predictors at level 2.
Under Fixed effects, we only have Intercept here. Its value of 5.10679 in the null model means that the average value among observations is about 5.11. We can compare this value with the value returned by the simple mean() function:
mean(NursesML[,4])
The result, 5.106792, is identical.
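The variance decomposition above can be reproduced with a few lines of base R. This is only a sketch: the two variance components are copied from the summary(null) output quoted in the text, and the names var_hosp, var_resid, and n_per_hosp are our own labels, not objects created earlier. The last quantity relies on a standard property of the random intercept model with equally sized groups: the variance of the raw group means is approximately the level 2 variance plus the level 1 variance divided by the group size.

```r
# Variance components as reported by summary(null) in the text
var_hosp   <- 0.06988  # level 2 (hospital) intercept variance
var_resid  <- 0.72564  # level 1 (observation) residual variance
n_per_hosp <- 100      # observations per hospital, as stated for this dataset

# Total variance and intraclass correlation
total_var <- var_hosp + var_resid
icc       <- var_hosp / total_var

# Expected variance of the raw hospital means:
# level 2 variance plus the sampling noise of a mean of n_per_hosp observations
var_of_means <- var_hosp + var_resid / n_per_hosp

round(c(total = total_var, icc = icc, var_of_means = var_of_means), 5)
```

The last value is very close to the 0.0771365 obtained by hand for the hospital means, which illustrates why that figure overstates the variance actually located at the hospital level.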
It is possible to obtain the intercept in each of the hospitals as follows:
coef(null)
We have already computed the mean at the hospital level (which is stored in the means object). We can therefore display those easily:
means
Both outputs are presented here (intercepts on the left). Note the minor differences: the intercepts computed by lmer() are not the raw hospital means, because they are estimated from the distribution of the values across hospitals, which pulls each estimate slightly toward the overall mean:
The intercepts displayed here are composed of a fixed part (the overall intercept) and a random part (one value per hospital), which corresponds to the deviation of each hospital from the overall intercept. The random part can be obtained using the ranef() function:
ranef(null)
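To make the relationship between the fixed part, the random part, and the hospital intercepts concrete, here is a minimal sketch on simulated data. The data frame sim, the model fit, and the simulation settings (an overall mean of 5 and the two standard deviations) are hypothetical and chosen only for illustration; they are not the NursesML data. The sketch assumes the lme4 package is already installed.

```r
library(lme4)

# Simulated stand-in for the data: 17 hospitals, 100 observations each
set.seed(1)
sim <- data.frame(hosp = factor(rep(1:17, each = 100)))
hosp_effect <- rnorm(17, sd = 0.3)  # hypothetical hospital deviations
sim$WorkSat <- 5 + hosp_effect[as.integer(sim$hosp)] + rnorm(nrow(sim), sd = 0.85)

# Null model: a constant plus a random intercept per hospital
fit <- lmer(WorkSat ~ 1 + (1 | hosp), data = sim)

# Hospital intercepts rebuilt by hand: fixed part + random part
by_hand   <- fixef(fit)[["(Intercept)"]] + ranef(fit)$hosp[["(Intercept)"]]
from_coef <- coef(fit)$hosp[["(Intercept)"]]

all.equal(by_hand, from_coef)
```

The same check can be run on the null model fitted in the text by replacing fit with null.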
We will examine how to test the normality of residuals after we introduce random slopes.
We will now perform our first analysis. In this case, we want to examine the relative impact of personal accomplishment, depersonalization, and emotional exhaustion on work satisfaction. We will not yet include potential variation of the effect of the predictors between hospitals. It is common to center the predictors around the grand mean before running the analyses. We have therefore prepared training and testing datasets with centered predictors, each containing 50 observations per hospital. You can now load the datasets:
NursesMLtrain = read.table("NursesMLtrain.dat", header = T, sep = " ")
NursesMLtest = read.table("NursesMLtest.dat", header = T, sep = " ")
Let's make sure the hosp attribute is considered a factor in both datasets:
NursesMLtrain$hosp = factor(NursesMLtrain$hosp)
NursesMLtest$hosp = factor(NursesMLtest$hosp)
Let's now fit the model in the training data:
model = lmer(WorkSat ~ Accomp + Depers + Exhaust + (1|hosp), data=NursesMLtrain, REML = F)
The first thing we want to know is whether the model we just computed fits the data better than a null model. This requires comparing a value in both models: the -2loglikelihood. As we now have a different dataset than when we computed the null model, we have to fit the null model again. Another reason is that the comparison of -2loglikelihood values is unreliable with restricted maximum likelihood (REML, the estimator used by default in lmer()) when the models differ in their fixed effects. We will not explain this further and simply use maximum likelihood (ML) instead by stating REML = F:
null = lmer(WorkSat ~ 1 + (1|hosp), data=NursesMLtrain, REML = F)
We can compare the -2loglikelihood values using the anova() function, as for traditional regression models (even though a chi-square test is actually performed):
anova(null, model)
The output is as follows:
The AIC and BIC columns refer to the Akaike Information Criterion and the Bayesian Information Criterion. Like the -2loglikelihood value, these are measures of how well the model fits the data, but they also take the model complexity (the number of included parameters) into account. The deviance column is also a measure of model fit. Smaller values are preferred for AIC, BIC, and deviance, whereas the loglikelihood values (the logLik column) should increase with better fit. The Chisq column refers to the difference in -2loglikelihood between the models (which closely follows a chi-square distribution). The degrees of freedom (next column) for the chi-square test are computed as the difference in the number of estimated parameters between the two models. Finally, the last column is the p-value for the test. The three asterisks show that the difference is significant at a level close to 0, as displayed in the significance codes below the table. From this, we conclude that the model with the three predictors included is better than the null model.
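The chi-square test reported by anova() can be sketched by hand in base R. The helper lr_test() below is our own illustrative function, not part of lme4, and the two loglikelihood values passed to it are hypothetical placeholders rather than the values from the output. The parameter counts are those of the two models compared here: 3 for the null model (intercept, hospital variance, residual variance) and 6 for the model with three predictors.

```r
# Likelihood ratio test between two nested models fitted with ML.
# loglik_* are loglikelihoods; df_* are the numbers of estimated parameters.
lr_test <- function(loglik_null, loglik_model, df_null, df_model) {
  chisq <- 2 * (loglik_model - loglik_null)  # difference in -2loglikelihood
  df    <- df_model - df_null                # difference in parameter counts
  p     <- pchisq(chisq, df = df, lower.tail = FALSE)
  c(Chisq = chisq, Df = df, p.value = p)
}

# Hypothetical loglikelihoods, for illustration only
res <- lr_test(loglik_null = -1100, loglik_model = -1010,
               df_null = 3, df_model = 6)
res
```

With fitted models, the inputs can be taken from logLik(null) and logLik(model); the number of estimated parameters is stored in the "df" attribute of each returned object.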
We can compute the additional part of variance explained by our model using the r.squaredLR() function from the MuMIn package, which we first install and load. The r.squaredLR() function takes our model with the predictors and the null model as arguments:
install.packages("MuMIn"); library(MuMIn)
r.squaredLR(model,null)
The output shows that the (pseudo) R squared value is 0.189, meaning that our model accounts for about 19 percent of the variance relative to the null model.
We can now examine the summary of the model as follows:
summary(model)
The output is as follows:
Unfortunately, lmer() does not provide p values for the coefficients. Obtaining p values is often considered essential in hypothesis testing (although examining the confidence intervals is sometimes preferred). To obtain the p values, we need to compute them ourselves. Note that there is some debate as to the reliability of p values in multilevel modeling, which is why they are not included by default in the output. We start by extracting the t-values. We then compare these to a normal distribution and output the resulting p values:
tvals = coef(summary(model))[,3]
tvals.p <- 2 * (1 - pnorm(abs(tvals)))
round(tvals.p,3)
The following output shows that the intercept, personal accomplishment, depersonalization, and emotional exhaustion are significantly different from 0 at p < .001.
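The normal approximation used for these p values can be tried on its own in base R. In this sketch, t_to_p() is a helper name of our choosing and the t-values are hypothetical, not those of the fitted model. The helper uses the lower tail, 2 * pnorm(-abs(t)), which is mathematically equivalent to the expression used above but avoids subtracting a number very close to 1 when the t-value is large.

```r
# Two-sided p value for a t-value treated as standard normal
t_to_p <- function(tvals) 2 * pnorm(-abs(tvals))

# Hypothetical t-values, for illustration only
tvals_demo <- c(Intercept = 40.2, Accomp = 4.1, Depers = -1.5)
pvals_demo <- t_to_p(tvals_demo)
round(pvals_demo, 3)

# Same result as the formulation used in the text
all.equal(pvals_demo, 2 * (1 - pnorm(abs(tvals_demo))))
```

Because the approximation ignores the degrees of freedom, it is best read as a rough guide, which is consistent with the debate mentioned above.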
We can now examine Fixed effects. Because we are using centered attributes, the intercept shows that the average work satisfaction when all predictors are at their mean level is 5.11854. An increase of one unit in personal accomplishment is related to an increase of 0.17611 in work satisfaction, whereas an increase of one unit in depersonalization and in emotional exhaustion is related to a decrease of 0.07335 and 0.29215, respectively. Of course, these are just estimates. The confidence intervals can be obtained using the confint.merMod() function:
confint.merMod(model)
The following output shows the 95 percent confidence intervals for the estimates. Examining whether the confidence intervals include 0 is another way of determining whether a predictor is significant. Notice that all predictors have confidence intervals that do not include 0 (meaning that they are significant). We will plot relationships between predictors and the criterion attribute in the next section. Note that the .sig01 and sigma values refer to the standard deviations at level 2 (.sig01) and level 1 (sigma). Both are different from 0 (as 0 is not included in the confidence intervals):
In the previous model, we assumed common slopes for all hospitals. Now we want to draw conclusions on hospitals in general (the population), rather than on the hospitals in which we collected data. We therefore need to allow the slopes to vary between hospitals. Also, a visual inspection of the slopes in each hospital might warrant the inclusion of random slopes in the model when the slopes visibly differ. This is the case in our data, as shown in the second figure presented here.
We, therefore, fit a new model with random slopes:
modelRS = lmer(WorkSat ~ Accomp + Depers + Exhaust + (1+Accomp+Depers+Exhaust|hosp), data=NursesMLtrain, REML = F)
We compare this model to the null model:
anova(null, modelRS)
The output (not provided here) shows that our last model fits the data better than the null model. We now examine the level 2 residuals for normality using the sjp.lmer() function of the sjPlot package:
install.packages("sjPlot"); library(sjPlot)
sjp.lmer(modelRS, type = "re.qq")
As displayed in the following figure, the residuals for the intercept and each of the predictors are fairly normal, as almost all points align with the normal distribution. Yet some deviation is observed, particularly for emotional exhaustion.
We perform the same operation for the level 1 residuals using the qqnorm() function:
qqnorm(resid(modelRS))
The following screenshot shows that the level 1 residuals are fairly normal as well:
Using the following code we observe that the variance explained by our model is around 19 percent:
r.squaredLR(model,null)
We can now examine our model in more detail:
summary(modelRS)
We notice that new values have appeared, notably the variances and standard deviations for the slopes of our three predictors at level 2, and the correlations between those (under Random effects). We notice that the coefficients (under Fixed effects) are also different.
Again, we test for the impact of the predictors on work satisfaction as follows:
tvals = coef(summary(modelRS))[,3]
tvals.p <- 2 * (1 - pnorm(abs(tvals)))
round(tvals.p,3)
The following output shows that all three predictors are significant at p < .05:
We can also plot the slopes using the plotLMER.fnc() function from the languageR package:
install.packages("languageR"); library(languageR)
par(mfrow=c(1,3))
plotLMER.fnc(modelRS)
The three plots are presented as follows:
Remember that we centered our predictors. The 0 values on the x axis, therefore, refer to the mean of the predictors.
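Grand mean centering itself is simple to illustrate on toy data in base R. The data frame toy and its values are hypothetical; only the column names echo two of the predictors used here. The sketch uses scale() with scale = FALSE, which subtracts each column mean without rescaling.

```r
# Hypothetical toy predictors, for illustration only
toy <- data.frame(Accomp = c(2, 4, 6), Depers = c(1, 1, 4))

# Subtract each column's grand mean; do not divide by the standard deviation
centered <- as.data.frame(scale(toy, center = TRUE, scale = FALSE))

centered
colMeans(centered)  # each centered column now has mean 0
```

After centering, a value of 0 corresponds to the mean of the original attribute, which is why the intercept can be read as the expected work satisfaction at average predictor levels.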
If you are curious about the impact of sampling on the estimates, you can run the following code and see what changes in the models we computed. You should find small differences each time you run the code. Just a heads up: use object names other than those we used here. Otherwise, you will not find the same results in the following section:
#loading the initial dataset
NursesML = read.table("NursesML.dat", header = T, sep = " ")
NursesML$hosp = factor(NursesML$hosp)
#creating the training and testing sets (50% in each)
library(caret)
trainObs = createDataPartition(NursesML[,5], p = .5, list=F)
NursesMLtrain = NursesML[trainObs,]
NursesMLtest = NursesML[-trainObs,]
# grand mean centering the predictors
for (i in 1:3){
  NursesMLtrain[i] = NursesMLtrain[i] - colMeans(NursesMLtrain[i])
  NursesMLtest[i] = NursesMLtest[i] - colMeans(NursesMLtest[i])
}