Predictions using multilevel models

Now that we have our model ready, we can predict work satisfaction in the testing dataset.

Using the predict() function

One way to do so is simply to use the predict() function. The allow.new.levels argument controls whether hospitals that were not present when the model was fitted are allowed in the new data. Because the training and testing sets contain the same hospitals, we set its value to F (false), which is also the default value:

NursesMLtest$predicted = predict(modelRS, NursesMLtest,
   allow.new.levels = F)
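
To illustrate what allow.new.levels changes, here is a minimal sketch. It does not use the nurses data: it assumes the lme4 package is installed and relies on its built-in sleepstudy dataset, and the subject labeled "999" is a made-up group used only for illustration.

```r
library(lme4)  # provides lmer() and the sleepstudy example data

# Illustrative model on lme4's sleepstudy data (not the nurses data)
fit <- lmer(Reaction ~ Days + (1 + Days | Subject), data = sleepstudy)

# A hypothetical subject that was not in the data used to fit the model
new_subject <- data.frame(Days = 0:2, Subject = factor("999"))

# With the default (allow.new.levels = FALSE), an unseen level is an error
res <- try(predict(fit, new_subject), silent = TRUE)
inherits(res, "try-error")

# With allow.new.levels = TRUE, unseen levels get population-level predictions
p_new <- predict(fit, new_subject, allow.new.levels = TRUE)

# These match predictions based on the fixed effects alone
p_fixed <- predict(fit, new_subject, re.form = NA)
all.equal(unname(p_new), unname(p_fixed))
```

In other words, a new hospital would be predicted as an "average" hospital, with no hospital-specific adjustment.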

Assessing prediction quality

There is no perfect way to measure the quality of the predictions for nested data. A simple indicator is the correlation between the predicted and the observed values, which we can examine with a correlation test. Because of the nested structure of our dataset, we will perform the test for each hospital separately:

correls = matrix(nrow=17, ncol=3)
colnames(correls) = c("Correlation", "p value", "r squared")
for (i in 1:17){
   dat = subset(NursesMLtest, hosp == i)
   # run the correlation test once per hospital and reuse the result
   test = cor.test(dat$predicted, dat$WorkSat)
   correls[i,1] = test$estimate   # correlation coefficient
   correls[i,2] = test$p.value
   correls[i,3] = correls[i,1]^2  # variance shared with the observed values
}
round(correls, 3)

The output shows some variation in the correlations, which are all significant at p < .05, except for hospital number 10. The third column displays the proportion of variance that is shared by the predictions and the observed values.
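
The same per-hospital computation can also be written without an explicit loop. The following is only a sketch on simulated data: the toy data frame, its three hospitals, and the 0.5 slope are assumptions made for illustration, not the values from the nurses dataset.

```r
set.seed(1)  # reproducible toy data

# Hypothetical stand-in for the test set: 3 hospitals, 30 nurses each
toy <- data.frame(hosp = rep(1:3, each = 30), predicted = rnorm(90))
toy$WorkSat <- 0.5 * toy$predicted + rnorm(90)

# One correlation test per hospital, collected as the rows of a matrix
per_hosp <- t(sapply(split(toy, toy$hosp), function(d) {
  ct <- cor.test(d$predicted, d$WorkSat)
  c(Correlation = unname(ct$estimate),
    "p value"   = ct$p.value,
    "r squared" = unname(ct$estimate)^2)
}))
round(per_hosp, 3)
```

Splitting by hospital avoids hard-coding the number of hospitals, at the cost of being slightly less explicit than the loop.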

We can also rely on multilevel analyses to test how well the predicted values are related to the observed values. We fit a model that uses the predictions as a predictor (modelPred) and compare it to a null model (nullPred). We include random slopes, as the preceding correlations show some variation between hospitals. Before fitting the models, we center the predicted values:

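# center the predictions on their mean
NursesMLtest$predicted = NursesMLtest$predicted -
   mean(NursesMLtest$predicted)
# null model: random intercepts only
nullPred = lmer(WorkSat ~ 1 + (1|hosp), data=NursesMLtest,
   REML = F)
# model with the predictions as predictor, with random slopes
modelPred = lmer(WorkSat ~ predicted + (1+predicted|hosp),
   data=NursesMLtest, REML = F)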

The output of the following line of code shows that modelPred fits the data better than nullPred:

anova(nullPred,modelPred)
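
For two nested models fitted by maximum likelihood, this comparison is a likelihood ratio test. The sketch below reproduces the test statistic by hand. It assumes the lme4 package is installed and uses its built-in sleepstudy data in place of the nurses data; m0 and m1 are illustrative names mirroring nullPred and modelPred.

```r
library(lme4)

# Null model and fuller model, both fitted with ML (REML = FALSE)
m0 <- lmer(Reaction ~ 1 + (1 | Subject), data = sleepstudy, REML = FALSE)
m1 <- lmer(Reaction ~ Days + (1 + Days | Subject), data = sleepstudy,
           REML = FALSE)

# Likelihood ratio statistic: twice the difference in log-likelihoods
lr_stat <- 2 * (as.numeric(logLik(m1)) - as.numeric(logLik(m0)))

# Degrees of freedom: difference in the number of estimated parameters
df_diff <- attr(logLik(m1), "df") - attr(logLik(m0), "df")

# p value from the chi-squared reference distribution
p_value <- pchisq(lr_stat, df = df_diff, lower.tail = FALSE)

c(Chisq = lr_stat, Df = df_diff, p = p_value)
```

The statistic should agree with the Chisq column of anova(m0, m1). Because variance components are tested at the boundary of their parameter space, the resulting p value is usually regarded as conservative.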

The output of the following line, which uses the r.squaredLR() function from the MuMIn package, shows that 27.99 percent of the variance of work satisfaction is accounted for by the prediction, which is in line with the model performance in the testing set. Whether this value is good or bad depends on the context:

r.squaredLR(modelPred,nullPred)
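
To make the quantity more concrete, here is a sketch of a likelihood-ratio based R squared computed by hand. It assumes the statistic is defined as 1 - exp(-2/n * (logLik(model) - logLik(null))), which is our reading of the MuMIn documentation and should be checked against it. The example again uses the sleepstudy data from lme4 rather than the nurses data, so the value it produces is unrelated to the 27.99 percent reported above.

```r
library(lme4)

# Illustrative null and fuller models, both fitted with ML
m0 <- lmer(Reaction ~ 1 + (1 | Subject), data = sleepstudy, REML = FALSE)
m1 <- lmer(Reaction ~ Days + (1 + Days | Subject), data = sleepstudy,
           REML = FALSE)

n <- nobs(m1)  # number of observations used in the fit

# Likelihood-ratio R squared (assumed formula, see the text above)
ll_diff <- as.numeric(logLik(m1)) - as.numeric(logLik(m0))
r2_lr <- 1 - exp(-2 / n * ll_diff)
r2_lr
```

The value increases with the gain in log-likelihood of the model over the null model, and equals 0 when the two models fit equally well.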