Now that we have our model ready, we can predict work satisfaction in the testing dataset.
One way to do so is simply to use the predict() function. The allow.new.levels argument specifies whether hospitals that were absent from the training data are allowed in the prediction. As we have the same hospitals in the training and testing sets, we set its value to F (false), which is actually the default value:
NursesMLtest$predicted = predict(modelRS, NursesMLtest, allow.new.levels = F)
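To see what allow.new.levels actually controls, here is a minimal sketch on simulated data (hypothetical variables, not the nurses dataset): predicting for a grouping level that was never seen during fitting fails unless the argument is set to TRUE, in which case lme4 falls back on the fixed effects alone.

```r
library(lme4)

# Simulated training data with four groups (hypothetical, for illustration only)
set.seed(1)
train <- data.frame(y = rnorm(40), x = rnorm(40), g = factor(rep(1:4, 10)))
m <- lmer(y ~ x + (1 | g), data = train)

# A new observation from group 5, which was not in the training data
newdat <- data.frame(x = 0, g = factor(5))

# predict(m, newdat)                          # error: new level detected
predict(m, newdat, allow.new.levels = TRUE)   # population-level prediction
```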
There is no perfect way to measure the quality of the predictions for nested data. A simple estimate of the quality of our prediction is the correlation test. Because of the nested structure of our dataset, we will perform the test for each hospital separately:
correls = matrix(nrow = 17, ncol = 3)
colnames(correls) = c("Correlation", "p value", "r squared")
for (i in 1:17){
  dat = subset(NursesMLtest, hosp == i)
  correls[i,1] = cor.test(dat$predicted, dat$WorkSat)[[4]]
  correls[i,2] = cor.test(dat$predicted, dat$WorkSat)[[3]]
  correls[i,3] = correls[i,1]^2
}
round(correls, 3)
The output shows some variation in the correlations, all of which are significant at p < .05 except for hospital number 10. The third column displays the proportion of variance shared by the predictions and the observed values.
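The per-hospital loop above can also be written more compactly with split() and sapply(). A minimal sketch on simulated data (hypothetical values, not the nurses dataset), extracting the same three quantities from each cor.test() result by name rather than by position:

```r
# Simulated data: three groups with random predictions and outcomes
set.seed(42)
dat <- data.frame(
  hosp      = rep(1:3, each = 20),
  WorkSat   = rnorm(60),
  predicted = rnorm(60)
)

# One cor.test() per hospital; each returns a row of the summary matrix
correls <- t(sapply(split(dat, dat$hosp), function(d) {
  ct <- cor.test(d$predicted, d$WorkSat)
  c(Correlation = unname(ct$estimate),
    "p value"   = ct$p.value,
    "r squared" = unname(ct$estimate)^2)
}))
round(correls, 3)
```

Accessing the components as ct$estimate and ct$p.value is more robust than positional indexing such as [[4]], which breaks if the structure of the htest object ever changes.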
We can rely on multilevel analyses to test how well the predicted values are related to the observed values (in a model named modelPred), and compare that model to a null model (called nullPred). We will include random slopes, as the preceding correlations show some variation between hospitals. Before fitting these models, we start by centering the predicted values:
NursesMLtest$predicted = NursesMLtest$predicted -
  mean(NursesMLtest$predicted)
nullPred = lmer(WorkSat ~ 1 + (1|hosp), data = NursesMLtest,
  REML = F)
modelPred = lmer(WorkSat ~ predicted + (1+predicted|hosp),
  data = NursesMLtest, REML = F)
The output of the following line of code shows that modelPred fits the data better than nullPred:
anova(nullPred,modelPred)
The output of the following line shows that 27.99 percent of the variance of work satisfaction in the testing set is accounted for by the predictions. Whether this value is good or bad depends on the context:
r.squaredLR(modelPred,nullPred)
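As a complement, the same MuMIn package also provides r.squaredGLMM(), which splits the explained variance into a marginal R² (fixed effects only) and a conditional R² (fixed plus random effects). A minimal sketch on simulated data (hypothetical variables, not the nurses dataset):

```r
library(lme4)
library(MuMIn)

# Simulated nested data: 10 groups, a predictor x, and a group-level shift
set.seed(7)
dat <- data.frame(g = factor(rep(1:10, each = 15)))
dat$x <- rnorm(150)
dat$y <- 0.5 * dat$x + rnorm(10)[dat$g] + rnorm(150)

m <- lmer(y ~ x + (1 | g), data = dat, REML = FALSE)
r.squaredGLMM(m)   # columns R2m (marginal) and R2c (conditional)
```

Comparing the two columns indicates how much of the explained variance comes from the predictor itself versus the hospital-level random effects.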