Building an attrition prediction model with stacking

Let's build an attrition prediction model with stacking:

# loading the required libraries and registering the CPU cores for multiprocessing
library(doMC)
library(caret)
library(caretEnsemble)
registerDoMC(cores=4)
# setting the working directory and loading the dataset
setwd("~/Desktop/chapter 15")
mydata <- read.csv("WA_Fn-UseC_-HR-Employee-Attrition.csv")
# removing the non-discriminatory features from the dataset, as identified in the EDA step
mydata$EmployeeNumber <- NULL
mydata$Over18 <- NULL
mydata$EmployeeCount <- NULL
mydata$StandardHours <- NULL
# setting up the control parameters for cross-validation
control <- trainControl(method="repeatedcv", number=10, repeats=10, savePredictions=TRUE, classProbs=TRUE)
# declaring the ML algorithms to use in stacking
algorithmList <- c('C5.0', 'nb', 'glm', 'knn', 'svmRadial')
# setting the seed to ensure reproducibility of the results
set.seed(10000)
# creating the stacking model
models <- caretList(Attrition~., data=mydata, trControl=control, methodList=algorithmList)
# obtaining the stacking model results and printing them
results <- resamples(models)
summary(results)

This will result in the following output:

summary.resamples(object = results) 

Models: C5.0, nb, glm, knn, svmRadial
Number of resamples: 100

Accuracy
               Min.   1st Qu.    Median      Mean   3rd Qu.      Max. NA's
C5.0      0.8082192 0.8493151 0.8639456 0.8625833 0.8775510 0.9054054    0
nb        0.8367347 0.8367347 0.8378378 0.8387821 0.8424658 0.8435374    0
glm       0.8299320 0.8639456 0.8775510 0.8790444 0.8911565 0.9387755    0
knn       0.8027211 0.8299320 0.8367347 0.8370763 0.8438017 0.8630137    0
svmRadial 0.8287671 0.8648649 0.8775510 0.8790467 0.8911565 0.9319728    0

Kappa
                 Min.    1st Qu.     Median      Mean   3rd Qu.      Max. NA's
C5.0       0.03992485 0.29828006 0.37227344 0.3678459 0.4495049 0.6112590    0
nb         0.00000000 0.00000000 0.00000000 0.0000000 0.0000000 0.0000000    0
glm        0.26690604 0.39925723 0.47859218 0.4673756 0.5218094 0.7455280    0
knn       -0.05965697 0.02599388 0.06782465 0.0756081 0.1320451 0.2431312    0
svmRadial  0.24565000 0.38667527 0.44195662 0.4497538 0.5192393 0.7423764    0
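Before examining correlations, it can help to visualize these resampling distributions side by side. caret provides lattice-based plot methods for resamples objects, so a brief optional addition is possible here:

# optional: visualizing the base learners' resampling distributions side by
# side (lattice is attached along with caret, so these work directly)
bwplot(results)
dotplot(results)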

# Identifying the correlation between results
modelCor(results)

This will result in the following output:

[Correlation table of the resampled predictions from the five base models]
We can see from the correlation table that none of the individual algorithms' predictions are highly correlated. Very high correlation would mean that the algorithms produced very similar predictions, and combining very similar predictions is unlikely to yield a significant benefit over simply accepting one of the individual predictions.
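Had any of the base learners produced highly correlated predictions, we could have pruned the redundant ones before stacking. Here is a minimal sketch, not part of the original example, using caret's findCorrelation() with an arbitrary illustrative cutoff of 0.75:

# a minimal sketch (not part of the original example): dropping base learners
# whose resampled predictions are highly correlated before stacking
cor_matrix <- modelCor(results)
# findCorrelation() flags the columns of a correlation matrix to remove;
# the 0.75 cutoff is an arbitrary illustrative threshold
to_drop <- findCorrelation(cor_matrix, cutoff = 0.75, names = TRUE)
# keep only the remaining learners (note: plain list subsetting may drop the
# caretList class in some caretEnsemble versions)
models_pruned <- models[setdiff(names(models), to_drop)]

Since none of our base learners cross any such threshold, we can move straight to the next step of stacking the predictions: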

# setting up the cross-validation control parameters for stacking the predictions from the individual ML algorithms
stackControl <- trainControl(method="repeatedcv", number=10, repeats=10, savePredictions=TRUE, classProbs=TRUE)
# stacking the predictions of the individual ML algorithms using a generalized linear model
stack.glm <- caretStack(models, method="glm", trControl=stackControl)
# printing the stacked model's results
print(stack.glm)

This will result in the following output:

A glm ensemble of 5 base models: C5.0, nb, glm, knn, svmRadial

Ensemble results:
Generalized Linear Model

14700 samples
    5 predictors
    2 classes: 'No', 'Yes'

No pre-processing
Resampling: Cross-Validated (10 fold, repeated 10 times)
Summary of sample sizes: 13230, 13230, 13230, 13230, 13230, 13230, ...
Resampling results:

  Accuracy   Kappa
  0.8844966  0.4869556

With GLM-based stacking, we achieve roughly 88% accuracy. Let's now examine the effect of using a random forest instead of a GLM to stack the individual predictions from each of the five ML algorithms:

# stacking the predictions of the individual ML algorithms using a random forest
stack.rf <- caretStack(models, method="rf", trControl=stackControl)
# printing the summary of the rf-based stacking
print(stack.rf)

This will result in the following output:

A rf ensemble of 5 base models: C5.0, nb, glm, knn, svmRadial

Ensemble results:
Random Forest

14700 samples
    5 predictors
    2 classes: 'No', 'Yes'

No pre-processing
Resampling: Cross-Validated (10 fold, repeated 10 times)
Summary of sample sizes: 13230, 13230, 13230, 13230, 13230, 13230, ...
Resampling results across tuning parameters:

  mtry  Accuracy   Kappa
  2     0.9122041  0.6268108
  3     0.9133605  0.6334885
  5     0.9132925  0.6342740

Accuracy was used to select the optimal model using the largest value.
The final value used for the model was mtry = 3.

We see that, without much effort, we were able to achieve an accuracy of about 91% by stacking the predictions, an improvement over both the GLM-based stack and every individual base learner.
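As a quick aside, the fitted stack can score new data like any other caret model. Here is a minimal sketch, where new_employees is a hypothetical data frame with the same feature columns as the training data:

# a minimal sketch (not from the original text): scoring new observations
# with the stacked model; 'new_employees' is a hypothetical data frame with
# the same feature columns used in training
pred_class <- predict(stack.rf, newdata = new_employees)
# class probabilities instead of hard labels
pred_prob <- predict(stack.rf, newdata = new_employees, type = "prob")
head(pred_class)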

We have now explored the various ensembling techniques that can give us better-performing models. Before ending the chapter, however, there are a couple of things we need to take note of.

There is more than one way to implement a given ML technique in R. For example, bagging can be implemented with the functions available in the ipred library rather than with caret, as we did in this chapter. We should also be aware that hyperparameter tuning is an important part of model building if we want the best-performing model. The number of hyperparameters, and the acceptable values for them, vary depending on the library we intend to use, which is why we paid less attention to hyperparameter tuning in the models we built in this chapter. Nevertheless, it is very important to read a library's documentation to understand which hyperparameters its functions expose. In most cases, incorporating hyperparameter tuning significantly improves a model's performance.
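To make the tuning point concrete, here is a minimal sketch of explicit hyperparameter tuning with caret's tuneGrid, reusing the control object and dataset defined earlier; the grid values are arbitrary illustrations, not recommended settings:

# a minimal sketch (not part of the original example): explicitly tuning the
# C5.0 learner's hyperparameters via caret's tuneGrid (requires the C50
# package); the grid values are arbitrary illustrations
c50_grid <- expand.grid(trials = c(10, 20, 30),
                        model = c("tree", "rules"),
                        winnow = c(TRUE, FALSE))
set.seed(10000)
c50_tuned <- train(Attrition ~ ., data = mydata, method = "C5.0",
                   trControl = control, tuneGrid = c50_grid)
print(c50_tuned)

print(c50_tuned) then reports the resampled performance for each grid combination along with the configuration that caret selects.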