Implementing an attrition prediction model with random forests

Let's build our attrition model using the random forest algorithm by executing the following code:

# loading the required libraries and registering multiple cores to enable parallel processing
library(doMC)
library(caret)
registerDoMC(cores = 4)
# setting the working directory and reading the dataset
setwd("~/Desktop/chapter 15")
mydata <- read.csv("WA_Fn-UseC_-HR-Employee-Attrition.csv")
# ensuring the target variable is a factor (read.csv no longer converts strings automatically in R >= 4.0)
mydata$Attrition <- as.factor(mydata$Attrition)
# removing the non-discriminatory features from the dataset as identified during the EDA step
mydata$EmployeeNumber <- mydata$Over18 <- mydata$EmployeeCount <- mydata$StandardHours <- NULL
# setting the seed for reproducibility
set.seed(10000)
# setting the cross-validation parameters: 10-fold CV repeated 10 times
fitControl <- trainControl(method = "repeatedcv", number = 10, repeats = 10)
# training the caret model with the random forest algorithm
caretmodel <- train(Attrition ~ ., data = mydata, method = "rf",
                    trControl = fitControl, verbose = FALSE)
# printing the model summary
caretmodel

This will result in the following output:

Random Forest  

1470 samples
30 predictors
2 classes: 'No', 'Yes'

No pre-processing
Resampling: Cross-Validated (10 fold, repeated 10 times)
Summary of sample sizes: 1323, 1323, 1324, 1323, 1324, 1322, ...
Resampling results across tuning parameters:

  mtry  Accuracy   Kappa
   2    0.8485765  0.1014859
  23    0.8608271  0.2876406
  44    0.8572929  0.2923997

Accuracy was used to select the optimal model using the largest value.
The final value used for the model was mtry = 23.
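
In this run, caret tuned mtry (the number of predictors sampled at each split) automatically, trying the three values shown above and keeping the one with the highest accuracy. If you want to search a specific set of values instead, you can pass an explicit tuneGrid to train(). The following is a minimal sketch that reuses mydata and fitControl from the earlier code; the grid values are illustrative rather than a recommendation, and the rfGrid and caretmodel_grid names are arbitrary:

# defining an explicit search grid for mtry instead of relying on caret's default
rfGrid <- expand.grid(mtry = c(5, 10, 20, 30))
set.seed(10000)
# training the random forest over the custom grid with the same repeated cross-validation settings
caretmodel_grid <- train(Attrition ~ ., data = mydata, method = "rf",
                         trControl = fitControl, tuneGrid = rfGrid)
# printing the resampling results for each mtry value in the grid
caretmodel_grid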

Coming back to the output of the default run, we see that the best random forest model (mtry = 23) achieved an accuracy of roughly 86%, which improves on the 84% we obtained with the KNN model.
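
Beyond the headline accuracy, it is often useful to know which predictors the forest relied on most. caret exposes the underlying model's importance scores through varImp(); here is a brief sketch, assuming the caretmodel object from the training code above is still in the workspace (rfImportance is just an illustrative name):

# extracting the variable importance scores from the final random forest
rfImportance <- varImp(caretmodel)
print(rfImportance)
# plotting the 10 most important predictors of attrition
plot(rfImportance, top = 10)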
