Implementing an attrition prediction model with random forests

Let's build our attrition model using the random forest algorithm by executing the following code:

# loading the required libraries and registering multiple cores to enable parallel processing
library(doMC)
library(caret)
registerDoMC(cores = 4)
# setting the working directory and reading the dataset
setwd("~/Desktop/chapter 15")
mydata <- read.csv("WA_Fn-UseC_-HR-Employee-Attrition.csv")
# ensuring the target variable is a factor (read.csv no longer converts strings automatically in R >= 4.0)
mydata$Attrition <- as.factor(mydata$Attrition)
# removing the non-discriminatory features from the dataset as identified during the EDA step
mydata$EmployeeNumber <- mydata$Over18 <- mydata$EmployeeCount <- mydata$StandardHours <- NULL
# setting the seed for reproducibility
set.seed(10000)
# setting the cross-validation parameters: 10-fold CV repeated 10 times
fitControl <- trainControl(method = "repeatedcv", number = 10, repeats = 10)
# training the caret model with the random forest algorithm
caretmodel <- train(Attrition ~ ., data = mydata, method = "rf",
                    trControl = fitControl, verbose = FALSE)
# printing the model summary
caretmodel

This will result in the following output:

Random Forest  

1470 samples
30 predictors
2 classes: 'No', 'Yes'

No pre-processing
Resampling: Cross-Validated (10 fold, repeated 10 times)
Summary of sample sizes: 1323, 1323, 1324, 1323, 1324, 1322, ...
Resampling results across tuning parameters:

  mtry  Accuracy   Kappa
   2    0.8485765  0.1014859
  23    0.8608271  0.2876406
  44    0.8572929  0.2923997

Accuracy was used to select the optimal model using the largest value.
The final value used for the model was mtry = 23.
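
In this run, caret tuned mtry (the number of predictors sampled at each split) automatically, trying the three values shown above and keeping the one with the highest accuracy. If you want to search a specific set of values instead, you can pass an explicit tuneGrid to train(). The following is a minimal sketch that reuses mydata and fitControl from the earlier code; the grid values are illustrative rather than a recommendation, and the rfGrid and caretmodel_grid names are arbitrary:

# defining an explicit search grid for mtry instead of relying on caret's default
rfGrid <- expand.grid(mtry = c(5, 10, 20, 30))
set.seed(10000)
# training the random forest over the custom grid with the same repeated cross-validation settings
caretmodel_grid <- train(Attrition ~ ., data = mydata, method = "rf",
                         trControl = fitControl, tuneGrid = rfGrid)
# printing the resampling results for each mtry value in the grid
caretmodel_grid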

Coming back to the output of the default run, we see that the best random forest model (mtry = 23) achieved an accuracy of roughly 86%, which improves on the 84% we obtained with the KNN model.
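
Beyond the headline accuracy, it is often useful to know which predictors the forest relied on most. caret exposes the underlying model's importance scores through varImp(); here is a brief sketch, assuming the caretmodel object from the training code above is still in the workspace (rfImportance is just an illustrative name):

# extracting the variable importance scores from the final random forest
rfImportance <- varImp(caretmodel)
print(rfImportance)
# plotting the 10 most important predictors of attrition
plot(rfImportance, top = 10)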
