Bagged classification and regression trees (treebag) implementation

To begin, load the required libraries and register four cores for parallel processing:

# doMC works on macOS/Linux; on Windows a backend such as doParallel is needed instead
library(doMC)
registerDoMC(cores = 4)
library(caret)
# setting the random seed for replication
set.seed(1234)
# setting the working directory where the data is located
setwd("~/Desktop/chapter 15")
# reading the data; stringsAsFactors = TRUE keeps Attrition a factor under R >= 4.0
mydata <- read.csv("WA_Fn-UseC_-HR-Employee-Attrition.csv", stringsAsFactors = TRUE)
# removing the non-discriminatory features identified during EDA
mydata[c("EmployeeNumber", "Over18", "EmployeeCount", "StandardHours")] <- NULL
# setting up 10-fold cross-validation repeated 10 times
cvcontrol <- trainControl(method = "repeatedcv", number = 10, repeats = 10,
                          allowParallel = TRUE)
# model creation with treebag; the number of bags is set to 10 through
# nbagg, the argument that caret passes on to ipred's bagging function
train.bagg <- train(Attrition ~ ., data = mydata, method = "treebag",
                    nbagg = 10, trControl = cvcontrol, importance = TRUE)
train.bagg

This will result in the following output:

Bagged CART  
1470 samples
30 predictors
2 classes: 'No', 'Yes'
No pre-processing
Resampling: Cross-Validated (10 fold, repeated 10 times)
Summary of sample sizes: 1324, 1323, 1323, 1322, 1323, 1322, ...
Resampling results:
Accuracy Kappa
0.854478 0.2971994
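Accuracy alone can flatter a classifier when one class dominates, which is why caret also reports Cohen's kappa, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement (the accuracy) and p_e is the agreement expected by chance from the row and column totals of the confusion matrix. The short base R sketch below computes both metrics from a 2x2 confusion matrix. The helper name kappa_from_table and all counts are hypothetical, made up for illustration; they are not the confusion matrix of the bagged model above.

```r
# Hypothetical helper: accuracy and Cohen's kappa from a square confusion matrix
# whose rows (predicted) and columns (actual) list the classes in the same order.
kappa_from_table <- function(tab) {
  n  <- sum(tab)
  po <- sum(diag(tab)) / n                       # observed agreement (accuracy)
  pe <- sum(rowSums(tab) * colSums(tab)) / n^2   # agreement expected by chance
  c(Accuracy = po, Kappa = (po - pe) / (1 - pe))
}

labs <- list(Predicted = c("No", "Yes"), Actual = c("No", "Yes"))

# Made-up counts for a classifier that finds a few of the "Yes" cases
tab_model <- matrix(c(120, 10,
                       15,  5), nrow = 2, byrow = TRUE, dimnames = labs)

# Made-up counts for a rule that always predicts the majority class "No"
tab_majority <- matrix(c(135, 15,
                           0,  0), nrow = 2, byrow = TRUE, dimnames = labs)

kappa_from_table(tab_model)     # Accuracy ~0.833, Kappa ~0.194
kappa_from_table(tab_majority)  # Accuracy 0.9, Kappa 0
```

The second table makes the point: always answering "No" gives the higher accuracy in this toy example, yet its kappa is zero because it agrees with the true labels no more often than chance would.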

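The bagged CART model reaches a cross-validated accuracy of 85.4%, slightly better than the 84% obtained earlier with the KNN algorithm. The Kappa of about 0.30 is a reminder that the classes are imbalanced (most employees did not leave), so part of this accuracy would be achieved even by a model that rarely predicts attrition.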
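For intuition about what method = "treebag" does, here is a minimal from-scratch sketch of bagging: draw bootstrap samples of the training data, grow a deep, unpruned classification tree on each with rpart, and combine the trees by majority vote. It uses the built-in iris data instead of the attrition file so that it runs on its own. The helper names bag_trees and predict_bag are hypothetical, and this is an illustration of the technique, not the code that caret or ipred actually run.

```r
# Illustrative bagging sketch (hypothetical helpers), assuming only rpart,
# which ships with standard R installations, and the built-in iris data.
library(rpart)

# Fit one unpruned classification tree per bootstrap resample
bag_trees <- function(formula, data, n_bags = 25) {
  lapply(seq_len(n_bags), function(b) {
    idx <- sample(nrow(data), replace = TRUE)   # bootstrap: n rows drawn with replacement
    rpart(formula, data = data[idx, ], method = "class",
          control = rpart.control(cp = 0, minsplit = 2, xval = 0))  # grow deep trees
  })
}

# Combine the trees: each tree votes for a class, the most frequent class wins
predict_bag <- function(trees, newdata) {
  votes <- sapply(trees, function(tr)
    as.character(predict(tr, newdata, type = "class")))  # rows = cases, cols = trees
  apply(votes, 1, function(v) names(which.max(table(v))))
}

set.seed(1234)
idx_train <- sample(nrow(iris), 100)
train_set <- iris[idx_train, ]
test_set  <- iris[-idx_train, ]

trees <- bag_trees(Species ~ ., train_set, n_bags = 25)
pred  <- predict_bag(trees, test_set)
mean(pred == test_set$Species)   # hold-out accuracy of the bagged ensemble
```

Here n_bags plays the role of the number of bags, which in caret's treebag model corresponds to ipred's nbagg argument. More bags usually make the majority vote more stable, at the cost of extra computation.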
