We will now implement bagging with naive Bayes (nbBag) by executing the following code:
# setting up parameters to build the bagged naive Bayes model
bagctrl <- bagControl(fit = nbBag$fit,
predict = nbBag$pred,
aggregate = nbBag$aggregate)
# fit the bagged nb model
set.seed(300)
nbbag <- train(Attrition ~ ., data = mydata, method="bag", trControl = cvcontrol, bagControl = bagctrl)
# printing the model results
nbbag
This will result in the following output:
Bagged Model
1470 samples
30 predictors
2 classes: 'No', 'Yes'
No pre-processing
Resampling: Cross-Validated (10 fold, repeated 10 times)
Summary of sample sizes: 1324, 1324, 1323, 1323, 1323, 1323, ...
Resampling results:
Accuracy Kappa
0.8389878 0.00206872
Tuning parameter 'vars' was held constant at a value of 44
We see that in this case we achieved only about 83.9% accuracy, slightly below the KNN model's performance of 84%.
Although we have shown only three examples of the caret bagging methods, the code to implement the other methods remains the same. The only change needed is to replace the fit, predict, and aggregate parameters passed to bagControl. For example, to implement bagging with a neural network algorithm, we define bagControl as follows:
bagControl(fit = nnetBag$fit, predict = nnetBag$pred , aggregate = nnetBag$aggregate)
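Putting this together, a minimal sketch of the full bagged neural network fit follows the same pattern as the nbBag example, reusing the mydata dataset and the cvcontrol resampling setup defined earlier (the nnet package must be installed for nnetBag to work):

```r
library(caret)

# set up bagging functions for a neural network base learner
bagctrl <- bagControl(fit = nnetBag$fit,
                      predict = nnetBag$pred,
                      aggregate = nnetBag$aggregate)

# fit the bagged nnet model; only bagControl changed from the nb example
set.seed(300)
nnetbag <- train(Attrition ~ ., data = mydata, method = "bag",
                 trControl = cvcontrol, bagControl = bagctrl)

# printing the model results
nnetbag
```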
Note that an appropriate library must be available in R for caret to run these methods; otherwise, an error results. For example, nbBag requires the klaR library to be installed on the system prior to executing the code. Similarly, ctreeBag needs the party package to be installed. Users should verify that the appropriate library is available on the system before using it with caret bagging.
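One way to guard against a missing dependency is to check for it before calling train(). This is a minimal sketch using base R's requireNamespace(); the package list shown is an assumption covering the bagging methods discussed in this section:

```r
# packages assumed by the caret bagging functions used in this section
needed <- c("klaR", "party", "nnet")

# identify which dependencies are missing before training
missing <- needed[!vapply(needed, requireNamespace,
                          logical(1), quietly = TRUE)]
if (length(missing) > 0) {
  stop("Install these packages first: ",
       paste(missing, collapse = ", "))
}
```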
We now have an understanding of how to implement a project using the bagging technique. The next subsection covers the underlying working mechanism of bagging, which will clarify what bagging did internally with our dataset to produce better performance than the stand-alone model.