At this point, we have trained five different models. The predictions are stored in two data frames, one for training and the other for the validation samples:
head(summary_models_train)
## ID_RSSD Default GLM RF GBM deep
## 4 37 0 0.0013554364 0 0.000005755001 0.000000018217172
## 21 242 0 0.0006967876 0 0.000005755001 0.000000002088871
## 38 279 0 0.0028306028 0 0.000005240935 0.000003555978680
## 52 354 0 0.0013898732 0 0.000005707480 0.000000782777042
## 78 457 0 0.0021731695 0 0.000005755001 0.000000012535539
## 81 505 0 0.0011344433 0 0.000005461855 0.000000012267744
## SVM
## 4 0.0006227083
## 21 0.0002813123
## 38 0.0010763298
## 52 0.0009740568
## 78 0.0021555739
## 81 0.0005557417
Let's summarize the accuracy of the trained models. First, the predictive power of each classifier is measured using the Gini index, computed here with the rcorr.cens function from the Hmisc package. The following code calculates the Gini index for both the training and validation samples:
library(Hmisc)

gini_models <- as.data.frame(names(summary_models_train[, 3:ncol(summary_models_train)]))
colnames(gini_models) <- "Char"

for (i in 3:ncol(summary_models_train))
{
  gini_models$Gini_train[i-2] <- abs(as.numeric(2*rcorr.cens(summary_models_train[,i], summary_models_train$Default)[1]-1))
  gini_models$Gini_test[i-2]  <- abs(as.numeric(2*rcorr.cens(summary_models_test[,i], summary_models_test$Default)[1]-1))
}
The results are stored in a data frame called gini_models. The relative variation in predictive power between the training and test samples is also calculated:
gini_models$var_train_test <- (gini_models$Gini_train - gini_models$Gini_test) / gini_models$Gini_train
print(gini_models)
## Char Gini_train Gini_test var_train_test
## 1 GLM 0.9906977 0.9748967 0.01594943
## 2 RF 1.0000000 0.9764276 0.02357242
## 3 GBM 1.0000000 0.9754665 0.02453348
## 4 deep 0.9855324 0.9589837 0.02693848
## 5 SVM 0.9920815 0.9766884 0.01551595
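Under the hood, the Gini index used here is just a rescaling of the AUC (the concordance statistic returned by rcorr.cens): Gini = 2 × AUC − 1. The following is a minimal, self-contained sketch of that relationship; the data and the pairs_auc helper are hypothetical, invented only for illustration:

```r
# Toy illustration of Gini = 2 * AUC - 1 (data and helper are hypothetical).
# AUC is the probability that a randomly chosen defaulted bank receives a
# higher score than a randomly chosen solvent one.
set.seed(1)
default <- c(rep(0, 8), rep(1, 4))
score   <- c(runif(8, 0, 0.5), runif(4, 0.3, 1))  # defaults tend to score higher

pairs_auc <- function(score, default) {
  pos  <- score[default == 1]
  neg  <- score[default == 0]
  # Count wins (and half-credit ties) over all (default, solvent) pairs
  wins <- outer(pos, neg, ">") + 0.5 * outer(pos, neg, "==")
  mean(wins)
}

auc  <- pairs_auc(score, default)
gini <- 2 * auc - 1   # same quantity as 2 * rcorr.cens(score, default)[1] - 1
```

A perfect classifier has AUC = 1 and hence Gini = 1; a random one has AUC = 0.5 and Gini = 0.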
There are no major differences between the models. The SVM achieves the highest predictive power in the test sample, while the deep learning model obtains the worst results.
These results indicate that it is not very difficult to find banks that will fail in less than a year from the current financial statement, which is how we defined our target variable.
We can also assess each model by counting the number of banks it classifies correctly:
decisions_train <- summary_models_train
decisions_test <- summary_models_test
Now, let's classify the banks in these new data frames as solvent or non-solvent, depending on whether each model's predicted probability exceeds the cutoff:
for (m in 3:ncol(decisions_train))
{
  decisions_train[,m] <- ifelse(decisions_train[,m] > 0.04696094, 1, 0)
  decisions_test[,m]  <- ifelse(decisions_test[,m] > 0.04696094, 1, 0)
}
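The cutoff of 0.04696094 used in the preceding loop appears to equal the observed default rate of the training sample, a common choice of threshold; taking the counts from the confusion tables shown later (6,758 + 333 = 7,091 banks, of which 333 defaulted), it can be reproduced as follows:

```r
# The classification cutoff seemingly corresponds to the training default
# rate: 333 defaulted banks out of 7,091 in total (counts taken from the
# confusion tables in the text).
cutoff <- 333 / (6758 + 333)
round(cutoff, 8)  # 0.04696094
```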
Now, we create a function that counts correctly and incorrectly classified banks, returning the true negatives (TN), false positives (FP), false negatives (FN), and true positives (TP):
accuracy_function <- function(dataframe, observed, predicted)
{
  # Cross-tabulate predictions against observations; fixing the factor
  # levels guarantees a 2 x 2 table even if a model predicts only one class
  y <- as.vector(table(factor(dataframe[,predicted], levels = c(0, 1)),
                       factor(dataframe[,observed], levels = c(0, 1))))
  names(y) <- c("TN", "FP", "FN", "TP")
  return(y)
}
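The ordering of the four counts follows from how R flattens a contingency table (column-major: observed class varies slowest). A quick self-contained sanity check of that logic, on a hypothetical toy vector pair:

```r
# Toy check of the confusion-count logic (data invented for illustration).
toy_obs  <- c(0, 0, 1, 1, 0, 1)   # observed defaults
toy_pred <- c(0, 1, 1, 0, 0, 1)   # predicted defaults
y <- as.vector(table(factor(toy_pred, levels = c(0, 1)),
                     factor(toy_obs, levels = c(0, 1))))
names(y) <- c("TN", "FP", "FN", "TP")
y
##  TN  FP  FN  TP
##   2   1   1   2
```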
Running this function gives a summary of each model's performance. First, it is applied to the training sample:
print("Accuracy GLM model:")
## [1] "Accuracy GLM model:"
accuracy_function(decisions_train,"Default","GLM")
## TN FP FN TP
## 6584 174 9 324
print("Accuracy RF model:")
## [1] "Accuracy RF model:"
accuracy_function(decisions_train,"Default","RF")
## TN FP FN TP
## 6608 150 0 333
print("Accuracy GBM model:")
## [1] "Accuracy GBM model:"
accuracy_function(decisions_train,"Default","GBM")
## TN FP FN TP
## 6758 0 0 333
print("Accuracy deep model:")
## [1] "Accuracy deep model:"
accuracy_function(decisions_train,"Default","deep")
## TN FP FN TP
## 6747 11 104 229
print("Accuracy SVM model:")
## [1] "Accuracy SVM model:"
accuracy_function(decisions_train,"Default","SVM")
## TN FP FN TP
## 6614 144 7 326
Then, we can see the results of the different models in our test sample:
print("Accuracy GLM model:")
## [1] "Accuracy GLM model:"
accuracy_function(decisions_test,"Default","GLM")
## TN FP FN TP
## 2818 78 8 135
print("Accuracy RF model:")
## [1] "Accuracy RF model:"
accuracy_function(decisions_test,"Default","RF")
## TN FP FN TP
## 2753 143 5 138
print("Accuracy GBM model:")
## [1] "Accuracy GBM model:"
accuracy_function(decisions_test,"Default","GBM")
## TN FP FN TP
## 2876 20 15 128
print("Accuracy deep model:")
## [1] "Accuracy deep model:"
accuracy_function(decisions_test,"Default","deep")
## TN FP FN TP
## 2886 10 61 82
print("Accuracy SVM model:")
## [1] "Accuracy SVM model:"
accuracy_function(decisions_test,"Default","SVM")
## TN FP FN TP
## 2828 68 8 135
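From these four counts, standard performance metrics can be derived. As an illustration, here they are computed for the RF model on the test sample, using the counts reported above (TN = 2753, FP = 143, FN = 5, TP = 138):

```r
# Derived metrics from the RF confusion counts on the test sample
# (counts taken from the output above).
TN <- 2753; FP <- 143; FN <- 5; TP <- 138
sensitivity <- TP / (TP + FN)                    # share of failed banks detected
specificity <- TN / (TN + FP)                    # share of solvent banks kept
precision   <- TP / (TP + FP)                    # share of flagged banks that really fail
accuracy    <- (TP + TN) / (TP + TN + FP + FN)   # overall hit rate
round(c(sensitivity = sensitivity, specificity = specificity,
        precision = precision, accuracy = accuracy), 4)
```

Note how a high overall accuracy coexists with a much lower precision: the false positives are few relative to the solvent population, but large relative to the number of banks flagged as failures.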
According to the results on the test sample, RF correctly identifies the most failed banks (138), but it also misclassifies 143 solvent banks as failed, generating false alarms.
Let's also examine how correlated the predictions of the different models are:
correlations<-cor(summary_models_train[,3:ncol(summary_models_train)], use="pairwise", method="pearson")
print(correlations)
## GLM RF GBM deep SVM
## GLM 1.0000000 0.9616688 0.9270350 0.8010252 0.9910695
## RF 0.9616688 1.0000000 0.9876728 0.7603979 0.9719735
## GBM 0.9270350 0.9876728 1.0000000 0.7283464 0.9457436
## deep 0.8010252 0.7603979 0.7283464 1.0000000 0.7879191
## SVM 0.9910695 0.9719735 0.9457436 0.7879191 1.0000000
It might be interesting to combine the results of the different models to obtain a better model. Here, the concept of ensembles comes in handy. Ensembling is a technique that combines different algorithms to build a more robust model that incorporates the predictions of all the base learners. The resulting model often achieves higher accuracy than any of the individual models on its own. In fact, some of the models we have already developed are themselves ensembles, for example, random forest and the Gradient Boosting Machine (GBM). There are many ways to build an ensemble. In this section, we will look at different alternatives, from the simplest to the more complex.
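The simplest such alternative averages the predicted probabilities of the base learners. The following is a minimal sketch of that idea on a small mock data frame (the values are invented; the column names mirror those of the summary data frames used above):

```r
# Minimal averaging-ensemble sketch on mock predicted probabilities
# (values are hypothetical; column names follow the summary data frames).
mock <- data.frame(GLM  = c(0.01, 0.80, 0.05),
                   RF   = c(0.00, 0.90, 0.10),
                   GBM  = c(0.02, 0.70, 0.03),
                   deep = c(0.00, 0.60, 0.01),
                   SVM  = c(0.01, 0.85, 0.04))

# Average the base learners' probabilities for each bank
mock$ensemble <- rowMeans(mock[, c("GLM", "RF", "GBM", "deep", "SVM")])

# Classify with the same cutoff used for the individual models
mock$pred <- ifelse(mock$ensemble > 0.04696094, 1, 0)
```

Averaging tends to help most when the base learners are not perfectly correlated, which is why the correlation matrix above is worth inspecting first; the deep learning model, being the least correlated with the others, is the most likely to add diversity.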