Execute the following steps to create a stacked ensemble.

Import the libraries:

import pandas as pd
from sklearn.model_selection import (train_test_split, 
                                     StratifiedKFold)
from sklearn import metrics
from sklearn.preprocessing import StandardScaler

from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression

Load and preprocess the data:

RANDOM_STATE = 42
# reuse the same seed for the cross-validation folds
k_fold = StratifiedKFold(5, shuffle=True, random_state=RANDOM_STATE)

df = pd.read_csv('credit_card_fraud.csv')

X = df.copy()
y = X.pop('Class')

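# stratify keeps the class ratio equal in both sets; the seed makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, 
                                                    test_size=0.2, 
                                                    stratify=y,
                                                    random_state=RANDOM_STATE)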

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
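
The effect of stratify=y in the split above can be checked on a small example. The following is a minimal sketch on synthetic data (the array names and the 5% positive rate are assumptions made for illustration, not properties of credit_card_fraud.csv). It shows that a stratified split keeps the share of the minority class the same in the training and test sets:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic, heavily imbalanced data: 50 positives out of 1,000 rows (assumed)
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(1000, 3))
y_demo = np.array([1] * 50 + [0] * 950)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=42
)

# Both sets keep the original 5% share of the positive class
print(f"train positive rate: {y_tr.mean():.3f}")
print(f"test positive rate:  {y_te.mean():.3f}")
```

Without stratification, a rare class can end up over- or under-represented in the test set purely by chance, which makes a metric such as recall noisier.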

Define a list of classifiers to consider:

clf_list = [('dec_tree', DecisionTreeClassifier()),
            ('log_reg', LogisticRegression()),
            ('knn', KNeighborsClassifier()),
            ('naive_bayes', GaussianNB())]

Iterate over the selected models, fit them to the data, and calculate recall using the test set:

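for name, model in clf_list:
    # fix the seed only for estimators that accept one
    if 'random_state' in model.get_params().keys():
        model.set_params(random_state=RANDOM_STATE)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    # recall_score expects the true labels first, then the predictions
    recall = metrics.recall_score(y_test, y_pred)
    print(f"{name}'s recall score: {recall:.4f}")

The scores listed below were produced with the arguments in the reverse order, recall_score(y_pred, y_test), which makes them precision values rather than recall. With the call shown above, expect different numbers: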

dec_tree's recall score: 0.7526
log_reg's recall score: 0.8312
knn's recall score: 0.9186
naive_bayes's recall score: 0.0588
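
Because scikit-learn's recall_score takes the true labels as its first argument, the order of the arguments matters. The following is a minimal sketch in plain Python, with made-up labels and hypothetical helper functions, of how recall and precision are defined and why swapping the arguments of a recall function returns precision instead:

```python
def recall(y_true, y_pred):
    """Share of actual positives that were predicted positive: TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn)


def precision(y_true, y_pred):
    """Share of predicted positives that are actually positive: TP / (TP + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp)


# Made-up labels: 4 actual positives, 6 predicted positives, 3 of them correct
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 1, 0, 0, 0]

print(recall(y_true, y_pred))     # 0.75
print(precision(y_true, y_pred))  # 0.5
print(recall(y_pred, y_true))     # 0.5, identical to precision
```

For a fraud classifier the two metrics can differ widely, so it is worth confirming which one a reported number represents.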

Define and fit the stacking classifier:

lr = LogisticRegression()
stack_clf = StackingClassifier(clf_list, 
                               final_estimator=lr,
                               cv=k_fold,
                               n_jobs=-1)
stack_clf.fit(X_train, y_train)
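
Conceptually, stacking trains the final estimator on out-of-fold predictions of the base models, and then refits the base models on the full training set. The following is a rough sketch of that idea on synthetic data, not a reproduction of StackingClassifier's internals. The dataset, the choice of two base models, and the stacked_predict helper are all assumptions made for illustration:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic imbalanced data standing in for the real training set (assumed)
X_syn, y_syn = make_classification(n_samples=500, n_features=10,
                                   weights=[0.9, 0.1], random_state=42)
folds = StratifiedKFold(5, shuffle=True, random_state=42)
base_models = [DecisionTreeClassifier(random_state=42), GaussianNB()]

# Level 0: out-of-fold probability of the positive class, one column per model
meta_features = np.column_stack([
    cross_val_predict(m, X_syn, y_syn, cv=folds, method="predict_proba")[:, 1]
    for m in base_models
])

# Level 1: the meta-learner only sees the base models' predictions
meta_model = LogisticRegression().fit(meta_features, y_syn)

# Refit the base models on all the data for use at prediction time
for m in base_models:
    m.fit(X_syn, y_syn)


def stacked_predict(X_new):
    """Hypothetical helper: feed base-model probabilities to the meta-learner."""
    feats = np.column_stack([m.predict_proba(X_new)[:, 1] for m in base_models])
    return meta_model.predict(feats)


print(stacked_predict(X_syn[:5]))
```

Using out-of-fold predictions matters: if the meta-learner were trained on predictions the base models made for rows they had already seen, it would overestimate how reliable those models are.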

Create predictions, and evaluate the stacked ensemble:

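y_pred = stack_clf.predict(X_test)
# recall_score expects the true labels first, then the predictions
recall = metrics.recall_score(y_test, y_pred)
print(f"The stacked ensemble's recall score: {recall:.4f}")

The reported score for the stacked ensemble is 0.9398. That figure was produced with the arguments in the reverse order, recall_score(y_pred, y_test), so it is a precision value; with the call shown above, expect a different number.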
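
After fitting, it can be informative to look at how the final estimator weights each base model. The following is a small sketch on synthetic data, with an assumed pair of base models, that uses the fitted named_estimators_ and final_estimator_ attributes of StackingClassifier. With a logistic regression as the final estimator and binary labels, there is one coefficient per base model:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary data used only for illustration (assumed)
X_syn, y_syn = make_classification(n_samples=300, n_features=8, random_state=42)

demo_stack = StackingClassifier(
    [('dec_tree', DecisionTreeClassifier(random_state=42)),
     ('naive_bayes', GaussianNB())],
    final_estimator=LogisticRegression(),
    cv=5,
)
demo_stack.fit(X_syn, y_syn)

# One meta-learner coefficient per base model
for name, coef in zip(demo_stack.named_estimators_,
                      demo_stack.final_estimator_.coef_[0]):
    print(f"{name}: {coef:.3f}")
```

A larger coefficient suggests the meta-learner relies more on that model's predicted probability, although the weights should be read with care when the base models are strongly correlated.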
