How to do it...

Execute the following steps to create a stacked ensemble.

  1. Import the libraries:
import pandas as pd
from sklearn.model_selection import (train_test_split,
                                     StratifiedKFold)
from sklearn import metrics
from sklearn.preprocessing import StandardScaler

from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
  2. Load and preprocess the data:
RANDOM_STATE = 42
k_fold = StratifiedKFold(5, shuffle=True, random_state=RANDOM_STATE)

df = pd.read_csv('credit_card_fraud.csv')

X = df.copy()
y = X.pop('Class')

X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.2,
                                                    stratify=y,
                                                    random_state=RANDOM_STATE)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
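Because credit_card_fraud.csv may not be at hand, the following sketch reproduces this step on a synthetic, heavily imbalanced dataset generated with make_classification. The 2% positive rate is an assumption chosen to mimic fraud data, not a property of the original file. The sketch shows that stratify=y keeps the share of the positive class the same in the training and test sets, and that the scaler is fit on the training set only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

RANDOM_STATE = 42

# Synthetic stand-in for credit_card_fraud.csv (assumption): ~2% positive class
X, y = make_classification(n_samples=5000, n_features=10,
                           weights=[0.98, 0.02], flip_y=0,
                           random_state=RANDOM_STATE)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=RANDOM_STATE)

# Learn the scaling statistics on the training set only,
# then reuse them on the test set to avoid leaking information
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

for name, labels in [("full", y), ("train", y_train), ("test", y_test)]:
    print(f"{name}: positive share = {labels.mean():.4f}")
```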
  3. Define a list of classifiers to consider:
clf_list = [('dec_tree', DecisionTreeClassifier()),
            ('log_reg', LogisticRegression()),
            ('knn', KNeighborsClassifier()),
            ('naive_bayes', GaussianNB())]
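Not every estimator in this list has a random_state parameter, which is why the loop in the next step checks get_params() before calling set_params(). A quick way to see which of the four models are affected:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

clf_list = [('dec_tree', DecisionTreeClassifier()),
            ('log_reg', LogisticRegression()),
            ('knn', KNeighborsClassifier()),
            ('naive_bayes', GaussianNB())]

# get_params() returns a dict of the estimator's constructor parameters;
# only estimators that involve randomness expose random_state
seedable = [name for name, model in clf_list
            if 'random_state' in model.get_params()]
print(seedable)  # ['dec_tree', 'log_reg']
```

k-nearest neighbors and Gaussian Naive Bayes are deterministic given the data, so there is nothing to seed for them.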
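  4. Iterate over the selected models, fit them to the data, and calculate recall using the test set:
for model_name, model in clf_list:
    # Only some estimators (e.g., decision trees, logistic regression) accept a random_state
    if 'random_state' in model.get_params():
        model.set_params(random_state=RANDOM_STATE)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    # recall_score expects the true labels first: recall_score(y_true, y_pred)
    recall = metrics.recall_score(y_test, y_pred)
    print(f"{model_name}'s recall score: {recall:.4f}")

The scores below were produced with the arguments passed in the reverse order, recall_score(y_pred, y_test). Because scikit-learn treats the first argument as the ground truth, that order yields precision rather than recall, so running the code above will generally give different numbers: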

dec_tree's recall score: 0.7526
log_reg's recall score: 0.8312
knn's recall score: 0.9186
naive_bayes's recall score: 0.0588
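The argument order of recall_score matters, because its signature is recall_score(y_true, y_pred). The toy arrays below are made up for illustration; they show that swapping the arguments returns the precision instead of the recall.

```python
import numpy as np
from sklearn import metrics

# Hypothetical labels: 1 = fraud. 2 true positives, 2 false negatives, 3 false positives
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_pred = np.array([1, 1, 0, 0, 1, 1, 1, 0, 0, 0])

recall = metrics.recall_score(y_true, y_pred)        # TP / (TP + FN) = 2 / 4
precision = metrics.precision_score(y_true, y_pred)  # TP / (TP + FP) = 2 / 5
swapped = metrics.recall_score(y_pred, y_true)       # arguments in the wrong order

print(f"recall={recall:.4f}, precision={precision:.4f}, swapped={swapped:.4f}")
# recall=0.5000, precision=0.4000, swapped=0.4000
```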
  5. Define and fit the stacking classifier:
lr = LogisticRegression()
stack_clf = StackingClassifier(clf_list, 
                               final_estimator=lr,
                               cv=k_fold,
                               n_jobs=-1)
stack_clf.fit(X_train, y_train)
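StackingClassifier hides the mechanics of stacking. The sketch below rebuilds them by hand on synthetic data, assuming the default stack_method='auto' (which picks predict_proba for these models) and a binary target, for which scikit-learn keeps only the positive-class probability of each base model. The decision tree is left out so that all base learners are deterministic and the two approaches can be compared; the dataset and the split are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

RANDOM_STATE = 42
X, y = make_classification(n_samples=1000, n_features=10,
                           weights=[0.9, 0.1], random_state=RANDOM_STATE)
X_train, X_test = X[:800], X[800:]
y_train, y_test = y[:800], y[800:]

# Deterministic base learners only (assumption), so both approaches can be compared
base = [('log_reg', LogisticRegression()),
        ('knn', KNeighborsClassifier()),
        ('naive_bayes', GaussianNB())]
k_fold = StratifiedKFold(5, shuffle=True, random_state=RANDOM_STATE)

# 1) Out-of-fold positive-class probabilities become the meta-features
meta_train = np.column_stack([
    cross_val_predict(model, X_train, y_train, cv=k_fold,
                      method='predict_proba')[:, 1]
    for _, model in base])

# 2) Base learners are refit on the full training set to build test meta-features
meta_test = np.column_stack([
    model.fit(X_train, y_train).predict_proba(X_test)[:, 1]
    for _, model in base])

# 3) The final estimator learns how to combine the base learners' outputs
final = LogisticRegression().fit(meta_train, y_train)
manual_pred = final.predict(meta_test)

# Compare with the built-in implementation (it clones the estimators in `base`)
stack_clf = StackingClassifier(base, final_estimator=LogisticRegression(),
                               cv=k_fold)
stack_pred = stack_clf.fit(X_train, y_train).predict(X_test)
print(f"Agreement with StackingClassifier: {np.mean(manual_pred == stack_pred):.4f}")
```

Using out-of-fold predictions for the meta-features is the key design choice: fitting the final estimator on predictions the base models made for their own training rows would reward models that overfit, such as an unpruned decision tree.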
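  6. Create predictions and evaluate the stacked ensemble:
y_pred = stack_clf.predict(X_test)
# recall_score expects the true labels first: recall_score(y_true, y_pred)
recall = metrics.recall_score(y_test, y_pred)
print(f"The stacked ensemble's recall score: {recall:.4f}")

The reported score of 0.9398 was computed with the arguments in the reverse order, recall_score(y_pred, y_test), so it is the ensemble's precision rather than its recall; the code above reports recall and will in general give a different number.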
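To see how the final logistic regression combines the base models, you can inspect the final_estimator_ attribute of the fitted StackingClassifier. The sketch below does this on synthetic data (an assumption, since the credit card data is not reproduced here). With a binary target, each base model contributes a single meta-feature, so there is one coefficient per model.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

RANDOM_STATE = 42
X, y = make_classification(n_samples=1000, n_features=10,
                           weights=[0.9, 0.1], random_state=RANDOM_STATE)

clf_list = [('dec_tree', DecisionTreeClassifier(random_state=RANDOM_STATE)),
            ('log_reg', LogisticRegression(random_state=RANDOM_STATE)),
            ('knn', KNeighborsClassifier()),
            ('naive_bayes', GaussianNB())]

stack_clf = StackingClassifier(
    clf_list, final_estimator=LogisticRegression(),
    cv=StratifiedKFold(5, shuffle=True, random_state=RANDOM_STATE))
stack_clf.fit(X, y)

# One coefficient per base model: the weight on its positive-class probability
weights = stack_clf.final_estimator_.coef_.ravel()
for (name, _), w in zip(clf_list, weights):
    print(f"{name}: {w:+.3f}")
```

Treat the coefficients as a rough guide only: the base models' probabilities are often strongly correlated, which can make individual weights unstable.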
