Execute the following steps to create a stacked ensemble.
- Import the libraries:
import pandas as pd
from sklearn.model_selection import (train_test_split,
                                     StratifiedKFold)
from sklearn import metrics
from sklearn.preprocessing import StandardScaler
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
- Load and preprocess the data:
RANDOM_STATE = 42
k_fold = StratifiedKFold(5, shuffle=True, random_state=RANDOM_STATE)
df = pd.read_csv('credit_card_fraud.csv')
X = df.copy()
y = X.pop('Class')
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.2,
                                                    stratify=y,
                                                    random_state=RANDOM_STATE)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
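Two details of this step matter for an imbalanced problem such as fraud detection: `stratify=y` keeps the share of the positive class the same in the training and test sets, and the scaler is fitted on the training set only, so no test-set statistics leak into training. The sketch below checks both properties. It assumes synthetic data from `make_classification` (with roughly 2% positives) as a stand-in for `credit_card_fraud.csv`, which is not reproduced here.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

RANDOM_STATE = 42

# Assumption: synthetic, heavily imbalanced data standing in for the fraud dataset
X, y = make_classification(n_samples=5000, n_features=10,
                           weights=[0.98, 0.02],
                           random_state=RANDOM_STATE)

# Stratified split: the positive-class share is preserved in both subsets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=RANDOM_STATE)

# Learn the mean/std on the training data only, then reuse them for the test set
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

print(f"Positive rate overall: {y.mean():.4f}")
print(f"Positive rate (train): {y_train.mean():.4f}")
print(f"Positive rate (test):  {y_test.mean():.4f}")
```

The training features end up with zero mean and unit variance, while the test features are only approximately standardized, which is the expected behavior when the scaler never sees the test set.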
- Define a list of classifiers to consider:
clf_list = [('dec_tree', DecisionTreeClassifier()),
('log_reg', LogisticRegression()),
('knn', KNeighborsClassifier()),
('naive_bayes', GaussianNB())]
- Iterate over the selected models, fit them to the data, and calculate recall using the test set:
for model_tuple in clf_list:
    model = model_tuple[1]
    # fix the seed for models that use randomness (e.g., the decision tree)
    if 'random_state' in model.get_params().keys():
        model.set_params(random_state=RANDOM_STATE)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    # recall_score expects the true labels first, then the predictions
    recall = metrics.recall_score(y_test, y_pred)
    print(f"{model_tuple[0]}'s recall score: {recall:.4f}")
Running the code prints the recall score of each classifier on the test set.
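The argument order of `metrics.recall_score` is `(y_true, y_pred)`. Recall is TP / (TP + FN), so passing the predictions first silently computes precision, TP / (TP + FP), instead. The toy labels below are made up purely to illustrate this.

```python
from sklearn import metrics

# Hypothetical toy labels: 1 = fraud, 0 = legitimate
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]  # TP = 2, FN = 2, FP = 1

recall = metrics.recall_score(y_true, y_pred)        # 2 / (2 + 2)
precision = metrics.precision_score(y_true, y_pred)  # 2 / (2 + 1)
swapped = metrics.recall_score(y_pred, y_true)       # arguments reversed

print(f"recall:    {recall:.4f}")     # 0.5000
print(f"precision: {precision:.4f}")  # 0.6667
print(f"swapped:   {swapped:.4f}")    # 0.6667, i.e. precision, not recall
```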
- Define and fit the stacking classifier:
lr = LogisticRegression()
stack_clf = StackingClassifier(clf_list,
                               final_estimator=lr,
                               cv=k_fold,
                               n_jobs=-1)
stack_clf.fit(X_train, y_train)
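To see what `StackingClassifier` does under the hood, here is a simplified manual sketch. Each base model produces out-of-fold probabilities via `cross_val_predict` using the same `k_fold` splitter; a logistic regression (the meta-model) is trained on those probabilities; and the base models are refit on the full training set for use at prediction time. This approximates the library's default behavior (`stack_method='auto'` picks `predict_proba` when available, and for binary targets one probability column per estimator is kept), but it is an illustration, not scikit-learn's actual code. The synthetic data and the `stacked_predict` helper are hypothetical.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

RANDOM_STATE = 42

# Assumption: synthetic imbalanced data instead of the fraud dataset
X, y = make_classification(n_samples=2000, n_features=10,
                           weights=[0.95, 0.05], random_state=RANDOM_STATE)
k_fold = StratifiedKFold(5, shuffle=True, random_state=RANDOM_STATE)

base_models = [('dec_tree', DecisionTreeClassifier(random_state=RANDOM_STATE)),
               ('log_reg', LogisticRegression()),
               ('knn', KNeighborsClassifier()),
               ('naive_bayes', GaussianNB())]

# Level 1: out-of-fold probability of the positive class from each base model,
# so the meta-model never sees predictions made on a model's own training rows
meta_features = np.column_stack([
    cross_val_predict(clone(model), X, y, cv=k_fold,
                      method='predict_proba')[:, 1]
    for _, model in base_models
])

# Level 2: the meta-model learns how to weight the base models' outputs
meta_model = LogisticRegression().fit(meta_features, y)

# For prediction, the base models are refit on all of the training data
fitted_base = [clone(model).fit(X, y) for _, model in base_models]


def stacked_predict(X_new):
    """Hypothetical helper: base-model probabilities -> meta-model prediction."""
    new_features = np.column_stack(
        [m.predict_proba(X_new)[:, 1] for m in fitted_base])
    return meta_model.predict(new_features)
```

Using out-of-fold predictions is the key design choice: if the meta-model were trained on in-sample predictions, it would over-trust base models that overfit, such as the unpruned decision tree.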
- Create predictions, and evaluate the stacked ensemble:
y_pred = stack_clf.predict(X_test)
recall = metrics.recall_score(y_test, y_pred)
print(f"The stacked ensemble's recall score: {recall:.4f}")