How to do it...

Execute the following steps to run Bayesian hyperparameter optimization of a LightGBM model.

  1. Import the libraries:
import pandas as pd
from sklearn.model_selection import train_test_split
from hyperopt import hp, fmin, tpe, STATUS_OK, Trials
from sklearn.model_selection import (cross_val_score,
                                     StratifiedKFold)
from lightgbm import LGBMClassifier
from chapter_9_utils import performance_evaluation_report
  2. Define parameters for later use:
N_FOLDS = 5
MAX_EVALS = 200
  3. Load and prepare the data:
df = pd.read_csv('credit_card_fraud.csv')

X = df.copy()
y = X.pop('Class')

X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.2,
                                                    stratify=y)
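
Credit card fraud datasets are typically heavily imbalanced, which is why we stratify the split and optimize for recall in the next step. If you want to confirm the imbalance before tuning, a quick check of the class proportions (assuming that Class equals 1 for fraudulent transactions) might look as follows:

# inspect the class distribution; fraud cases are expected
# to account for only a small fraction of all transactions
print(y.value_counts(normalize=True))
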
  4. Define the objective function:
def objective(params, n_folds=N_FOLDS, random_state=42):

    # instantiate the model with the sampled hyperparameters
    model = LGBMClassifier(**params)
    model.set_params(random_state=random_state)

    k_fold = StratifiedKFold(n_folds, shuffle=True,
                             random_state=random_state)

    # evaluate the model with stratified k-fold cross-validation
    metrics = cross_val_score(model, X_train, y_train,
                              cv=k_fold, scoring='recall')

    # hyperopt minimizes the loss, so we negate the mean recall
    loss = -1 * metrics.mean()

    return {'loss': loss, 'params': params, 'status': STATUS_OK}
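
Before launching the full search, it can be useful to verify that the objective runs end to end for a single set of hyperparameters. A minimal smoke test (the values below are arbitrary, not recommendations):

# evaluate the objective once with fixed, arbitrary hyperparameters
result = objective({'n_estimators': 100, 'max_depth': 5})
print(result['loss'], result['status'])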

  5. Define the search space:
lgbm_param_grid = {
    'boosting_type': hp.choice('boosting_type',
                               ['gbdt', 'dart', 'goss']),
    'max_depth': hp.choice('max_depth',
                           [-1, 2, 3, 4, 5, 6, 7, 8, 9, 10]),
    'n_estimators': hp.choice('n_estimators',
                              [10, 50, 100, 300, 750, 1000]),
    'is_unbalance': hp.choice('is_unbalance', [True, False]),
    'colsample_bytree': hp.uniform('colsample_bytree', 0.3, 1),
    'learning_rate': hp.uniform('learning_rate', 0.05, 0.3),
}
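
If you want to see what a single draw from this search space looks like, hyperopt can sample from it directly; a quick illustration:

from hyperopt.pyll import stochastic

# draw one random hyperparameter set from the search space
print(stochastic.sample(lgbm_param_grid))
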
  6. Run the Bayesian optimization:
trials = Trials()
best_set = fmin(fn=objective,
                space=lgbm_param_grid,
                algo=tpe.suggest,
                max_evals=MAX_EVALS,
                trials=trials)
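
Note that the search itself is stochastic, so your results will differ between runs. If you need reproducibility, fmin accepts an rstate argument; depending on your hyperopt version it expects a NumPy Generator or a RandomState. A sketch of the same call with a fixed seed:

import numpy as np

# recent hyperopt versions expect a NumPy Generator;
# older ones expect np.random.RandomState(42) instead
best_set = fmin(fn=objective,
                space=lgbm_param_grid,
                algo=tpe.suggest,
                max_evals=MAX_EVALS,
                trials=trials,
                rstate=np.random.default_rng(42))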

Inspecting best_set shows the following summary:

{'boosting_type': 1,
 'colsample_bytree': 0.8861225641638096,
 'is_unbalance': 0,
 'learning_rate': 0.193440600772047,
 'max_depth': 6,
 'n_estimators': 0}

The hyperparameters defined using hp.choice in the search space are returned as encoded integers, that is, the index of the selected value in the provided list. In the following steps, we show how to recover the original values.
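
As an alternative to the manual mapping dictionaries defined in the next step, hyperopt ships with a helper that decodes the result against the original search space:

from hyperopt import space_eval

# translate the encoded indices back into the actual
# hyperparameter values from lgbm_param_grid
print(space_eval(lgbm_param_grid, best_set))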

  7. Define the dictionaries for mapping the results to hyperparameter values:
boosting_type = {0: 'gbdt', 1: 'dart', 2: 'goss'}
max_depth = {0: -1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6,
             6: 7, 7: 8, 8: 9, 9: 10}
n_estimators = {0: 10, 1: 50, 2: 100, 3: 300, 4: 750, 5: 1000}
is_unbalance = {0: True, 1: False}

  8. Fit a model using the best hyperparameters:
best_lgbm = LGBMClassifier(
    boosting_type=boosting_type[best_set['boosting_type']],
    max_depth=max_depth[best_set['max_depth']],
    n_estimators=n_estimators[best_set['n_estimators']],
    is_unbalance=is_unbalance[best_set['is_unbalance']],
    colsample_bytree=best_set['colsample_bytree'],
    learning_rate=best_set['learning_rate']
)
best_lgbm.fit(X_train, y_train)
  9. Evaluate the performance of the best model on the test set:
_ = performance_evaluation_report(best_lgbm, X_test, y_test,
                                  show_plot=True,
                                  show_pr_curve=True)

Running the code generates the following plot:

The plot contains some of the performance evaluation metrics obtained from the custom performance_evaluation_report function.
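
Since we passed a Trials object to fmin, we can also inspect how the search progressed, for example by looking at the loss recorded for each evaluation:

# losses() returns one (negated recall) value per evaluation;
# the best iteration is the one with the lowest loss
losses = trials.losses()
print(min(losses))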
