Execute the following steps to run Bayesian hyperparameter optimization of a LightGBM model.
- Import the libraries:
import pandas as pd
from sklearn.model_selection import (train_test_split,
                                     cross_val_score,
                                     StratifiedKFold)
from hyperopt import hp, fmin, tpe, STATUS_OK, Trials
from lightgbm import LGBMClassifier
from chapter_9_utils import performance_evaluation_report
- Define parameters for later use:
N_FOLDS = 5
MAX_EVALS = 200
- Load and prepare the data:
df = pd.read_csv('credit_card_fraud.csv')
X = df.copy()
y = X.pop('Class')
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.2,
                                                    stratify=y,
                                                    random_state=42)
- Define the objective function:
def objective(params, n_folds=N_FOLDS, random_state=42):
model = LGBMClassifier(**params)
model.set_params(random_state=random_state)
k_fold = StratifiedKFold(n_folds, shuffle=True,
random_state=random_state)
metrics = cross_val_score(model, X_train, y_train,
cv=k_fold, scoring='recall')
loss = -1 * metrics.mean()
return {'loss': loss, 'params': params, 'status': STATUS_OK}
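Before handing the objective to fmin, it can be useful to call it once with a fixed set of hyperparameters to confirm it returns the expected dictionary. The sketch below is a stand-in version: it swaps in a DecisionTreeClassifier on synthetic imbalanced data (so it runs without LightGBM or the fraud dataset) but keeps the same structure, in which the loss is the negative mean recall, so fmin's minimization maximizes recall:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# synthetic, imbalanced stand-in for the fraud dataset
X_demo, y_demo = make_classification(n_samples=300, weights=[0.9, 0.1],
                                     random_state=42)

def demo_objective(params, n_folds=3, random_state=42):
    model = DecisionTreeClassifier(**params, random_state=random_state)
    k_fold = StratifiedKFold(n_folds, shuffle=True,
                             random_state=random_state)
    recalls = cross_val_score(model, X_demo, y_demo,
                              cv=k_fold, scoring='recall')
    # fmin minimizes, so return the negative mean recall;
    # hyperopt's STATUS_OK constant is the string 'ok'
    return {'loss': -recalls.mean(), 'params': params, 'status': 'ok'}

result = demo_objective({'max_depth': 3})
print(result['loss'])  # a value in [-1, 0]; lower means higher recall
```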
- Define the search space:
lgbm_param_grid = {
'boosting_type': hp.choice('boosting_type', ['gbdt', 'dart',
'goss']),
'max_depth': hp.choice('max_depth', [-1, 2, 3, 4, 5, 6, 7, 8,
9, 10]),
'n_estimators': hp.choice('n_estimators', [10, 50, 100,
300, 750, 1000]),
'is_unbalance': hp.choice('is_unbalance', [True, False]),
'colsample_bytree': hp.uniform('colsample_bytree', 0.3, 1),
'learning_rate': hp.uniform('learning_rate', 0.05, 0.3),
}
- Run the Bayesian optimization:
trials = Trials()
best_set = fmin(fn=objective,
                space=lgbm_param_grid,
                algo=tpe.suggest,
                max_evals=MAX_EVALS,
                trials=trials)
Inspecting best_set produces the following summary:
{'boosting_type': 1,
'colsample_bytree': 0.8861225641638096,
'is_unbalance': 0,
'learning_rate': 0.193440600772047,
'max_depth': 6,
'n_estimators': 0}
Hyperparameters defined with hp.choice are reported as integer indices into their lists of possible values. In the following steps, we show how to recover the original values.
- Define the dictionaries for mapping the results to hyperparameter values:
boosting_type = {0: 'gbdt', 1: 'dart', 2: 'goss'}
max_depth = {0: -1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6,
6: 7, 7: 8, 8: 9, 9: 10}
n_estimators = {0: 10, 1: 50, 2: 100, 3: 300, 4: 750, 5: 1000}
is_unbalance = {0: True, 1: False}
- Fit a model using the best hyperparameters:
best_lgbm = LGBMClassifier(
    boosting_type=boosting_type[best_set['boosting_type']],
    max_depth=max_depth[best_set['max_depth']],
    n_estimators=n_estimators[best_set['n_estimators']],
    is_unbalance=is_unbalance[best_set['is_unbalance']],
    colsample_bytree=best_set['colsample_bytree'],
    learning_rate=best_set['learning_rate']
)
best_lgbm.fit(X_train, y_train)
- Evaluate the performance of the best model on the test set:
_ = performance_evaluation_report(best_lgbm, X_test, y_test,
show_plot=True,
show_pr_curve=True)
Running the code generates the following plot:
The plot contains some of the performance evaluation metrics obtained from the custom performance_evaluation_report function.