How to tune parameters with GridSearchCV

The GridSearchCV class in the model_selection module facilitates the systematic evaluation of all combinations of the hyperparameter values that we would like to test. In the following code, we illustrate this functionality for seven tuning parameters; the number of candidate values per parameter yields a total of 2⁴ x 3² x 4 = 576 different model configurations:

# custom time-series splitter defined earlier in the chapter
cv = OneStepTimeSeriesSplit(n_splits=12)

param_grid = dict(
    n_estimators=[100, 300],
    learning_rate=[.01, .1, .2],
    max_depth=list(range(3, 13, 3)),
    subsample=[.8, 1],
    min_samples_split=[10, 50],
    min_impurity_decrease=[0, .01],
    max_features=['sqrt', .8, 1]
)
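
As a quick sanity check on the arithmetic above, the grid size is simply the product of the number of candidate values per parameter. The following minimal sketch, which assumes only the param_grid just defined, uses sklearn's ParameterGrid to confirm the count:

from sklearn.model_selection import ParameterGrid

# each parameter contributes len(values) options; the full grid is their product
print(len(ParameterGrid(param_grid)))  # 576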

The .fit() method runs the cross-validation using the custom OneStepTimeSeriesSplit and the roc_auc metric to evaluate each of the 12 folds. sklearn lets us persist the result as it would any other model, using its joblib pickle implementation, as shown in the following code:

import joblib
from sklearn.model_selection import GridSearchCV

# gb_clf: the gradient boosting classifier configured earlier
gs = GridSearchCV(gb_clf,
                  param_grid,
                  cv=cv,
                  scoring='roc_auc',
                  verbose=3,
                  n_jobs=-1,
                  return_train_score=True)
gs.fit(X=X, y=y)

# persist result using joblib for more efficient storage of large numpy arrays
joblib.dump(gs, 'gbm_gridsearch.joblib')

After completion, the GridSearchCV object has several additional attributes that we can access after loading the pickled result to learn which hyperparameter combination performed best and what its average cross-validation AUC score was, which amounts to a modest improvement over the default values. This is shown in the following code:

gridsearch_result = joblib.load('gbm_gridsearch.joblib')

pd.Series(gridsearch_result.best_params_)
learning_rate             0.01
max_depth                 9.00
max_features              1.00
min_impurity_decrease     0.01
min_samples_split        10.00
n_estimators            300.00
subsample                 0.80

gridsearch_result.best_score_
0.6853
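
Beyond best_params_ and best_score_, the fitted object also exposes the complete cross-validation record in its cv_results_ dictionary and the model refitted on the full data in best_estimator_. The following sketch, which assumes the gridsearch_result object loaded above, shows one way to rank all 576 configurations by their mean test AUC:

import pandas as pd

# turn the per-configuration CV record into a DataFrame and rank by mean test AUC
cv_results = pd.DataFrame(gridsearch_result.cv_results_)
top10 = (cv_results
         .sort_values('mean_test_score', ascending=False)
         .loc[:, ['params', 'mean_test_score', 'std_test_score', 'mean_train_score']]
         .head(10))
print(top10)

# the estimator refitted on the full data with the best parameters (refit=True by default)
best_model = gridsearch_result.best_estimator_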