Using pipelines in grid searches

Using a pipeline in a grid search works the same way as using any other estimator.

We define a parameter grid to search over and construct a GridSearchCV from the pipeline and the parameter grid. When specifying the parameter grid, however, there is one small change: for each parameter, we need to specify which step of the pipeline it belongs to. Both parameters that we want to adjust, C and gamma, are parameters of SVC, and in the preceding section we gave this step the name "svm". The syntax for a pipeline parameter grid is the step name, followed by __ (a double underscore), followed by the parameter name. So the C parameter of the "svm" step becomes svm__C.

Hence, we would construct the parameter grid as follows:

In [8]: param_grid = {'svm__C': [0.001, 0.01, 0.1, 1, 10, 100],
...                   'svm__gamma': [0.001, 0.01, 0.1, 1, 10, 100]}
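We can check this naming convention directly on a pipeline object. Its get_params method reports every step's parameters under step__parameter names, and set_params accepts the same names. This is a minimal sketch that assumes a pipeline with a MinMaxScaler step and an SVC step named "svm". The step name "scaler" is an assumption made for illustration.

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Assumed pipeline: a MinMaxScaler step (named "scaler" here for illustration)
# followed by an SVC step named "svm"
pipe = Pipeline([("scaler", MinMaxScaler()), ("svm", SVC())])

# get_params() lists each step's parameters, prefixed with the step name
params = pipe.get_params()
print("svm__C" in params, "svm__gamma" in params)  # True True

# set_params() accepts the same double-underscore names
pipe.set_params(svm__C=10, svm__gamma=0.1)
print(pipe.named_steps["svm"].C, pipe.named_steps["svm"].gamma)  # 10 0.1
```

GridSearchCV relies on the same mechanism. For each candidate, it calls set_params with the keys from the parameter grid, so a misspelled step name produces an error rather than being silently ignored.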

With this parameter grid, we can use GridSearchCV as usual:

In [9]: grid = GridSearchCV(pipe, param_grid=param_grid, cv=10)
...     grid.fit(X_train, y_train);
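To run these steps end to end, we need concrete data. The sketch below assumes the breast cancer dataset bundled with scikit-learn, split with train_test_split(random_state=0), and a pipeline of MinMaxScaler followed by SVC. The data and the split are assumptions, and the exact scores it prints may differ from those shown in this section depending on the scikit-learn version.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Assumed data: the breast cancer dataset that ships with scikit-learn
cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, random_state=0)

# Scaling and classification chained into a single estimator
pipe = Pipeline([("scaler", MinMaxScaler()), ("svm", SVC())])

# Parameter names follow the <step name>__<parameter> convention
param_grid = {"svm__C": [0.001, 0.01, 0.1, 1, 10, 100],
              "svm__gamma": [0.001, 0.01, 0.1, 1, 10, 100]}

# For every candidate and every one of the 10 folds, the whole pipeline,
# scaler included, is fit on the training folds only
grid = GridSearchCV(pipe, param_grid=param_grid, cv=10)
grid.fit(X_train, y_train)

print("Best cross-validation accuracy:", grid.best_score_)
print("Best parameters:", grid.best_params_)
print("Test set accuracy:", grid.score(X_test, y_test))
```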

The best mean cross-validation score found by the grid search is stored in best_score_:

In [10]: grid.best_score_
Out[10]: 0.97652582159624413

Similarly, the best parameters are stored in best_params_:

In [11]: grid.best_params_
Out[11]: {'svm__C': 1, 'svm__gamma': 1}
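The keys in best_params_ keep their step prefix. By default (refit=True), GridSearchCV also refits the whole pipeline with these parameters on the full training set and stores it in best_estimator_. The sketch below confirms that the SVC step inside best_estimator_ carries the winning values. To keep it quick, it assumes the same breast cancer data and split as above but uses a smaller grid and cv=5. The smaller grid is an assumption for speed and is not the grid used in the text.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Assumed data and split
cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, random_state=0)

pipe = Pipeline([("scaler", MinMaxScaler()), ("svm", SVC())])
# A reduced grid, chosen only to keep this sketch fast
param_grid = {"svm__C": [0.1, 1, 10], "svm__gamma": [0.1, 1, 10]}
grid = GridSearchCV(pipe, param_grid=param_grid, cv=5).fit(X_train, y_train)

# best_estimator_ is the complete pipeline, refit on all of X_train
best_svm = grid.best_estimator_.named_steps["svm"]
print(grid.best_params_)
print(best_svm.C, best_svm.gamma)  # matches the values in best_params_
```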

Recall, however, that this cross-validation score may be overly optimistic, because the same folds were used to choose the parameters. To estimate how well the classifier performs on new data, we need to score it on the test set:

In [12]: grid.score(X_test, y_test)
Out[12]: 0.965034965034965

In contrast to the grid search we did before, MinMaxScaler is now refit on only the training folds in each cross-validation split. As a result, no information from the held-out fold leaks into the parameter search.
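To make the difference concrete, we can write out by hand what the pipeline does in each split and check that cross_val_score on the pipeline gives the same per-fold scores. This sketch assumes the breast cancer data and split used earlier, five unshuffled KFold splits, and C=1, gamma=1 (the best parameters reported above). For contrast, the last lines show the leaky approach, where the scaler is fit on all of X_train before cross-validation starts.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import KFold, cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Assumed data and split
cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, random_state=0)
kfold = KFold(n_splits=5)  # unshuffled, so both versions see identical splits

# What the pipeline does in each split, written out by hand
manual_scores = []
for train_idx, val_idx in kfold.split(X_train):
    # The scaler learns its min/max from the training folds only
    scaler = MinMaxScaler().fit(X_train[train_idx])
    svm = SVC(C=1, gamma=1).fit(scaler.transform(X_train[train_idx]),
                                y_train[train_idx])
    # The held-out fold is scaled with statistics it did not contribute to
    manual_scores.append(svm.score(scaler.transform(X_train[val_idx]),
                                   y_train[val_idx]))

# The same procedure, delegated to the pipeline
pipe = Pipeline([("scaler", MinMaxScaler()), ("svm", SVC(C=1, gamma=1))])
pipeline_scores = cross_val_score(pipe, X_train, y_train, cv=kfold)
print(np.allclose(manual_scores, pipeline_scores))  # True

# Leaky approach: the scaler has already seen every held-out fold
X_train_scaled = MinMaxScaler().fit_transform(X_train)
leaky_scores = cross_val_score(SVC(C=1, gamma=1), X_train_scaled, y_train,
                               cv=kfold)
print("Pipeline:", pipeline_scores.mean(), "Leaky:", leaky_scores.mean())
```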

Pipelines make it easy to chain together a whole variety of steps! You can mix and match estimators in the pipeline at will. You only need to make sure that every step except the last provides a transform method. This allows each estimator in the pipeline to produce a new representation of the data, which in turn serves as input to the next step.
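As an illustration of this contract, the sketch below defines a hypothetical step, Log1pTransformer, which is not part of scikit-learn. It implements only fit and transform, and TransformerMixin supplies fit_transform. Because it provides transform, it can sit in front of the scaler. The breast cancer data is again an assumption, and it works here because all of its features are non-negative.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC


class Log1pTransformer(TransformerMixin, BaseEstimator):
    """Hypothetical example step: applies log(1 + x) to every feature."""

    def fit(self, X, y=None):
        # Nothing to learn; fit only has to return self
        return self

    def transform(self, X):
        # Produce a new representation of the data for the next step
        return np.log1p(X)


# Assumed data and split
cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, random_state=0)

# Every step except the last provides transform; SVC only needs fit/predict
pipe = Pipeline([("log", Log1pTransformer()),
                 ("scaler", MinMaxScaler()),
                 ("svm", SVC())])
pipe.fit(X_train, y_train)
print("Test set accuracy:", pipe.score(X_test, y_test))
```

For a stateless function like this one, scikit-learn's FunctionTransformer(np.log1p) would do the same job without a custom class. Writing the class out simply makes the fit/transform contract visible.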

The Pipeline class is not restricted to preprocessing and classification but can, in fact, join any number of estimators together. For example, we could build a pipeline containing feature extraction, feature selection, scaling, and classification, for a total of four steps. Similarly, the last step could be regression or clustering instead of classification.
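A four-step pipeline that ends in a regressor can be tuned in the same way, with parameters from different steps searched together. In this sketch, PolynomialFeatures stands in for "feature extraction" (strictly speaking, it expands features). It is followed by SelectKBest for feature selection, StandardScaler for scaling, and Ridge as the final regression step. The diabetes dataset, the step names, and the grid values are all assumptions chosen for illustration.

```python
from sklearn.datasets import load_diabetes
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Assumed data: the diabetes regression dataset bundled with scikit-learn
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Four steps: feature "extraction", feature selection, scaling, regression
pipe = Pipeline([
    # include_bias=False avoids a constant column that f_regression cannot score
    ("poly", PolynomialFeatures(degree=2, include_bias=False)),
    ("select", SelectKBest(score_func=f_regression)),
    ("scaler", StandardScaler()),
    ("ridge", Ridge()),
])

# Parameters of two different steps in one grid
param_grid = {"select__k": [5, 10, 20, 40],
              "ridge__alpha": [0.1, 1, 10, 100]}
grid = GridSearchCV(pipe, param_grid=param_grid, cv=5)
grid.fit(X_train, y_train)

print("Best parameters:", grid.best_params_)
print("Test set R^2:", grid.score(X_test, y_test))
```

Because the final step is a regressor, grid.score reports R^2 rather than accuracy. Feature selection also happens inside each cross-validation split, so the choice of k is not influenced by the held-out folds.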
