Using random search with scikit-learn

Grid search and random search are both easy to implement with scikit-learn. In this example, we will use the KerasClassifier class from Keras to wrap our model and make it compatible with the scikit-learn API. Then, we will use scikit-learn's RandomizedSearchCV class to do the hyperparameter search.

To do this, we will start by changing our now familiar model build function slightly. We will parameterize it with the hyperparameters we would like to search, as shown in the following code:

from keras.layers import Input, Dense, Dropout
from keras.models import Model

def build_network(keep_prob=0.5, optimizer='adam'):
    # Note: Keras' Dropout takes the fraction of units to drop, so despite
    # its name, keep_prob is used here as the dropout rate.
    inputs = Input(shape=(784,), name="input")
    x = Dense(512, activation='relu', name="hidden1")(inputs)
    x = Dropout(keep_prob)(x)
    x = Dense(256, activation='relu', name="hidden2")(x)
    x = Dropout(keep_prob)(x)
    x = Dense(128, activation='relu', name="hidden3")(x)
    x = Dropout(keep_prob)(x)
    prediction = Dense(10, activation='softmax', name="output")(x)
    model = Model(inputs=inputs, outputs=prediction)
    model.compile(optimizer=optimizer, loss='categorical_crossentropy',
                  metrics=["accuracy"])
    return model

In this example, I would like to search for an ideal value for dropout, and I would like to try several different optimizers. In order to make this happen, I need to include these as parameters in the function so that they can be changed by our random search method. We could, of course, parameterize and test many other network architecture choices using this same methodology, but we're keeping it simple here.

Next, we will create a function that returns a dictionary of all the possible hyperparameters and their value spaces that we'd like to search through, as shown in the following code:

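import numpy as np

def create_hyperparameters():
    # batch_size is not a build_network argument; KerasClassifier
    # forwards it to the model's fit() call instead.
    batches = [10, 20, 30, 40, 50]
    optimizers = ['rmsprop', 'adam', 'adadelta']
    dropout = np.linspace(0.1, 0.5, 5)  # 0.1, 0.2, 0.3, 0.4, 0.5
    return {"batch_size": batches, "optimizer": optimizers,
            "keep_prob": dropout}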
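To get a feel for how much of this space a random search actually covers, here is a rough sketch that uses only the Python standard library. It is not scikit-learn's own sampler, and the dropout values are written out as a plain list in place of np.linspace; the seed is arbitrary.

```python
import itertools
import random

# Same value spaces as create_hyperparameters(), written as plain lists
# (np.linspace(0.1, 0.5, 5) produces these five dropout values).
batches = [10, 20, 30, 40, 50]
optimizers = ['rmsprop', 'adam', 'adadelta']
dropout = [0.1, 0.2, 0.3, 0.4, 0.5]

# Every combination an exhaustive grid search would have to evaluate.
grid = [
    {"batch_size": b, "optimizer": o, "keep_prob": d}
    for b, o, d in itertools.product(batches, optimizers, dropout)
]

# A random search evaluates only a fixed budget of distinct combinations.
rng = random.Random(42)  # arbitrary seed, used only for repeatability
sampled = rng.sample(grid, 10)

print(len(grid), "combinations in the full grid")
print(len(sampled), "combinations tried by a 10-iteration random search")
for params in sampled[:3]:
    print(params)
```

A full grid search over these lists would need one cross-validated evaluation per combination, which is why sampling a fixed number of them is so much cheaper.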

All that's left is to connect these two pieces together using RandomizedSearchCV. First, we will wrap our model in keras.wrappers.scikit_learn.KerasClassifier so that it's compatible with scikit-learn, as shown in the following code:

from keras.wrappers.scikit_learn import KerasClassifier
model = KerasClassifier(build_fn=build_network, verbose=0)

Next, we will get our hyperparameter dictionary, using the following code:

hyperparameters = create_hyperparameters()

Then, finally, we will create a RandomizedSearchCV object that we will use to search through the parameter space of the model, as shown in the following code:

from sklearn.model_selection import RandomizedSearchCV
search = RandomizedSearchCV(estimator=model, param_distributions=hyperparameters, n_iter=10, n_jobs=1, cv=3, verbose=1)

Once we fit this RandomizedSearchCV object, it will randomly choose values from the parameter distributions and apply them to the model. It will do this 10 times (n_iter=10), and it will evaluate each combination three times because we used 3-fold cross-validation (cv=3). This means we will be fitting the model a total of 30 times during the search. Using the mean cross-validated accuracy of each combination, it picks the best one, refits it on the full training set (the default refit=True behavior), and exposes the resulting model as the attribute .best_estimator_ and the winning parameters as .best_params_.
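The bookkeeping described above can be sketched in plain Python. This is only an illustration of the idea, not scikit-learn's implementation: fake_fold_accuracy is a made-up scoring function standing in for training and evaluating the Keras model on one fold, and random_search is a hypothetical helper name.

```python
import random
import statistics

def fake_fold_accuracy(params, fold, rng):
    """Hypothetical stand-in for fitting on k-1 folds and scoring on the rest."""
    base = 0.97 - abs(params["keep_prob"] - 0.2) * 0.05
    return base + rng.uniform(-0.005, 0.005)

def random_search(space, n_iter, cv, seed=0):
    rng = random.Random(seed)
    fits = 0
    results = []
    for _ in range(n_iter):
        # Draw one value per hyperparameter. For simplicity this sketch may
        # draw the same combination more than once.
        params = {name: rng.choice(values) for name, values in space.items()}
        scores = []
        for fold in range(cv):
            scores.append(fake_fold_accuracy(params, fold, rng))
            fits += 1  # one model fit per (candidate, fold) pair
        results.append((statistics.mean(scores), params))
    # The winner is the candidate with the highest mean score across folds.
    best_score, best_params = max(results, key=lambda r: r[0])
    return best_params, best_score, fits

space = {
    "batch_size": [10, 20, 30, 40, 50],
    "optimizer": ['rmsprop', 'adam', 'adadelta'],
    "keep_prob": [0.1, 0.2, 0.3, 0.4, 0.5],
}
best_params, best_score, fits = random_search(space, n_iter=10, cv=3)
print("model fits during the search:", fits)
print("best parameters:", best_params)
```

The fit counter makes the cost explicit: the number of fits grows as n_iter times cv, so doubling either one doubles the search time.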

To fit it, we just call its fit method, as if it were a model, as shown in the following code:

search.fit(data["train_X"], data["train_y"])

print(search.best_params_)

Fitting the MNIST model used in Chapter 5, Using Keras for Multiclass Classification, over the parameter space above takes about 9 minutes on a Tesla K80 GPU instance. Before we call this section done, let's take a look at some of the output from the search, as shown in the following output:

Using TensorFlow backend.
 Fitting 3 folds for each of 10 candidates, totalling 30 fits
tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties:
 name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
 pciBusID: 0000:00:1e.0
 totalMemory: 11.17GiB freeMemory: 11.10GiB
tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: Tesla K80, pci bus id: 0000:00:1e.0, compute capability: 3.7)
 [Parallel(n_jobs=1)]: Done 30 out of 30 | elapsed: 8.8min finished
 {'keep_prob': 0.20000000000000001, 'batch_size': 40, 'optimizer': 'adam'}

As you can see in this output, across the 10 sampled combinations, the hyperparameter set printed on the last line (a dropout rate of 0.2, a batch size of 40, and the adam optimizer) performed best. Of course, we could certainly run for more iterations, and we might find a better option. Our budget is limited only by time, patience, and the credit card attached to our cloud account.
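Before spending more of that budget, it can help to see how close the other candidates came. A fitted RandomizedSearchCV keeps per-candidate scores in its cv_results_ attribute. The sketch below shows one way to rank them; to stay self-contained and run in seconds, it swaps in a small decision tree on the iris dataset for our Keras model, so the estimator, data, and search space here are stand-ins and not the ones from this chapter.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Stand-in estimator and search space, chosen only because they fit quickly.
search = RandomizedSearchCV(
    estimator=DecisionTreeClassifier(random_state=0),
    param_distributions={"max_depth": [1, 2, 3, 4, 5],
                         "min_samples_leaf": [1, 2, 4, 8]},
    n_iter=5, cv=3, random_state=0)
search.fit(X, y)

results = search.cv_results_
# Walk the candidates from best (rank 1) to worst.
order = sorted(range(len(results["params"])),
               key=lambda i: results["rank_test_score"][i])
for i in order:
    print(results["rank_test_score"][i],
          round(results["mean_test_score"][i], 4),
          results["params"][i])
```

The same cv_results_ keys should be available on the search object we fitted above, so the ranking loop can be reused as is. If the top few candidates score almost identically, more iterations are unlikely to change the picture much.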
