Training

Putting the model together, and incorporating our new cool multi-GPU feature, we come up with the following architecture:


from keras.layers import Input, Conv2D, BatchNormalization, MaxPooling2D
from keras.layers import Flatten, Dense, Dropout
from keras.models import Model
from keras.utils import multi_gpu_model


def build_network(num_gpu=1, input_shape=None):
    inputs = Input(shape=input_shape, name="input")

    # convolutional block 1
    conv1 = Conv2D(64, kernel_size=(3, 3), activation="relu",
                   name="conv_1")(inputs)
    batch1 = BatchNormalization(name="batch_norm_1")(conv1)
    pool1 = MaxPooling2D(pool_size=(2, 2), name="pool_1")(batch1)

    # convolutional block 2
    conv2 = Conv2D(32, kernel_size=(3, 3), activation="relu",
                   name="conv_2")(pool1)
    batch2 = BatchNormalization(name="batch_norm_2")(conv2)
    pool2 = MaxPooling2D(pool_size=(2, 2), name="pool_2")(batch2)

    # fully connected layers
    flatten = Flatten()(pool2)
    fc1 = Dense(512, activation="relu", name="fc1")(flatten)
    d1 = Dropout(rate=0.2, name="dropout1")(fc1)
    fc2 = Dense(256, activation="relu", name="fc2")(d1)
    d2 = Dropout(rate=0.2, name="dropout2")(fc2)

    # output layer
    output = Dense(10, activation="softmax", name="softmax")(d2)

    # finalize and compile
    model = Model(inputs=inputs, outputs=output)
    if num_gpu > 1:
        model = multi_gpu_model(model, num_gpu)
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

We can use this to build our model:

model = build_network(num_gpu=1, input_shape=(IMG_HEIGHT, IMG_WIDTH, CHANNELS))
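The shape constants used here aren't defined in this snippet. Assuming we're working with 32 x 32 RGB images (which matches the 50,000 training observations and 10-class softmax used in this chapter), they might be set up as follows; treat the values as an assumed configuration rather than part of the original code:

# Assumed shape constants for 32 x 32 RGB inputs; adjust to your dataset.
IMG_HEIGHT = 32
IMG_WIDTH = 32
CHANNELS = 3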

And then we can fit it, as you'd expect:

model.fit(x=data["train_X"], y=data["train_y"],
          batch_size=32,
          epochs=200,
          validation_data=(data["val_X"], data["val_y"]),
          verbose=1,
          callbacks=callbacks)
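The callbacks list passed to fit() isn't defined in this snippet; it would typically be assembled earlier in the chapter. As a rough sketch, a plausible set of callbacks for a run like this might look like the following, where the log directory, checkpoint filename, and patience value are illustrative placeholders rather than values from the original text:

from keras.callbacks import TensorBoard, ModelCheckpoint, EarlyStopping

# Illustrative callbacks: monitor training curves, keep the best weights,
# and stop early once validation loss stops improving.
callbacks = [
    TensorBoard(log_dir="./logs/cnn"),
    ModelCheckpoint("model-weights.{epoch:02d}.hdf5", save_best_only=True),
    EarlyStopping(monitor="val_loss", patience=10)
]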

As you train this model, you'll notice that overfitting is an immediate concern. Even with a relatively modest two convolutional layers, we're already overfitting a bit.

You can see the effects of overfitting from the following graphs:

It's no surprise: 50,000 observations is not a lot of data, especially for a computer vision problem. In practice, computer vision problems benefit from very large datasets. In fact, Chen Sun et al. showed that model performance on vision tasks tends to improve linearly with the logarithm of the training data volume (https://arxiv.org/abs/1707.02968). Unfortunately, we can't really go out and find more data in this case, but maybe we can make some. Let's talk about data augmentation next.
