Getting acquainted with Keras

The core data structure of Keras is a model, which is similar to OpenCV's classifier object, except it focuses on neural networks only. The simplest type of model is the sequential model, which arranges the different layers of the neural network in a linear stack, just like we did for the MLP in OpenCV:

In [1]: from keras.models import Sequential
... model = Sequential()
Out[1]: Using TensorFlow backend.

Then, different layers can be added to the model one by one. In Keras, layers do not just contain neurons; they also implement the operation applied to their input. Some core layer types include the following (a short sketch follows the list):

  • Dense: This is a densely connected layer. This is exactly what we used when we designed our MLP: a layer of neurons that is connected to every neuron in the previous layer.
  • Activation: This applies an activation function to an output. Keras provides a whole range of activation functions, including OpenCV's identity function (linear), the hyperbolic tangent (tanh), a sigmoidal squashing function (sigmoid), a softmax function (softmax), and many more.
  • Reshape: This reshapes an output to a certain shape.
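
To make these layer types concrete, here is a minimal throwaway sketch that stacks all three; it is separate from the model we build in this section, and every layer size in it is an arbitrary illustration value:

# A throwaway sketch (not part of the running example); all layer sizes are
# arbitrary illustration values.
from keras.models import Sequential
from keras.layers import Dense, Activation, Reshape

sketch = Sequential()
sketch.add(Dense(8, input_dim=4))    # densely connected: 4 inputs, 8 outputs
sketch.add(Activation('tanh'))       # activation applied as a layer of its own
sketch.add(Reshape((2, 4)))          # reshape the 8 outputs into a 2 x 4 grid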

There are other layers that perform arithmetic or geometric operations on their inputs (see the sketch after this list):

  • Convolutional layers: These layers allow you to specify a kernel with which the input layer is convolved. This lets you perform operations such as applying a Sobel filter or a Gaussian kernel in 1D, 2D, or even 3D.
  • Pooling layers: These layers perform a max pooling operation on their input, where the output neuron's activity is given by the maximally active input neuron.
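
For instance, a 2D convolution followed by max pooling might look like the following sketch; the input shape, number of filters, and kernel size here are arbitrary illustration values:

# A throwaway sketch of a convolution-plus-pooling stack, e.g. for 28 x 28
# grayscale images; filter count and kernel size are arbitrary choices.
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D

cnn_sketch = Sequential()
cnn_sketch.add(Conv2D(16, kernel_size=(3, 3), activation='relu',
                      input_shape=(28, 28, 1)))  # 16 learned 3 x 3 kernels
cnn_sketch.add(MaxPooling2D(pool_size=(2, 2)))   # keep the maximum of each 2 x 2 patch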

Some other layers that are popular in deep learning are as follows (again, a sketch follows the list):

  • Dropout: This layer randomly sets a fraction of input units to zero at each update. This is a way to inject noise into the training process, making it more robust.
  • Embedding: This layer encodes categorical data, similar to some functions from scikit-learn's preprocessing module.
  • Gaussian noise: This layer applies additive zero-centered Gaussian noise. This is another way of injecting noise into the training process, making it more robust.
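
The following sketch shows how these layers are added to a model; every size and rate in it is an arbitrary illustration value:

# A throwaway sketch of the embedding and noise-injecting layers; all sizes
# and rates are arbitrary illustration values.
from keras.models import Sequential
from keras.layers import Dense, Dropout, Embedding, GaussianNoise

noisy_sketch = Sequential()
noisy_sketch.add(Embedding(input_dim=1000, output_dim=8))  # map 1,000 category IDs to 8-D vectors
noisy_sketch.add(GaussianNoise(0.1))                       # add zero-centered noise with stddev 0.1
noisy_sketch.add(Dense(4, activation='tanh'))
noisy_sketch.add(Dropout(0.5))                             # zero out half of the units at each update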

A perceptron similar to the preceding one could thus be implemented using a dense layer that has two inputs and one output. Staying true to our earlier example, we will initialize the weights to zero and use the hyperbolic tangent as an activation function:

In [2]: from keras.layers import Dense
... model.add(Dense(1, activation='tanh', input_dim=2,
... kernel_initializer='zeros'))
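
If you want to convince yourself that the weights really start out at zero, you can peek at them; for a single Dense layer, get_weights returns the kernel matrix and the bias vector:

# Inspect the freshly added layer: get_weights() returns the 2 x 1 weight
# matrix (kernel) and the length-1 bias vector, all zeros at this point.
weights, bias = model.get_weights()
print(weights)   # a 2 x 1 array of zeros
print(bias)      # a length-1 array of zeros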

Finally, we want to specify the training method. Keras provides a number of optimizers, including the following (a sketch after the list shows how to configure them explicitly):

  • Stochastic gradient descent (SGD): This is what we discussed earlier.
  • Root mean square propagation (RMSprop): This is a method in which the learning rate is adapted for each of the parameters.
  • Adaptive moment estimation (Adam): This builds on root mean square propagation by additionally keeping a running average of past gradients, similar to momentum.
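
Instead of referring to an optimizer by its string name, you can also pass a configured optimizer object to compile. The learning rates below are arbitrary illustration values, and note that newer Keras versions call the argument learning_rate instead of lr:

# Sketch: configure optimizer objects explicitly rather than by name.
# The learning rates are arbitrary illustration values; newer Keras versions
# use learning_rate instead of lr.
from keras.optimizers import SGD, RMSprop, Adam

sgd = SGD(lr=0.01)
rmsprop = RMSprop(lr=0.001)
adam = Adam(lr=0.001)
# Any of these objects can be passed to model.compile(optimizer=sgd, ...).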

In addition, Keras also provides a number of different loss functions (the hinge loss is sketched after the list):

  • Mean squared error (mean_squared_error): This is what was discussed earlier.
  • Hinge loss (hinge): This is a maximum-margin loss often used with SVMs, as discussed in Chapter 6, Detecting Pedestrians with Support Vector Machines.
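
For completeness, compiling with the hinge loss would look like the following sketch; note that the hinge loss expects targets coded as -1 and +1, so the 0/1 labels we use below would first have to be remapped:

# Sketch (not what we use below): compile with the hinge loss instead of the
# mean squared error. The hinge loss expects targets coded as -1 and +1.
model.compile(optimizer='sgd', loss='hinge', metrics=['accuracy'])
# y_pm = 2 * y - 1   # example remapping of 0/1 labels to -1/+1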

You can see that there is a plethora of parameters to specify and methods to choose from. To stay true to our earlier perceptron implementation, we will choose SGD as the optimizer, the mean squared error as the cost function, and accuracy as the metric:

In [3]: model.compile(optimizer='sgd',
... loss='mean_squared_error',
... metrics=['accuracy'])

In order to compare the performance of the Keras implementation to our home-brewed version, we will apply the classifier to the same dataset:

In [4]: from sklearn.datasets import make_blobs
... X, y = make_blobs(n_samples=100, centers=2,
... cluster_std=2.2, random_state=42)

Finally, a Keras model is fit to the data with a very familiar syntax. Here, we can also choose how many iterations to train for (epochs), how many samples to present before we calculate the error gradient (batch_size), whether to shuffle the dataset (shuffle), and whether to output progress updates (verbose):

In [5]: model.fit(X, y, epochs=400, batch_size=100, shuffle=False,
... verbose=0)
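
Incidentally, fit returns a History object whose history dictionary records the loss (and metric) for every epoch. If you store that return value, a quick training-curve plot looks like the following sketch; depending on the Keras version, the accuracy is stored under the key 'acc' or 'accuracy':

# Sketch: store the return value of fit and plot how the loss evolved.
import matplotlib.pyplot as plt

history = model.fit(X, y, epochs=400, batch_size=100, shuffle=False, verbose=0)
plt.plot(history.history['loss'])
plt.xlabel('epoch')
plt.ylabel('mean squared error')
plt.show()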

After the training completes, we can evaluate the classifier as follows:

In [6]: model.evaluate(X, y)
Out[6]: 32/100 [========>.....................] - ETA: 0s
[0.040941802412271501, 1.0]

Here, the first reported value is the mean squared error, whereas the second value denotes accuracy. This means that the final mean squared error was 0.04, and we had 100% accuracy. Way better than our own implementation!
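If you want to see the predicted class labels behind that accuracy number, you can also call predict yourself. The following sketch thresholds the raw outputs at 0.5, which (as an assumption about the metric's internals) is the rounding that Keras's binary accuracy applies:

# Sketch: turn the raw network outputs into class labels by thresholding at
# 0.5 and compare them to the ground truth.
import numpy as np

y_raw = model.predict(X)                    # shape (100, 1), tanh outputs
y_pred = (y_raw > 0.5).astype(int).ravel()  # threshold at 0.5 to get 0/1 labels
print(np.mean(y_pred == y))                 # should reproduce the accuracy above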

You can find more information on Keras, source code documentation, and a number of tutorials at http://keras.io.

With these tools in hand, we are now ready to approach a real-world dataset!
