Dlib

The Dlib library has an API for working with neural networks. It can also be built with Nvidia CUDA support for performance optimization. GPU acceleration through technologies such as CUDA or OpenCL is important if we plan to work with large amounts of data and deep neural networks.

The approach used in the Dlib library for neural networks is the same as for the other machine learning algorithms in this library: we instantiate and configure an object of the required algorithm class and then use a particular trainer to train it on a dataset.

The Dlib library provides the dnn_trainer class for training neural networks. Objects of this class are initialized with an object of the concrete network and an object of the optimization algorithm. The most popular optimization algorithm is stochastic gradient descent with momentum, which we discussed in the Backpropagation method modes section. This algorithm is implemented in the sgd class, whose objects are configured with the weight decay regularization and momentum parameter values.

The dnn_trainer class has the following essential configuration methods: set_learning_rate, set_mini_batch_size, and set_max_num_epochs. These set the learning rate, the mini-batch size, and the maximum number of training epochs, respectively. This trainer class also supports dynamic learning rate changes so that we can, for example, use a lower learning rate in later epochs. The learning rate shrink factor is configured with the set_learning_rate_shrink_factor method. For the following example, however, we'll use a constant learning rate because it gives better training results for this particular data.
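To make the roles of the learning rate, momentum, and weight decay more concrete, the following is a minimal sketch of one common formulation of the momentum SGD update, written in plain C++ without Dlib. The MomentumSgd type and the shrink_learning_rate helper are hypothetical names introduced only for illustration, and the sgd class in Dlib may differ in its details:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustration of SGD with momentum; this is NOT Dlib's sgd class.
struct MomentumSgd {
    float weight_decay;
    float momentum;
    std::vector<float> velocity;  // one entry per parameter, starts at zero

    MomentumSgd(float wd, float m, std::size_t num_params)
        : weight_decay(wd), momentum(m), velocity(num_params, 0.0f) {}

    // One update step:
    //   v = momentum * v - learning_rate * (grad + weight_decay * w)
    //   w = w + v
    void step(std::vector<float>& weights, const std::vector<float>& grads,
              float learning_rate) {
        assert(weights.size() == grads.size());
        assert(weights.size() == velocity.size());
        for (std::size_t i = 0; i < weights.size(); ++i) {
            velocity[i] = momentum * velocity[i] -
                          learning_rate * (grads[i] + weight_decay * weights[i]);
            weights[i] += velocity[i];
        }
    }
};

// Scales the learning rate by a shrink factor; a factor of 1 keeps it constant.
inline float shrink_learning_rate(float learning_rate, float shrink_factor) {
    return learning_rate * shrink_factor;
}
```

In this sketch, the momentum term reuses part of the previous step, while the weight decay term pulls each weight toward zero. A shrink factor of 1 leaves the learning rate unchanged, which corresponds to the constant learning rate we use in the example.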

The next essential item for instantiating the trainer object is the neural network type. The Dlib library defines the network architecture in a declarative style, using C++ templates. To define the architecture, we start with the network's input, which in our case is of the matrix<double> type. We pass this type as a template argument to the next layer type, which in our case is the fully-connected layer of the fc type. The fully-connected layer type also takes the number of neurons as a template argument. To define the whole network, we keep nesting type definitions until we reach the last layer and the loss function. In our case, this is the loss_mean_squared type, which implements the mean squared error loss function that is usually used for regression tasks.

The following code snippet shows the network definition with the Dlib library API:

using NetworkType = loss_mean_squared<fc<1, 
                                 htan<fc<8, 
                                 htan<fc<16, 
                                 htan<fc<32, 
                                 input<matrix<double>>>>>>>>>>;

This definition can be read in the following order:

We started with the input layer:

input<matrix<double>>

Then, we added the first hidden layer with 32 neurons:

fc<32, input<matrix<double>>>

After that, we added the hyperbolic tangent activation function to the first hidden layer:

htan<fc<32, input<matrix<double>>>>

Next, we added the second hidden layer with 16 neurons and an activation function:

htan<fc<16, htan<fc<32, input<matrix<double>>>>>>

Then, we added the third hidden layer with 8 neurons and an activation function:

htan<fc<8, htan<fc<16, htan<fc<32, input<matrix<double>>>>>>>>

Then, we added the last output layer with 1 neuron and without an activation function:

fc<1, htan<fc<8, htan<fc<16, htan<fc<32, input<matrix<double>>>>>>>>>

Finally, we finished with the loss function:

loss_mean_squared<...>
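The inside-out reading order above can be reproduced with a small toy model of nested templates. The following sketch does not use Dlib: Input, Fc, Htan, and LossMeanSquared are hypothetical stand-ins that only record the structure of the network, and the intermediate aliases L0 to L3 are introduced here just to make each nesting step visible:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-ins for illustration; these are NOT Dlib's layer types.
template <typename T>
struct Input {
    static constexpr std::size_t num_outputs = 0;  // width is known only at run time
    static constexpr std::size_t depth = 0;        // number of fully-connected layers
    static std::string describe() { return "input"; }
};

template <std::size_t N, typename Subnet>
struct Fc {
    static constexpr std::size_t num_outputs = N;
    static constexpr std::size_t depth = Subnet::depth + 1;
    static std::string describe() {
        return Subnet::describe() + " -> fc(" + std::to_string(N) + ")";
    }
};

template <typename Subnet>
struct Htan {
    static constexpr std::size_t num_outputs = Subnet::num_outputs;
    static constexpr std::size_t depth = Subnet::depth;
    static std::string describe() { return Subnet::describe() + " -> htan"; }
};

template <typename Subnet>
struct LossMeanSquared {
    static constexpr std::size_t num_outputs = Subnet::num_outputs;
    static constexpr std::size_t depth = Subnet::depth;
    static std::string describe() { return Subnet::describe() + " -> mse_loss"; }
};

// Each alias wraps the previous one, mirroring the steps listed above.
using L0 = Input<double>;
using L1 = Htan<Fc<32, L0>>;
using L2 = Htan<Fc<16, L1>>;
using L3 = Htan<Fc<8, L2>>;
using ToyNet = LossMeanSquared<Fc<1, L3>>;

// Usage: check that the nested type describes the expected layer order.
inline void check_toy_net() {
    static_assert(ToyNet::depth == 4, "four fully-connected layers expected");
    assert(ToyNet::describe() ==
           "input -> fc(32) -> htan -> fc(16) -> htan -> fc(8) -> htan -> fc(1) -> mse_loss");
}
```

The innermost template argument is the first layer that data passes through, and the outermost type is the last one, which is why the definition is read from the inside out.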

The following snippet shows the complete source code example with a network definition:

size_t n = 10000;
...
std::vector<matrix<double>> x(n);  // training samples
std::vector<float> y(n);           // target regression values
...
using NetworkType = loss_mean_squared<
    fc<1, htan<fc<8, htan<fc<16, htan<fc<32, input<matrix<double>>>>>>>>>>;
NetworkType network;
float weight_decay = 0.0001f;
float momentum = 0.5f;
sgd solver(weight_decay, momentum);
dnn_trainer<NetworkType> trainer(network, solver);
trainer.set_learning_rate(0.01);
trainer.set_learning_rate_shrink_factor(1);  // disable learning rate changes
trainer.set_mini_batch_size(64);
trainer.set_max_num_epochs(500);
trainer.be_verbose();  // print the training log
trainer.train(x, y);
network.clean();  // release intermediate training values

auto predictions = network(new_x);

Now that we've configured the trainer object, we can use the train method to start the actual training process. This method takes two C++ vectors as input parameters: the first should contain the training objects of the matrix<double> type, and the second should contain the target regression values of the float type. We can also call the be_verbose method to see the output log of the training process. After the network has been trained, we call the clean method, which lets the network object release the memory used for intermediate training values and therefore reduces memory usage.
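Once we have predictions, we may want to measure how close they are to the known targets. The following is a minimal sketch of a mean squared error computation in plain C++. The mean_squared_error helper is a hypothetical name, and the sketch assumes that the predictions and the targets are both available as std::vector<float> objects of the same size:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper; computes the mean of the squared differences
// between predictions and targets.
inline float mean_squared_error(const std::vector<float>& predictions,
                                const std::vector<float>& targets) {
    assert(predictions.size() == targets.size());
    assert(!predictions.empty());
    double sum = 0.0;  // accumulate in double to limit rounding error
    for (std::size_t i = 0; i < predictions.size(); ++i) {
        const double diff = static_cast<double>(predictions[i]) - targets[i];
        sum += diff * diff;
    }
    return static_cast<float>(sum / predictions.size());
}
```

A value close to zero means that the predictions are close to the targets on average. The exact scaling of the loss value that a library reports during training may differ from this sketch.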
