Neural network initialization

Let's start by generating the training data. The following code shows how we can do this:

torch::DeviceType device = torch::cuda::is_available()
                               ? torch::DeviceType::CUDA
                               : torch::DeviceType::CPU;

std::random_device rd;
std::mt19937 re(rd());
std::uniform_real_distribution<float> dist(-0.1f, 0.1f);

// generate data
size_t n = 1000;
torch::Tensor x;
torch::Tensor y;
{
  std::vector<float> values(n);
  std::iota(values.begin(), values.end(), 0);
  std::shuffle(values.begin(), values.end(), re);

  std::vector<torch::Tensor> x_vec(n);
  std::vector<torch::Tensor> y_vec(n);
  for (size_t i = 0; i < n; ++i) {
    x_vec[i] = torch::tensor(
        values[i],
        torch::dtype(torch::kFloat).device(device).requires_grad(false));

    y_vec[i] = torch::tensor(
        (func(values[i]) + dist(re)),
        torch::dtype(torch::kFloat).device(device).requires_grad(false));
  }
  x = torch::stack(x_vec);
  y = torch::stack(y_vec);
}

// normalize data
auto x_mean = torch::mean(x, /*dim*/ 0);
auto x_std = torch::std(x, /*dim*/ 0);
x = (x - x_mean) / x_std;

Usually, we want to utilize as many hardware resources as possible, so we first checked whether a GPU with CUDA support was available in the system with the torch::cuda::is_available() call. Then, we generated 1,000 predictor variable values and shuffled them. For each value, we calculated the target value with the linear function we used in the previous examples, adding a small amount of uniform noise. All the values were moved into torch::Tensor objects with torch::tensor function calls; notice that we used the previously detected device for tensor creation. After all the values were wrapped in tensors, we used the torch::stack function to concatenate the predictor and target values into two separate tensors. This was required to perform data normalization with the PyTorch linear algebra routines. Finally, we used the torch::mean and torch::std functions to calculate the mean and standard deviation of the predictor values and normalized them.
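
The func function referenced in the data generation code is the linear function defined in the previous examples; its exact coefficients are not shown in this section. A minimal stand-in, assuming a simple linear relationship (the slope and intercept below are illustrative placeholders, not the values used earlier), might look like this:

// Hypothetical stand-in for the linear function from the previous examples;
// the coefficients here are placeholders for illustration only.
float func(float x) {
  return 4.0f * x + 0.3f;
}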

In the following code, we're defining the NetImpl class, which implements our neural network:

class NetImpl : public torch::nn::Module {
 public:
  NetImpl() {
    l1_ = torch::nn::Linear(torch::nn::LinearOptions(1, 8).with_bias(true));
    register_module("l1", l1_);
    l2_ = torch::nn::Linear(torch::nn::LinearOptions(8, 4).with_bias(true));
    register_module("l2", l2_);
    l3_ = torch::nn::Linear(torch::nn::LinearOptions(4, 1).with_bias(true));
    register_module("l3", l3_);

    // initialize weights
    for (auto m : modules(false)) {
      if (m->name().find("Linear") != std::string::npos) {
        for (auto& p : m->named_parameters()) {
          if (p.key().find("weight") != std::string::npos) {
            torch::nn::init::normal_(p.value(), 0, 0.01);
          }
          if (p.key().find("bias") != std::string::npos) {
            torch::nn::init::zeros_(p.value());
          }
        }
      }
    }
  }

  torch::Tensor forward(torch::Tensor x) {
    auto y = l1_(x);
    y = l2_(y);
    y = l3_(y);
    return y;
  }

 private:
  torch::nn::Linear l1_{nullptr};
  torch::nn::Linear l2_{nullptr};
  torch::nn::Linear l3_{nullptr};
};
TORCH_MODULE(Net);

Here, we defined our neural network model as a network of three fully connected layers with linear activation functions. Each layer is of the torch::nn::Linear type. In the constructor of our model, we initialized the weights with small random values and the biases with zeros. We did this by iterating over all of the network's modules (see the modules method call) and, for the Linear ones, visiting the parameters returned by the module's named_parameters() method: the torch::nn::init::normal_ function was applied to the weights and the torch::nn::init::zeros_ function to the biases. The named_parameters() method returns objects consisting of a string key and a tensor value, so for initialization, we used their value method.
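
The TORCH_MODULE(Net) macro generates a module holder named Net that wraps NetImpl, so the model is used through the -> operator. A quick sanity check of the module (not part of the original listing; the [4, 1] input shape is chosen to match the single-feature Linear(1, 8) layer) could look like this:

Net sanity_model;                        // module holder generated by TORCH_MODULE
auto test_input = torch::randn({4, 1});  // 4 samples, 1 feature each
auto test_output = sanity_model->forward(test_input);
std::cout << test_output.sizes() << std::endl;  // expected shape: [4, 1]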

Now, we can train the model with our generated training data. The following code shows how to train our model:

Net model;
model->to(device);

// initialize optimizer ----------------------------------------------
double learning_rate = 0.01;
torch::optim::Adam optimizer(
    model->parameters(),
    torch::optim::AdamOptions(learning_rate).weight_decay(0.00001));

// training
int64_t batch_size = 10;
int64_t batches_num = static_cast<int64_t>(n) / batch_size;
int epochs = 10;
for (int epoch = 0; epoch < epochs; ++epoch) {
  // train the model -----------------------------------------------
  model->train();  // switch to the training mode

  // Iterate the data
  double epoch_loss = 0;
  for (int64_t batch_index = 0; batch_index < batches_num; ++batch_index) {
    auto batch_x = x.narrow(0, batch_index * batch_size, batch_size);
    auto batch_y = y.narrow(0, batch_index * batch_size, batch_size);

    // Clear gradients
    optimizer.zero_grad();

    // Execute the model on the input data
    torch::Tensor prediction = model->forward(batch_x);

    torch::Tensor loss = torch::mse_loss(prediction, batch_y);

    // Accumulate the loss to track training progress
    epoch_loss += static_cast<double>(loss.item<float>());

    // Compute gradients of the loss and parameters of our model
    loss.backward();

    // Update the parameters based on the calculated gradients.
    optimizer.step();
  }
  std::cout << "Epoch " << epoch
            << " mean loss: " << epoch_loss / batches_num << std::endl;
}

To utilize all our hardware resources, we moved the model to the selected computational device. Then, we initialized an optimizer; in our case, it used the Adam algorithm. Afterwards, we ran a standard training loop over the epochs: for each batch in every epoch, we cleared the optimizer's gradients, performed a forward pass, computed the loss, performed a backward pass, and updated the model weights with the optimizer step.
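
Once training is finished, the model can be switched to evaluation mode and used for inference without gradient tracking. This step is not shown in the original listing; a minimal sketch might look like this:

model->eval();               // switch to the evaluation mode
torch::NoGradGuard no_grad;  // disable gradient tracking for inference
torch::Tensor predictions = model->forward(x);  // x is the normalized input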

To select a batch of training data from the dataset, we used the tensor's narrow method, which returns a new tensor that views a contiguous slice of the original along a single dimension. This method takes the dimension along which to narrow as the first parameter, the start position as the second parameter, and the number of elements to keep as the third parameter.
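
For example, with batch_size equal to 10, selecting the fourth batch (batch_index equal to 3) narrows x along dimension 0, starting at element 30 and keeping 10 elements (the fourth_batch name is used only for illustration):

// dim = 0, start = batch_index * batch_size, length = batch_size
auto fourth_batch = x.narrow(/*dim=*/0, /*start=*/3 * 10, /*length=*/10);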

As we mentioned previously, there are two approaches we can use to serialize model parameters with the PyTorch C++ API (the Python API provides even more options). Let's look at them.
