Implementing forward propagation

In forward propagation, we can break the forward pass of the dense layer into two steps, as follows:

1. Compute the hidden output as a matrix product of the weights and the input: output = weights^T * input
2. Add the biases to the hidden output, broadcasting them across the batch: output += biases * one_vec^T

Since the weight size is not affected by the batch size, we only need to consider the number of inputs and outputs when sizing the weights. On the other hand, the data-feeding blobs, such as the input and output, are affected by the batch size. So, our GEMM operation with the weights and the input data can be designed as follows:

output [output_size_ x batch_size_] = weights^T [output_size_ x input_size_] * input [input_size_ x batch_size_]

The bias values are then added to the hidden output. The input data is not limited to the data from the data loader: as we stack layers, the output of the previous layer becomes the current layer's input data. The forward operation can be implemented as follows:

Blob<float> *Dense::forward(Blob<float> *input) {
    .. { blob initialization } ..

    // output = weights^T * input (without biases)
    cublasSgemm(cuda_->cublas(),
                CUBLAS_OP_T, CUBLAS_OP_N,
                output_size_, batch_size_, input_size_,
                &cuda_->one,
                weights_->cuda(), input_size_,
                input_->cuda(), input_size_,
                &cuda_->zero,
                output_->cuda(), output_size_);

    // output += biases * d_one_vec^T (broadcast the biases across the batch)
    cublasSgemm(cuda_->cublas(),
                CUBLAS_OP_N, CUBLAS_OP_N,
                output_size_, batch_size_, 1,
                &cuda_->one,
                biases_->cuda(), output_size_,
                d_one_vec, 1,
                &cuda_->one,
                output_->cuda(), output_size_);

    return output_;
}
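
To see exactly what these two cublasSgemm calls compute, the following CPU reference is a minimal sketch under the cuBLAS column-major convention: the weights are stored as an input_size x output_size matrix (hence CUBLAS_OP_T), the input is input_size x batch_size, and the output is output_size x batch_size. The function name dense_forward_ref and the variable names W, x, b, and y are illustrative only, not part of the book's code:

#include <vector>

// CPU reference for the two GEMM calls above (column-major storage).
// y[o + n*output_size] = sum_i W[i + o*input_size] * x[i + n*input_size] + b[o]
void dense_forward_ref(const std::vector<float> &W,  // input_size x output_size
                       const std::vector<float> &x,  // input_size x batch_size
                       const std::vector<float> &b,  // output_size
                       std::vector<float> &y,        // output_size x batch_size
                       int input_size, int output_size, int batch_size)
{
    for (int n = 0; n < batch_size; n++) {
        for (int o = 0; o < output_size; o++) {
            // step 1: output = weights^T * input
            float sum = 0.f;
            for (int i = 0; i < input_size; i++)
                sum += W[i + o * input_size] * x[i + n * input_size];

            // step 2: output += biases (what the second GEMM with d_one_vec does)
            y[o + n * output_size] = sum + b[o];
        }
    }
}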

At the first iteration, each layer needs to initialize its weights and biases. For example, this Dense layer initializes its weights, biases, and output tensor elements. We can separate this initialization task into two phases. The first handles the weights and biases, as follows:

// initialize weights and biases
if (weights_ == nullptr)
{
    // set up the parameter size information
    input_size_ = input->c() * input->h() * input->w();

    // initialize the weight and bias blobs
    weights_ = new Blob<float>(1, 1, input_size_, output_size_);
    biases_  = new Blob<float>(1, 1, output_size_);
}

The next phase updates the input information and initializes the output blob. We need to do this when the layer is new or when it needs to be reconfigured. In this task, we also need to create a vector filled with ones, whose length equals the batch size (the init_one_vec kernel that fills it is sketched after the following code). This will be used in the bias addition:

// initialize the input and output blobs
if (input_ == nullptr || batch_size_ != input->n())
{
    input_      = input;
    batch_size_ = input->n();

    if (output_ == nullptr)
        output_ = new Blob<float>(batch_size_, output_size_);
    else
        output_->reset(batch_size_, output_size_);

    output_->tensor();

    // (re)create the vector of ones used for the bias broadcast
    if (d_one_vec != nullptr)
        cudaFree(d_one_vec);
    checkCudaErrors(cudaMalloc((void**)&d_one_vec, sizeof(float) * batch_size_));
    init_one_vec<<< (batch_size_ + BLOCK_DIM_1D - 1) / BLOCK_DIM_1D, BLOCK_DIM_1D >>>(d_one_vec, batch_size_);

    // initialize the weights and biases unless the layer is frozen
    if (!freeze_)
        init_weight_bias();
}
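
The init_one_vec kernel itself is not shown here; its only job is to fill d_one_vec with ones so that the second GEMM in forward() broadcasts the bias vector across the batch. A minimal sketch of such a kernel, assuming it only takes the device pointer and the length, could look like this (the body is an assumption based on the kernel's name and usage, not the book's exact code):

// Fills a device vector with ones (sketch; the actual implementation may differ)
__global__ void init_one_vec(float *d_one_vec, size_t length)
{
    size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= length)
        return;

    d_one_vec[i] = 1.f;
}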

This initialization task is triggered not only at the first iteration but also whenever the batch size changes. Checking the batch size is not strictly required during training, but it is useful in the testing phase, because the batch sizes used for training and inference can differ. In that case, we need to create an output blob that matches the new batch size. The output size is placed in the blob's channel dimension, so the following creation code produces a blob of size (batch_size_, output_size_, 1, 1):

output_ = new Blob<float>(batch_size_, output_size_);

This creates flattened tensors. When we feed these tensors to the next layer, they need to be aligned along the channel dimension. This alignment is specifically required by the softmax layer; we will cover this in the softmax layer's implementation.

Another important task in this phase is initializing the weights and biases. In our implementation, we will use ReLU as the activation function, so we follow the initialization technique from He et al. (https://arxiv.org/abs/1502.01852) to make the network trainable. Following the guidelines in that paper, the weight values are drawn from the range given by the following equation:

W ~ U(-sqrt(6 / n), +sqrt(6 / n))

Here, n is the number of inputs from the previous layer. For this reason, we can initialize the parameters only after we have updated the input tensor information. The bias values are initialized to 0. The following code shows the implementation of this:

void Layer::init_weight_bias(unsigned int seed)
{
    // create a random number generator (a fixed seed gives reproducible runs)
    std::random_device rd;
    std::mt19937 gen(seed == 0 ? rd() : static_cast<unsigned int>(seed));

    // He-style uniform initialization: sample from U(-sqrt(6/n), +sqrt(6/n))
    float range = sqrt(6.f / input_->size());
    std::uniform_real_distribution<> dis(-range, range);

    for (int i = 0; i < weights_->len(); i++)
        weights_->ptr()[i] = static_cast<float>(dis(gen));
    for (int i = 0; i < biases_->len(); i++)
        biases_->ptr()[i] = 0.f;

    // copy the initialized values to the device
    weights_->to(DeviceType::cuda);
    biases_->to(DeviceType::cuda);
}
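
To make the initialization range concrete, the following standalone sketch applies the same rule to a plain std::vector. The helper name he_uniform_init and the 784-input example are hypothetical and used only for illustration; for n = 784, the range works out to sqrt(6/784) ≈ 0.0875:

#include <cmath>
#include <random>
#include <vector>

// Illustrative helper: draws n_in * n_out weights from U(-sqrt(6/n_in), +sqrt(6/n_in)),
// mirroring the range used in init_weight_bias() above.
std::vector<float> he_uniform_init(int n_in, int n_out, unsigned int seed = 42)
{
    std::mt19937 gen(seed);
    float range = std::sqrt(6.f / n_in);
    std::uniform_real_distribution<float> dis(-range, range);

    std::vector<float> weights(static_cast<size_t>(n_in) * n_out);
    for (auto &w : weights)
        w = dis(gen);

    return weights;
}

// Example: a 784-input dense layer gives range = sqrt(6/784) ≈ 0.0875.
// auto w = he_uniform_init(784, 500);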

Now, let's cover backward propagation.
