Loading images into Caffe2 tensors

Let's learn how to load an image into a Caffe2 tensor object and adapt it to the model's input requirements. The model expects normalized three-channel RGB images of shape (N x 3 x H x W), where N is the batch size and H and W (the image height and width) must each be at least 224 pixels. Normalization assumes that the pixel values are first scaled to the [0, 1] range and then normalized with the per-channel means [0.485, 0.456, 0.406] and standard deviations [0.229, 0.224, 0.225].
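Written per channel c (red, green, or blue), with x_c an 8-bit pixel value in [0, 255], the normalization described above is:

$$
x'_c = \frac{x_c / 255 - \mu_c}{\sigma_c}, \qquad \mu = (0.485,\ 0.456,\ 0.406), \quad \sigma = (0.229,\ 0.224,\ 0.225)
$$

For example, a fully saturated red value (255) maps to (1 - 0.485) / 0.229 ≈ 2.249, and a red value of 0 maps to -0.485 / 0.229 ≈ -2.118.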

Let's assume that we have the following function definition for image loading:

caffe2::TensorCPU ReadImageTensor(const std::string& file_name,
                                  int width,
                                  int height) {
  ...
}

Let's write its implementation. For image loading, we will use the OpenCV library:

// load the image; OpenCV stores the channels in BGR order
auto image = cv::imread(file_name, cv::IMREAD_COLOR);

if (image.empty()) {
  return {};  // the file is missing or could not be decoded
}

if (image.cols != width || image.rows != height) {
  // scale the image, keeping its aspect ratio, so that it covers
  // the target size; cv::Size takes (width, height)
  cv::Size scaled(std::max(height * image.cols / image.rows, width),
                  std::max(height, width * image.rows / image.cols));
  cv::resize(image, image, scaled);

  // crop the central width x height region
  cv::Rect crop((image.cols - width) / 2, (image.rows - height) / 2, width,
                height);
  image = image(crop);
}

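Here, we read the image from a file with the cv::imread function. If the file cannot be read, the function returns an empty tensor. If the image dimensions differ from the specified ones, we first resize the image with the cv::resize function, preserving its aspect ratio, so that it covers the target size, and then crop the central region of the specified width and height.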
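To make the geometry of the resize-and-crop step concrete, here is a minimal sketch that reproduces the same integer arithmetic without OpenCV. The Size2 and Rect2 structs and the ScaledSize and CenterCrop helpers are hypothetical stand-ins for cv::Size and cv::Rect, introduced only for illustration.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-ins for cv::Size and cv::Rect (illustration only).
struct Size2 { int width; int height; };
struct Rect2 { int x; int y; int width; int height; };

// Same integer arithmetic as the cv::Size expression in the loader:
// the scaled image preserves the aspect ratio (up to integer truncation)
// and is at least width x height, so it fully covers the target.
Size2 ScaledSize(int cols, int rows, int width, int height) {
  return {std::max(height * cols / rows, width),
          std::max(height, width * rows / cols)};
}

// Top-left corner and size of the centered crop, as in the cv::Rect call.
Rect2 CenterCrop(Size2 scaled, int width, int height) {
  return {(scaled.width - width) / 2, (scaled.height - height) / 2,
          width, height};
}
```

For a 640 x 480 landscape image and a 224 x 224 target, ScaledSize(640, 480, 224, 224) gives 298 x 224 (the integer division truncates 298.67), and the centered crop starts at x = 37, y = 0. A portrait image is handled symmetrically, with the crop offset along the y axis instead.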

Then, we convert the image to 32-bit floating point and reorder its channels from OpenCV's default BGR order to RGB:

image.convertTo(image, CV_32FC3);  // values stay in the [0, 255] range
cv::cvtColor(image, image, cv::COLOR_BGR2RGB);
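The following sketch shows, on a plain interleaved pixel buffer, what these two calls do: convertTo with CV_32FC3 changes only the element type, and the BGR-to-RGB conversion swaps the first and third value of every pixel. BgrToRgbFloat is a hypothetical helper written for illustration, not an OpenCV function, and it assumes a tightly packed buffer without row padding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical plain-C++ equivalent of convertTo(CV_32FC3) followed by
// cvtColor(COLOR_BGR2RGB) on a packed, interleaved 3-channel buffer.
std::vector<float> BgrToRgbFloat(const std::vector<std::uint8_t>& bgr) {
  std::vector<float> rgb(bgr.size());
  for (std::size_t i = 0; i + 2 < bgr.size(); i += 3) {
    rgb[i] = static_cast<float>(bgr[i + 2]);      // R was stored last
    rgb[i + 1] = static_cast<float>(bgr[i + 1]);  // G stays in the middle
    rgb[i + 2] = static_cast<float>(bgr[i]);      // B was stored first
  }
  return rgb;  // values are still in [0, 255]; scaling happens later
}
```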

Next, we split the image into three separate red, green, and blue channels and normalize the values of each channel. The following code shows how to do this:

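// split the interleaved RGB image into three single-channel planes
std::vector<cv::Mat> channels(3);
cv::split(image, channels);

std::vector<double> mean = {0.485, 0.456, 0.406};
std::vector<double> stddev = {0.229, 0.224, 0.225};

// scale each plane to [0, 1], then subtract the mean and divide by the stddev
size_t i = 0;
for (auto& c : channels) {
  c = ((c / 255) - mean[i]) / stddev[i];
  ++i;
}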

Each channel was first scaled to the [0, 1] range by dividing it by 255; then the corresponding mean was subtracted from it, and the result was divided by the corresponding standard deviation.
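The same per-channel normalization can be sketched in plain C++ on planar channel data. NormalizeChannels, kMean, and kStd are hypothetical names chosen for this illustration; the arithmetic mirrors the expression c = ((c / 255) - mean[i]) / stddev[i] used with cv::Mat above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Per-channel statistics in R, G, B order, as given for the model.
constexpr std::array<float, 3> kMean = {0.485f, 0.456f, 0.406f};
constexpr std::array<float, 3> kStd = {0.229f, 0.224f, 0.225f};

// Hypothetical helper: normalizes three planar channels (R, G, B) in place.
void NormalizeChannels(std::array<std::vector<float>, 3>& channels) {
  for (std::size_t i = 0; i < channels.size(); ++i) {
    for (float& v : channels[i]) {
      v = (v / 255.0f - kMean[i]) / kStd[i];  // [0, 255] -> [0, 1] -> z-score
    }
  }
}
```

With this scheme, a red value of 255 becomes roughly 2.249 and a red value of 0 becomes roughly -2.118, so the normalized values are no longer confined to [0, 1].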

Then, we should concatenate the channels:

// stack the R, G, and B planes vertically into one 3H x W matrix
cv::vconcat(channels[0], channels[1], image);
cv::vconcat(image, channels[2], image);
assert(image.isContinuous());  // requires <cassert>

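The cv::vconcat function stacks the normalized channels vertically, producing a single-channel matrix with 3H rows and W columns. Its memory holds the whole red plane, followed by the green plane and then the blue plane, which is exactly the planar (3 x H x W) layout that the model expects. The assert call checks that this memory is contiguous, so it can be copied in a single pass.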
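Taken together, cv::split and cv::vconcat convert an interleaved (H x W x 3, or HWC) image into the planar (3 x H x W, or CHW) layout. The following sketch performs the same conversion with plain index arithmetic; HwcToChw is a hypothetical helper for illustration and assumes a packed buffer of exactly 3 x height x width floats.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical plain-C++ equivalent of cv::split followed by cv::vconcat:
// converts an interleaved H x W x 3 buffer (HWC) into planar 3 x H x W (CHW).
std::vector<float> HwcToChw(const std::vector<float>& hwc, int height,
                            int width) {
  const std::size_t plane = static_cast<std::size_t>(height) * width;
  std::vector<float> chw(hwc.size());
  for (std::size_t p = 0; p < plane; ++p) {  // p = row * width + column
    for (std::size_t c = 0; c < 3; ++c) {
      chw[c * plane + p] = hwc[p * 3 + c];   // channel c goes to plane c
    }
  }
  return chw;
}
```

For a 1 x 2 image whose pixels are (1, 2, 3) and (4, 5, 6), the planar result is {1, 4, 2, 5, 3, 6}: first both red values, then both green values, then both blue values.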

The following code shows how to initialize the Caffe2 tensor with the image data:

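// a batch of one image in N x C x H x W layout
std::vector<int64_t> dims = {1, 3, height, width};

caffe2::TensorCPU tensor(dims, caffe2::DeviceType::CPU);
// cv::Mat::data is a uchar*, but the matrix holds 3 * H * W floats
std::copy_n(reinterpret_cast<float*>(image.data),
            image.size().area(),
            tensor.mutable_data<float>());

return tensor;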

Here, the image data was copied into the caffe2::TensorCPU object, which was initialized with the specified dimensions and with caffe2::DeviceType::CPU as its computational device. The mutable_data<float>() member function allocates the tensor's storage for float elements (if it has not been allocated yet) and returns a pointer to it. The OpenCV image data was accessed through the cv::Mat::data member; because this member is declared as unsigned char *, we cast it to float *, which matches the 32-bit floating-point values that the matrix actually holds. The pixel data was copied with the standard std::copy_n function, and the number of copied elements, image.size().area(), equals 3 x H x W because the concatenated matrix has 3H rows and W columns. Finally, we returned the tensor object.
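The flat copy is valid because a single-channel 3H x W matrix and a 1 x 3 x H x W tensor use the same row-major element order: row c * H + h of the concatenated matrix is row h of channel c. The following sketch computes the flat offset of an element in an NCHW tensor; NchwOffset is a hypothetical helper for illustration, not part of the Caffe2 API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: row-major offset of element (n, c, h, w) in an
// N x C x H x W tensor, i.e. the order in which std::copy_n fills it.
std::int64_t NchwOffset(std::int64_t n, std::int64_t c, std::int64_t h,
                        std::int64_t w, std::int64_t channels,
                        std::int64_t height, std::int64_t width) {
  return ((n * channels + c) * height + h) * width + w;
}
```

For a 1 x 3 x 224 x 224 tensor, the green plane starts at offset 224 * 224 = 50176, and the last blue pixel sits at offset 3 * 224 * 224 - 1 = 150527, which is the last of the image.size().area() elements copied from the OpenCV matrix.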

Another important function used in the ONNX format example reads class definitions from a synset file. We will take a look at it in the next section.
