In this section, we'll implement a convolutional neural network for image classification. We are going to use the famous Modified National Institute of Standards and Technology (MNIST) dataset of handwritten digits, which can be found at http://yann.lecun.com/exdb/mnist/. The dataset is a standard benchmark, built from data collected by the US National Institute of Standards and Technology, used to calibrate and compare image recognition methods based on machine learning, primarily neural networks.
The creators of the dataset used samples from the US Census Bureau, later adding samples written by American high-school students. All the samples are normalized, anti-aliased grayscale images of 28 x 28 pixels. The MNIST database contains 60,000 images for training and 10,000 images for testing. There are four files:
- train-images-idx3-ubyte: Training set images
- train-labels-idx1-ubyte: Training set labels
- t10k-images-idx3-ubyte: Test set images
- t10k-labels-idx1-ubyte: Test set labels
The files that contain labels are in the following format:
| Offset | Type           | Value             | Description              |
|--------|----------------|-------------------|--------------------------|
| 0      | 32-bit integer | 0x00000801 (2049) | Magic number (MSB first) |
| 4      | 32-bit integer | 60,000 or 10,000  | Number of items          |
| 8      | Unsigned byte  | ??                | Label                    |
| 9      | Unsigned byte  | ??                | Label                    |
| ...    | ...            | ...               | ...                      |
The label values range from 0 to 9. The files that contain images are in the following format:
| Offset | Type           | Value             | Description              |
|--------|----------------|-------------------|--------------------------|
| 0      | 32-bit integer | 0x00000803 (2051) | Magic number (MSB first) |
| 4      | 32-bit integer | 60,000 or 10,000  | Number of images         |
| 8      | 32-bit integer | 28                | Number of rows           |
| 12     | 32-bit integer | 28                | Number of columns        |
| 16     | Unsigned byte  | ??                | Pixel                    |
| 17     | Unsigned byte  | ??                | Pixel                    |
| ...    | ...            | ...               | ...                      |
Pixels are stored in a row-wise manner, with values in the range of [0, 255]. 0 means background (white), while 255 means foreground (black).
In this example, we are using the PyTorch deep learning framework. This framework is used primarily from the Python language, but its core is written in C++, and it has a well-documented and actively developed C++ API called LibTorch. The framework is built on top of the ATen tensor library, which can use Nvidia's CUDA technology for performance improvement. The Python and C++ APIs are largely the same apart from language-specific notation, so we can use the official Python documentation to learn how to use the framework. The documentation also contains a section describing the differences between the C++ and Python APIs, as well as dedicated articles on using the C++ API.
The PyTorch framework is widely used for deep learning research. As we discussed previously, it provides functionality for managing big datasets: it can automatically parallelize loading data from disk and manage pre-fetch buffers, which reduces memory usage and limits the number of expensive disk I/O operations. It provides the torch::data::Dataset base class for implementing custom user datasets. We only need to override two methods here: get and size. These methods are not declared virtual because this class uses C++ template-based (compile-time) polymorphism, so we inherit from it by passing our derived type as a template argument.