Understanding image classification using the LeNet architecture

In this section, we'll implement a convolutional neural network for image classification. We'll use the famous Modified National Institute of Standards and Technology (MNIST) dataset of handwritten digits, which can be found at http://yann.lecun.com/exdb/mnist/. The dataset is a standard benchmark, derived from data collected by the US National Institute of Standards and Technology, for calibrating and comparing image recognition methods based on machine learning, primarily neural networks.

The creators of the dataset used samples written by US Census Bureau employees, with samples written by American high school students added later. All the samples are size-normalized, anti-aliased grayscale images of 28 x 28 pixels. The MNIST database contains 60,000 images for training and 10,000 images for testing. There are four files:

  • train-images-idx3-ubyte: Training set images
  • train-labels-idx1-ubyte: Training set labels
  • t10k-images-idx3-ubyte: Test set images
  • t10k-labels-idx1-ubyte: Test set labels

The files that contain labels are in the following format:

Offset   Type             Value               Description
0        32-bit integer   0x00000801 (2049)   Magic number (MSB first)
4        32-bit integer   60,000 or 10,000    Number of items
8        Unsigned byte    ??                  Label
9        Unsigned byte    ??                  Label
...      ...              ...                 ...

The label values range from 0 to 9. The files that contain images are in the following format:

Offset   Type             Value               Description
0        32-bit integer   0x00000803 (2051)   Magic number (MSB first)
4        32-bit integer   60,000 or 10,000    Number of images
8        32-bit integer   28                  Number of rows
12       32-bit integer   28                  Number of columns
16       Unsigned byte    ??                  Pixel
17       Unsigned byte    ??                  Pixel
...      ...              ...                 ...

Pixels are stored in a row-wise manner, with values in the range of [0, 255]. 0 means background (white), while 255 means foreground (black).
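Because the header fields are MSB-first (big-endian) 32-bit integers, a reader on a little-endian machine has to reassemble each field byte by byte. The following sketch shows one way to parse the 16-byte header of an images file; the function and struct names are illustrative, not part of any library:

```cpp
#include <cstdint>

// Read a big-endian (MSB-first) 32-bit integer from a byte buffer,
// as used by the MNIST header fields.
std::uint32_t read_be32(const unsigned char* p) {
  return (std::uint32_t(p[0]) << 24) | (std::uint32_t(p[1]) << 16) |
         (std::uint32_t(p[2]) << 8) | std::uint32_t(p[3]);
}

// Header of train-images-idx3-ubyte / t10k-images-idx3-ubyte;
// the raw pixel bytes follow immediately after these 16 bytes.
struct ImagesHeader {
  std::uint32_t magic;  // expected 0x00000803 (2051)
  std::uint32_t count;  // number of images: 60,000 or 10,000
  std::uint32_t rows;   // 28
  std::uint32_t cols;   // 28
};

ImagesHeader parse_images_header(const unsigned char* data) {
  return {read_be32(data), read_be32(data + 4), read_be32(data + 8),
          read_be32(data + 12)};
}
```

The labels file header can be parsed the same way; it has only the magic number (2049) and the item count before the label bytes begin.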

In this example, we are using the PyTorch deep learning framework. This framework is primarily used with the Python language. However, its core is written in C++, and it has a well-documented and actively developed C++ API called LibTorch. The framework is built on a tensor library called ATen, which can use Nvidia's CUDA technology for performance improvement. The Python and C++ APIs are largely equivalent apart from language-specific notation, so we can use the official Python documentation to learn how to use the framework. The documentation also contains a section describing the differences between the C++ and Python APIs, as well as articles specific to the usage of the C++ API.

The PyTorch framework is widely used for deep learning research. As we discussed previously, the framework provides functionality for managing big datasets: it can automatically parallelize loading data from disk and manage pre-loaded buffers, reducing memory usage and limiting expensive disk operations. It provides the torch::data::Dataset base class for implementing a custom user dataset. We only need to override two methods here: get and size. The base class takes the derived class as a template parameter (the CRTP idiom), so that dataset transformations such as map can preserve the concrete dataset type.
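A minimal sketch of such a subclass is shown below; the class name, constructor, and tensor layout are illustrative choices, not prescribed by the library:

```cpp
#include <torch/torch.h>

// Illustrative custom dataset: Dataset takes the derived class as a
// template parameter, and we override get() and size().
class MnistDataset : public torch::data::datasets::Dataset<MnistDataset> {
 public:
  MnistDataset(torch::Tensor images, torch::Tensor labels)
      : images_(std::move(images)), labels_(std::move(labels)) {}

  // Return one sample as a {data, target} pair.
  torch::data::Example<> get(size_t index) override {
    return {images_[index], labels_[index]};
  }

  // Report the number of samples in the dataset.
  torch::optional<size_t> size() const override {
    return images_.size(0);
  }

 private:
  torch::Tensor images_;  // for example, shape [N, 1, 28, 28]
  torch::Tensor labels_;  // for example, shape [N]
};

// Usage sketch: wrap the dataset in a data loader that batches samples.
// auto dataset = MnistDataset(images, labels)
//                    .map(torch::data::transforms::Stack<>());
// auto loader = torch::data::make_data_loader(
//     std::move(dataset), torch::data::DataLoaderOptions().batch_size(64));
```

The commented usage lines show the typical pattern: the Stack transform collates individual examples into batch tensors before the data loader hands them to the training loop.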
