State-of-the-art deep image classification models

Deep learning has garnered much attention and hype over the years, and a great deal of research centered around it is shared at reputed competitions, conferences, and journals worldwide. Image classification architectures in particular have enjoyed the spotlight for some years now, with iterative improvements shared on a regular basis. Let us take a quick look at some of the best-performing and most popular state-of-the-art deep image classification architectures:

  • AlexNet: This is the network that can be credited with opening the floodgates. Designed by Alex Krizhevsky together with Ilya Sutskever and deep learning pioneer Geoffrey Hinton, this network reduced the top-five error rate on the ImageNet challenge to just 15.3%. It was also one of the first architectures to leverage GPUs to speed up the learning process.
  • VGG-16: This network from Oxford's Visual Geometry Group is one of the best-performing architectures and is widely used for benchmarking other designs. VGG-16 achieves formidable performance with a simple design: stacks of 3 x 3 convolutional layers, each stack followed by a max pooling layer, for a total of 16 weight layers (13 convolutional and 3 fully connected). It was succeeded by a slightly deeper variant named VGG-19.
  • Inception: Also known as GoogLeNet, this network was introduced at the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2014 and achieved a top-five error rate of 6.67%, making it one of the first architectures to approach human-level performance. Its novelty is the inception module, which applies differently sized kernels to the same input in parallel and concatenates their outputs at the same level.
  • ResNet: Introduced by Microsoft Research Asia, the residual network (ResNet) is a novel architecture that uses batch normalization and skip (residual) connections to train networks many times deeper (152 layers) and more complex than simpler architectures like VGG, achieving a top-five error rate of just 3.57%.
  • MobileNet: While most architectures compete to outperform one another, with each new network demanding ever more compute power and data, MobileNet deviates from this trend: it is designed to suit mobile and embedded systems. It utilizes the novel idea of depth-wise separable convolutions to sharply reduce the overall number of parameters required to train the network.
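The inception idea described above can be illustrated with a toy sketch. Here each convolutional branch is stubbed as a per-pixel linear map (a stand-in for the real 1 x 1, 3 x 3, and 5 x 5 convolutions; the `branch` helper is our own invention, not a library function), but the key structural point survives: parallel branches over the same input, with outputs concatenated along the channel axis.

```python
import numpy as np

def branch(x, out_channels, seed):
    """Stand-in for one convolutional branch: (H, W, C_in) -> (H, W, out_channels).
    A real inception branch would be a 1x1, 3x3, or 5x5 convolution."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal((x.shape[-1], out_channels))
    return x @ w  # a 1x1-convolution-like per-pixel linear map

x = np.ones((28, 28, 16))                    # toy input feature map
outputs = [branch(x, c, s) for s, c in enumerate((8, 12, 4))]
y = np.concatenate(outputs, axis=-1)         # channels from all branches side by side
print(y.shape)                               # (28, 28, 24)
```

Because the branches only have to agree on spatial size, the module is free to mix receptive fields of different scales at the same depth.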
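ResNet's skip connection is simple enough to sketch directly. The toy block below (a minimal NumPy illustration with linear layers standing in for convolutions; `residual_block` is our own name) shows the defining pattern: the block computes a residual F(x) and adds the input x back before the final activation.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """y = relu(F(x) + x): two small layers compute the residual F(x),
    and the identity shortcut adds the input back."""
    f = relu(x @ w1) @ w2   # the learned residual F(x)
    return relu(f + x)      # skip connection: add the input back

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 8))
w1 = rng.standard_normal((8, 8)) * 0.01
w2 = rng.standard_normal((8, 8)) * 0.01
y = residual_block(x, w1, w2)
```

Note that when the weights are near zero the block reduces to (roughly) the identity, which is what makes very deep stacks of such blocks trainable: each block only has to learn a small correction to its input.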
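MobileNet's parameter savings can be made concrete with a little arithmetic. The sketch below (the helper names are ours, not from any library) counts the weights in a standard k x k convolution versus a depth-wise separable one, which factors it into a per-channel k x k convolution followed by a 1 x 1 point-wise convolution.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (biases ignored)."""
    return k * k * c_in * c_out

def separable_conv_params(k, c_in, c_out):
    """Depth-wise k x k conv (one filter per input channel)
    followed by a 1 x 1 point-wise conv mixing the channels."""
    return k * k * c_in + c_in * c_out

k, c_in, c_out = 3, 128, 256
std = standard_conv_params(k, c_in, c_out)   # 294,912
sep = separable_conv_params(k, c_in, c_out)  # 33,920
print(std, sep, round(std / sep, 1))         # roughly 8.7x fewer parameters
```

For 3 x 3 kernels the saving approaches a factor of 9 as the channel counts grow, which is why the trick is so effective on resource-constrained devices.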

We have provided a quick overview and recap of some of the state-of-the-art architectures in the deep learning-based image classification space. For a detailed discussion, readers can check out the Convolutional Neural Networks section in Chapter 3, Understanding Deep Learning Architectures.
