Semantic segmentation

In semantic segmentation, the goal is to label each individual pixel of an image with the object class that the pixel belongs to. The final result is a bitmap in which every pixel is assigned to a class.
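
Ground-truth masks are often stored as color images, so the colors have to be converted into this class-index bitmap before training. The following is a minimal NumPy sketch that assumes a hypothetical three-class palette. The colors, class names, `PALETTE`, and `colors_to_labels` are illustrative and do not come from any particular dataset.

```python
import numpy as np

# Hypothetical palette mapping an RGB color to a class index (illustrative only).
PALETTE = {
    (0, 0, 0): 0,      # background
    (255, 0, 0): 1,    # e.g. "car"
    (0, 255, 0): 2,    # e.g. "road"
}

def colors_to_labels(mask_rgb):
    """Convert an (H, W, 3) uint8 color mask into an (H, W) class-index bitmap."""
    labels = np.full(mask_rgb.shape[:2], -1, dtype=np.int64)
    for color, index in PALETTE.items():
        # Boolean (H, W) array marking pixels that exactly match this color.
        match = np.all(mask_rgb == np.array(color, dtype=mask_rgb.dtype), axis=-1)
        labels[match] = index
    if (labels < 0).any():
        # Refuse colors that have no class index instead of passing them on.
        raise ValueError("mask contains a color that is not in PALETTE")
    return labels

mask = np.zeros((2, 3, 3), dtype=np.uint8)   # 2x3 image, all background
mask[0, 1] = (255, 0, 0)
mask[1, 2] = (0, 255, 0)
print(colors_to_labels(mask))
# [[0 1 0]
#  [0 0 2]]
```

Rejecting unknown colors up front is one simple way to make sure every label value corresponds to an output channel of the network.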

Several popular CNN architectures have been shown to perform well at the segmentation task. Most of them are variants of a class of model called an autoencoder, which we will look at in detail in Chapter 6, Autoencoders, Variational Autoencoders, and Generative Models. For now, it is enough to know the basic idea: first spatially reduce the input volume to a compressed representation, and then recover the original spatial size.
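
To make the reduce-then-recover idea concrete, here is a toy NumPy sketch that only tracks shapes. It assumes 2x2 max pooling as the downsampling step and nearest-neighbour repetition as the upsampling step. Real segmentation networks interleave learned convolutions with these steps, so this is an illustration rather than the architecture of any specific model.

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on an (H, W, C) array; H and W must be even."""
    h, w, c = x.shape
    # Split H and W into (blocks, 2) pairs and take the max inside each 2x2 block.
    return x.reshape(h // 2, 2, w // 2, 2, c).max(axis=(1, 3))

def upsample_nearest_2x(x):
    """Double H and W by repeating each value into a 2x2 block."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

rng = np.random.default_rng(0)
x = rng.random((32, 32, 3))                        # input volume (H, W, C)
encoded = max_pool_2x2(max_pool_2x2(x))            # compressed representation
decoded = upsample_nearest_2x(upsample_nearest_2x(encoded))  # original spatial size
print(x.shape, encoded.shape, decoded.shape)
# (32, 32, 3) (8, 8, 3) (32, 32, 3)
```

The spatial size is recovered, but fine detail is not; that loss is what the upsampling operations and skip connections used by segmentation models try to reduce.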

The following operations are commonly used to increase the spatial size, or to avoid losing it in the first place:

Max Unpooling
Deconvolution/Transposed Convolution
Dilated/Atrous Convolution
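
The three operations can be sketched in NumPy for a single channel. This is a minimal illustration, not how any framework implements them: the function names are hypothetical, and real layers work on batches of multi-channel tensors with learned kernels and padding options. Max unpooling reuses the positions recorded during max pooling, transposed convolution scatters a scaled copy of the kernel for every input value, and dilated convolution spaces the kernel taps apart. Note that dilated convolution does not itself enlarge the feature map; it widens the receptive field without downsampling, which helps preserve resolution.

```python
import numpy as np

def max_pool_with_indices(x):
    """2x2/stride-2 max pooling on an (H, W) array, also returning argmax positions."""
    h, w = x.shape
    blocks = x.reshape(h // 2, 2, w // 2, 2).transpose(0, 2, 1, 3).reshape(h // 2, w // 2, 4)
    idx = blocks.argmax(axis=-1)   # position (0..3) of the max inside each 2x2 block
    return blocks.max(axis=-1), idx

def max_unpool(pooled, idx):
    """Put each pooled value back at its recorded position; other entries are zero."""
    ph, pw = pooled.shape
    blocks = np.zeros((ph, pw, 4), dtype=pooled.dtype)
    np.put_along_axis(blocks, idx[..., None], pooled[..., None], axis=-1)
    return blocks.reshape(ph, pw, 2, 2).transpose(0, 2, 1, 3).reshape(ph * 2, pw * 2)

def transposed_conv2d(x, kernel, stride=2):
    """Transposed convolution ("deconvolution"), single channel, square kernel, no padding.

    Output side length is (H - 1) * stride + k."""
    h, w = x.shape
    k = kernel.shape[0]
    out = np.zeros(((h - 1) * stride + k, (w - 1) * stride + k))
    for i in range(h):
        for j in range(w):
            out[i * stride:i * stride + k, j * stride:j * stride + k] += x[i, j] * kernel
    return out

def dilated_conv2d(x, kernel, rate=2):
    """'Valid' dilated (atrous) cross-correlation, single channel, square kernel."""
    k = kernel.shape[0]
    span = (k - 1) * rate + 1      # effective kernel size
    h, w = x.shape
    out = np.zeros((h - span + 1, w - span + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = x[i:i + span:rate, j:j + span:rate]   # k x k samples, rate apart
            out[i, j] = np.sum(patch * kernel)
    return out

x = np.array([[1., 2., 5., 0.],
              [3., 4., 1., 1.],
              [0., 1., 2., 8.],
              [6., 0., 3., 4.]])
pooled, idx = max_pool_with_indices(x)
print(pooled)
# [[4. 5.]
#  [6. 8.]]
print(max_unpool(pooled, idx).shape)                      # (4, 4)
print(transposed_conv2d(pooled, np.ones((2, 2))).shape)   # (4, 4)
print(dilated_conv2d(x, np.ones((2, 2)), rate=2).shape)   # (2, 2)
```

Unpooling needs no weights but depends on the stored indices, whereas the transposed convolution kernel is normally learned.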

We will also learn about a variant of softmax that is used in the semantic segmentation task, called spatial softmax.
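
This section does not yet define spatial softmax, so the sketch below rests on an assumption: that it means applying softmax over the class channels independently at every pixel, which gives each pixel its own probability distribution over classes. (The same name is also used elsewhere, for example in robot-learning work, for a softmax over the spatial positions of each channel.) The function name `pixelwise_softmax` is hypothetical.

```python
import numpy as np

def pixelwise_softmax(logits):
    """Softmax over the last (class) axis of an (H, W, num_classes) array.

    Each pixel is normalized independently, so probs[i, j].sum() == 1."""
    shifted = logits - logits.max(axis=-1, keepdims=True)   # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 4, 3))           # 4x4 image, 3 classes
probs = pixelwise_softmax(logits)
print(probs.shape)                            # (4, 4, 3)
print(np.allclose(probs.sum(axis=-1), 1.0))   # True
```

Subtracting the per-pixel maximum before exponentiating does not change the result but keeps large logits from overflowing.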

In this section, we will learn about two popular models that perform well at semantic segmentation and whose architectures are straightforward to understand:

FCN (Fully Convolutional Networks)
SegNet

A few other implementation details also need to be addressed:

The final upsampling layer (Deconv) needs as many filters as there are classes to segment, and your label "colors" must match the channel indexes of this last layer; otherwise, you may run into NaN issues during training.
We need an Argmax layer that selects, for each pixel, the class with the highest probability in the output tensor (at prediction time only).
Our loss needs to take every pixel of the output tensor into account.
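
The three points above can be sketched together in NumPy. This is an illustrative stand-in for what a framework's sparse softmax cross-entropy and argmax ops compute, not the book's implementation; `pixelwise_cross_entropy` and `predict` are hypothetical names. The logits have one channel per class, labels outside that range are rejected, the loss is averaged over every pixel, and prediction takes a per-pixel argmax. Taking the argmax of the logits gives the same classes as taking it over the softmax probabilities, because softmax preserves ordering.

```python
import numpy as np

def pixelwise_cross_entropy(logits, labels):
    """Mean cross-entropy over every pixel.

    logits: (H, W, num_classes) raw scores, one channel per class.
    labels: (H, W) integer class indices in [0, num_classes).
    """
    num_classes = logits.shape[-1]
    if labels.min() < 0 or labels.max() >= num_classes:
        # A label with no matching output channel would corrupt the loss.
        raise ValueError("label index outside [0, num_classes)")
    # Numerically stable log-softmax over the class axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Log-probability of the true class at each pixel, shape (H, W, 1).
    true_log_probs = np.take_along_axis(log_probs, labels[..., None], axis=-1)
    return -true_log_probs.mean()   # averaged over all H * W pixels

def predict(logits):
    """Prediction-time argmax: the most probable class for each pixel."""
    return logits.argmax(axis=-1)   # (H, W) label map

logits = np.zeros((2, 2, 3))
logits[..., 1] = 5.0                        # every pixel strongly favours class 1
labels = np.ones((2, 2), dtype=np.int64)
print(predict(logits).shape)                # (2, 2)
print(pixelwise_cross_entropy(logits, labels))
```

The argmax is only needed at prediction time; during training the loss is computed directly from the logits so that gradients flow through every pixel.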
