What is deep learning?

In machine learning (ML), we try to automatically discover rules for mapping input data to a desired output. In this process, it is very important to create appropriate representations of the data. For example, if we want to build an algorithm that classifies an email as spam/ham, we need to represent the email data numerically. One simple representation is a binary vector, where each component indicates the presence or absence of a word from a predefined vocabulary. These representations are also task-dependent; that is, they may vary according to the final task that we want our ML algorithm to perform.
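As a minimal sketch of this idea, the following Python snippet builds such a binary word-presence vector; the tiny vocabulary and the example email are made up purely for illustration:

```python
# A minimal sketch of a binary bag-of-words representation.
# The vocabulary and the example email below are hypothetical.
vocabulary = ["free", "winner", "meeting", "offer", "project"]

def to_binary_vector(email_text, vocab):
    """Return a 0/1 vector marking which vocabulary words appear in the email."""
    words = set(email_text.lower().split())
    return [1 if word in words else 0 for word in vocab]

email = "Congratulations, you are a winner of a free offer"
print(to_binary_vector(email, vocabulary))  # [1, 1, 0, 1, 0]
```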

In the preceding email example, if we want to detect the sentiment of the email instead of identifying spam/ham, a more useful representation of the data could be binary vectors where the predefined vocabulary consists of words with positive or negative polarity. The successful application of most ML algorithms, such as random forests and logistic regression, depends on how good the data representation is. How do we get these representations? Typically, these representations are human-crafted features that are designed iteratively by making some intelligent guesses. This step is called feature-engineering, and it is one of the crucial steps in most ML pipelines.

Support Vector Machines (SVMs), or kernel methods in general, try to create more relevant representations of the data by transforming the hand-crafted representation into a higher-dimensional space where solving the ML task, using either classification or regression, becomes easy. However, SVMs are hard to scale to very large datasets and have not been that successful for problems such as image classification and speech recognition. Ensemble models, such as random forests and Gradient Boosting Machines (GBMs), create a collection of weak models that are each specialized to do a small task well, and then combine these weak models in some way to arrive at the final output. They work quite well when the input has very many dimensions and creating handcrafted features would be a very time-consuming step. In summary, all the previously mentioned ML methods work with a shallow representation of the data: a set of handcrafted features followed by some non-linear transformations.
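As a rough sketch of how a kernel method can operate on top of hand-crafted features (assuming scikit-learn is available; the polarity words and the toy data are made up for illustration), an RBF-kernel SVM implicitly maps the binary word features into a higher-dimensional space before separating the classes:

```python
# A hypothetical sketch: an RBF-kernel SVM on hand-crafted sentiment features.
from sklearn.svm import SVC

# Each row marks presence/absence of the polarity words
# ["great", "love", "terrible", "boring"] in an email (made-up data).
X = [
    [1, 1, 0, 0],  # positive email
    [1, 0, 0, 0],  # positive email
    [0, 0, 1, 1],  # negative email
    [0, 0, 1, 0],  # negative email
]
y = [1, 1, 0, 0]  # 1 = positive sentiment, 0 = negative sentiment

# The RBF kernel implicitly transforms these features into a
# higher-dimensional space where the classes are easier to separate.
clf = SVC(kernel="rbf", gamma="scale")
clf.fit(X, y)
print(clf.predict([[1, 0, 0, 0]]))  # prediction for a new hand-crafted feature vector
```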

Deep learning is a subfield of ML in which a hierarchical representation of the data is created. Higher levels of the hierarchy are formed by the composition of lower-level representations. More importantly, this hierarchy of representations is learned automatically from data, completely automating the most crucial step in ML: feature-engineering. Automatically learning features at multiple levels of abstraction allows a system to learn complex mappings from input to output directly from data, without depending completely on human-crafted features.

A deep learning model is essentially a neural network with multiple hidden layers, which can help create layered, hierarchical representations of the input data. It is called deep because we end up using multiple hidden layers to get the representations. In the simplest of terms, deep learning can also be called hierarchical feature-engineering (of course, we can do much more, but this is the core principle). One simple example of a deep neural network is a multilayer perceptron (MLP) with more than one hidden layer. Let's consider the MLP-based face-recognition system in the following figure. The lowest-level features that it learns are edges and patterns of contrast. The next layer is then able to combine those patterns of local contrast into facial features such as eyes, noses, and lips. Finally, the top layer uses those facial features to create face templates. The deep network composes simple features to create features of increasing complexity, as depicted in the following diagram:

Hierarchical feature representation with deep neural nets (source: https://www.rsipvision.com/exploring-deep-learning/)
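To make the MLP structure concrete, here is a minimal sketch of an MLP with two hidden layers written with tf.keras (this assumes TensorFlow is installed; the input dimension, layer widths, and number of output classes are arbitrary choices for illustration):

```python
# A minimal MLP sketch with two hidden layers using tf.keras.
# Input dimension (784), layer widths, and class count (10) are assumptions.
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

model = Sequential([
    Dense(256, activation="relu", input_shape=(784,)),  # first hidden layer
    Dense(128, activation="relu"),                      # second hidden layer
    Dense(10, activation="softmax"),                    # output layer
])
model.compile(optimizer="sgd",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

Each Dense layer here plays the role of one level in the feature hierarchy described above; stacking more of them is what makes the network deep.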

To understand deep learning, we need a clear understanding of the building blocks of neural networks, how these networks are trained, and how we are able to scale such training algorithms to very large deep networks. Before we dive into more details about neural networks, let's try to answer one question: why deep learning now? The theory of neural networks, including Convolutional Neural Networks (CNNs), was developed back in the 1990s. They have become more popular now for the following three reasons:

  • Availability of efficient hardware: Moore's law has given us CPUs with ever faster processing capability and computing power. Besides this, GPUs have been really useful for computing millions of matrix operations at scale, which are the most common operations in any deep learning model. The availability of SDKs such as CUDA has helped research communities rewrite highly parallelizable jobs to run on a few GPUs, replacing huge CPU clusters. Model training involves many small linear algebra operations, such as matrix multiplications and dot products, which are very efficiently implemented in CUDA to run on GPUs.
  • Availability of large data sources and cheaper storage: We now have free access to huge collections of labeled training sets for text, image, and speech.
  • Advances in the optimization algorithms used to train neural networks: Traditionally, there was only one algorithm used to learn the weights of a neural network: gradient descent, or Stochastic Gradient Descent (SGD). SGD has a few limitations, such as getting stuck in local minima and slow convergence, which are overcome by newer algorithms. We will discuss these algorithms in detail in the later sections on neural network basics. A minimal sketch of the basic SGD update appears after this list.
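To illustrate the basic SGD update mentioned in the last bullet, here is a minimal sketch in Python with NumPy; the linear model, squared-error loss, synthetic data, and learning rate are all assumptions made for this illustration:

```python
# A minimal sketch of the vanilla SGD update, w <- w - lr * gradient,
# on a made-up linear regression problem with a squared-error loss.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))          # 100 synthetic samples, 3 features
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=100)

w = np.zeros(3)
lr = 0.05
for step in range(1000):
    i = rng.integers(0, len(X))        # pick one sample at random (the "stochastic" part)
    error = X[i] @ w - y[i]
    grad = 2 * error * X[i]            # gradient of the squared error for this one sample
    w -= lr * grad                     # gradient descent step

print(w)  # should end up close to true_w
```

Each iteration computes the gradient of the loss for a single random sample and takes a small step against it; the newer optimizers mentioned above (such as momentum-based methods) modify exactly this update step.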