Chapter 10. Multilayer Perceptron

The concept of artificial neural networks is rooted in biology with the idea of mimicking some of the brain's functions. Computer scientists thought that such a concept could be applied to the broader problem of parallel processing [10:1]. The key question in the 1970s was: how can we distribute the computation of tasks across a network or cluster of machines without having to program each machine? One simple solution consists of training each machine to execute the given tasks. The popularity of neural networks surged in the 1990s.

At its core, a neural network is a nonlinear statistical model that leverages logistic regression to create a nonlinear, distributed model.

Note

Deep learning:

Deep learning techniques (introduced in the next chapter) extend the concept of artificial neural networks. This chapter should be regarded as the first part of the presentation of an algorithm generally associated with deep learning.

In this chapter, you will move beyond the hype and learn the following:

  • The concept and elements of the multilayer perceptron (MLP)
  • How to train a neural network using error backpropagation
  • The evaluation and tuning of MLP configuration parameters
  • A Scala implementation of the MLP classifier
  • A simple application of MLP for modeling currency exchange rates

Feed-forward neural networks (FFNN)

The brain is a very powerful information processing engine that surpasses the reasoning ability of computers in domains such as learning, inductive reasoning, prediction, vision, and speech recognition. However, even the simplest computing device can process datasets far too large for the human brain to handle.

The biological background

In biology, a neural network is composed of groups of neurons interconnected by synapses [10:2], as shown in the following image:

Visualization of biological neurons and synapses

Neuroscientists have been especially interested in understanding how the billions of neurons in the brain interact to give human beings their parallel processing capabilities. The 1960s saw the emergence of a new field of study known as connectionism, which marries cognitive psychology, artificial intelligence, and neuroscience. Its goal was to create models of mental phenomena. Although connectionism takes many forms, neural network models have become the most popular and the most widely taught of all connectionist models [10:3].

Biological neurons communicate through electrical charges known as stimuli. This network of neurons can be represented as a simple schematic, as follows:

Representation of neuron layers, connections, and synapses

This representation categorizes groups of neurons as layers. The terminology used to describe natural neural networks has a corresponding nomenclature for artificial neural networks:

The biological neural network      The artificial neural network
-----------------------------      -----------------------------
Axon                               Connection
Dendrite                           Connection
Synapse                            Weight
Potential                          Weighted sum
Threshold                          Bias weight
Signal, Stimulus                   Activation
Group of neurons                   Layer of neurons

In the biological world, stimuli do not propagate in any specific direction between neurons. An artificial neural network can have the same degree of freedom. However, the artificial neural networks most commonly used by data scientists have a predefined direction: from the input layer to the output layer. These neural networks are known as feed-forward neural networks (FFNN).

Mathematical background

The previous chapter, Chapter 9, Regression and Regularization, describes the concept of the hyperplane that segregates a set of labeled data points into distinct classes during training. The hyperplane is defined by the linear model (or margin) wᵀ·x + w0 = 0. Linear regression can be visualized as a simple connectivity model using neurons and synapses, as follows:

A two-layer basic neural network (no hidden layer)

The feature x0 = +1 is known as the bias input (or bias element); it corresponds to the intercept in classic linear regression.

As with support vector machines, linear regression is appropriate for observations that are linearly separable. The real world, however, is usually driven by nonlinear phenomena. Therefore, logistic regression is the natural choice for computing the output of the perceptron. For a set of input variables x = {xi}0,n and weights w = {wi}1,n, the output y is computed as (M1):

M1: y = σ(wᵀ·x) = 1 / (1 + exp(-(w0 + w1·x1 + ... + wn·xn))), where σ is the logistic (sigmoid) function
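As a minimal illustration, the single-neuron output of M1 can be written in a few lines of Scala. This is a sketch only; the sigmoid and output names and the plain-array representation are assumptions for this example, not the book's implementation:

// Logistic (sigmoid) activation: sigma(t) = 1 / (1 + exp(-t))
def sigmoid(t: Double): Double = 1.0 / (1.0 + math.exp(-t))

// Output of a single perceptron (M1)
def output(weights: Array[Double], x: Array[Double]): Double = {
  require(weights.length == x.length + 1, "one weight per input plus the bias w0")
  // weights(0) is the bias weight w0, paired with the bias input x0 = +1
  val weightedSum = weights(0) + x.indices.map(i => weights(i + 1) * x(i)).sum
  sigmoid(weightedSum)
}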

Such an approach can be modeled as an FFNN, known as the multilayer perceptron [10:4].

An FFNN can be regarded as a stack of layers of logistic regression with the output layer as a linear regression. The value of the variables in each hidden layer is computed as the sigmoid of the dot product of the connection weights and the output of the previous layer. Although it's interesting, the theory behind artificial neural networks is beyond the scope of this book [10:5].
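This layered computation can be sketched in Scala as follows. The representation of a layer as an array of weight vectors and the dot and predict helper names are illustrative assumptions, not the MLP implementation presented later in this chapter:

// One layer of the network: one weight vector per neuron, bias at index 0
type Layer = Array[Array[Double]]

// Weighted sum of a single neuron: w0 + w1*x1 + ... + wn*xn
def dot(w: Array[Double], in: Array[Double]): Double =
  w(0) + in.indices.map(i => w(i + 1) * in(i)).sum

def sigmoid(t: Double): Double = 1.0 / (1.0 + math.exp(-t))

// Forward pass: each hidden layer applies the sigmoid to its weighted sums;
// the output layer stays linear, matching the view of an FFNN as a stack of
// logistic regressions with a linear regression as the output layer.
def predict(hidden: List[Layer], output: Layer, x: Array[Double]): Array[Double] = {
  val h = hidden.foldLeft(x)((in, layer) => layer.map(w => sigmoid(dot(w, in))))
  output.map(w => dot(w, h))
}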
