Layers

There are three basic types of layers: input layers, hidden layers, and output layers. They respectively contain the input, hidden, and output nodes. Generally, you might have only a single input layer and a single output layer—each can contain one or more nodes—but you can have as many hidden layers as you want.

The number of input nodes is equal to the number of input variables. The number of output nodes must be equal to the number of variables or categories predicted.

The following diagram shows a hypothetical arrangement for a feedforward NN:

Figure 8.4: Feedforward NN representation

Each circle represents a node, and the dashed arrows represent the connections between them. The inscription inside each node shows its basic type: I for input, H for hidden, and O for output. This network has one input layer with three input nodes, two hidden layers with two hidden nodes each, and one output layer with one output node.
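As a quick illustration (not code from the book), the Figure 8.4 architecture could be expressed in Keras roughly as follows; the ReLU activations are an arbitrary choice, since the figure doesn't specify any:

```python
# Minimal Keras sketch of the Figure 8.4 architecture:
# 3 input nodes -> 2 hidden nodes -> 2 hidden nodes -> 1 output node.
# (Assumes TensorFlow/Keras; activation choices are illustrative.)
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Dense

model = Sequential([
    Input(shape=(3,)),            # input layer: three input nodes
    Dense(2, activation="relu"),  # first hidden layer: two nodes
    Dense(2, activation="relu"),  # second hidden layer: two nodes
    Dense(1),                     # output layer: one output node
])
model.summary()
```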

Cutting unnecessary inputs, which can be a tricky task, can spare you training time and bump up accuracy. Don't rely on correlation or partial correlation to prune inputs, because those only capture linear relations, while ANNs will pick up both linear and non-linear relations at the same time (if you have enough data). Genetic algorithms are usually helpful here, but they also take time to run.

Each additional hidden layer looks into more complex features within data. For example, an NN trained to recognize faces may only recognize silhouettes in the first hidden layer; a second hidden layer may recognize shapes, while a third one is able to recognize noses, eyes, and mouths.

As you go deeper into the hidden layers, more complex features are explored, meaning that you might need more layers for more complex tasks, while easy ones can be dealt with by only a couple of hidden layers. Deep learning involves ANN models with more than a single hidden layer; this is the reason every deep learning model is an ANN, but not every ANN is considered deep learning.

Even if the problem at hand is so simple that it requires only a single hidden layer, arranging the hidden nodes into two consecutive hidden layers will, more often than not, lead to less training time and greater accuracy. One must agree that if the problem at hand is simple enough, ANNs wouldn't be required at all.

The arrangement of layers and nodes throughout an ANN is called its architecture.

The architecture that made the breakthrough during the ILSVRC-2010 contest had seven hidden layers. There is no magic rule that will give you the best architecture for the problem at hand. Usually, the best place to start is searching for what has worked for other people with similar problems. From there, you can work your mojo by exploring alternative solutions.

There are techniques that can help you through this exploration, such as grid search, random grid search, and genetic algorithms.
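To make the idea concrete, here is a toy sketch of a random search over the number and size of hidden layers, written with Keras; the synthetic data, candidate sizes, and trial count are all arbitrary placeholders, not recommendations:

```python
# Toy random search over hidden-layer architectures.
# The data is synthetic and the search space is arbitrary; substitute your own.
import random
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Dense

X = np.random.rand(200, 3)             # stand-in inputs
y = (X.sum(axis=1) > 1.5).astype(int)  # stand-in binary target

def build_and_score(hidden_sizes):
    """Train a network with the given hidden-layer sizes; return its accuracy."""
    model = Sequential([Input(shape=(3,))] +
                       [Dense(n, activation="relu") for n in hidden_sizes] +
                       [Dense(1, activation="sigmoid")])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    model.fit(X, y, epochs=5, verbose=0)
    return model.evaluate(X, y, verbose=0)[1]

best_score, best_arch = float("-inf"), None
for _ in range(10):                                # 10 random trials
    arch = [random.choice([4, 8, 16])              # nodes per hidden layer
            for _ in range(random.randint(1, 3))]  # one to three hidden layers
    score = build_and_score(arch)
    if score > best_score:
        best_score, best_arch = score, arch
print("Best architecture found:", best_arch, "with accuracy", best_score)
```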

Besides their basic denomination (input, hidden, and output), layers can also be named by how they work. For example, the ones displayed in Figure 8.4 would be called dense (or fully connected) layers because each node is fully connected to the next layer.

The most common layers are these (a sketch combining several of them follows the list):

  • Dense (fully connected): Each node is connected to every node in the previous layer, and each connection is weighted by a trainable parameter called a weight. This means that no information is discarded and there are lots of parameters to adjust during training, which makes this layer costly.
  • Normalization: Normalizes the activations from the previous layer at each batch. Batches are small fixed-size samples of the data; one batch at a time is used to update the trainable parameters.
  • Convolutional: Layers that slide (convolve) over the whole input, looking at it through fixed-size windows. They are well known for their good performance in computer vision problems.
  • Max pooling: Reduces the dimensionality of the output, making it a great companion to the convolutional layer, which often increases the dimensionality.
  • Recurrent: Recurrent layers feed a node's output back into that same node. They are frequently used in time-related problems, such as stock price prediction.
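As a rough sketch of how several of these layer types fit together, a small Keras image classifier might look like the following; the input shape, filter count, and class count are made-up values for illustration, and a recurrent layer is left out because it expects sequential input rather than images:

```python
# Sketch combining convolutional, normalization, max pooling, and dense layers.
# (Shapes and sizes are illustrative, not tuned for any real dataset.)
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Input, Conv2D, BatchNormalization,
                                     MaxPooling2D, Flatten, Dense)

model = Sequential([
    Input(shape=(28, 28, 1)),               # e.g., 28x28 grayscale images
    Conv2D(16, (3, 3), activation="relu"),  # convolve 3x3 windows over the input
    BatchNormalization(),                   # normalize activations per batch
    MaxPooling2D((2, 2)),                   # reduce spatial dimensionality
    Flatten(),                              # unroll feature maps into a vector
    Dense(10, activation="softmax"),        # fully connected output, 10 classes
])
model.summary()
```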

Each of these specialized layers works in a specific way; in other words, each is better suited to different tasks. Convolutional layers are pretty common in networks dealing with image classification. Recurrent layers are frequent in architectures designed to handle time-series regression problems. Dense layers are common to all sorts of networks.

It's common to have more than one type of specialized layer in the same network.

Once the architecture is locked, it is finally time to train the network. Training requires a specific algorithm (a training strategy), and it's usually not that hard to pick one, given that there are plenty that will handle the training task effectively. The backpropagation algorithm was the first to stand out as feasible; improved versions, such as Adam, were developed later.
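For concreteness, kicking off training in Keras might look like the sketch below; the model, data, loss, and epoch count are all placeholder assumptions, with Adam standing in for the improved backpropagation-based optimizers mentioned above:

```python
# Minimal training sketch using the Adam optimizer (a refinement of plain
# gradient descent with backpropagation). Data and model are placeholders.
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Dense

X_train = np.random.rand(100, 3)             # stand-in training inputs
y_train = (X_train[:, 0] > 0.5).astype(int)  # stand-in binary labels

model = Sequential([Input(shape=(3,)),
                    Dense(4, activation="relu"),
                    Dense(1, activation="sigmoid")])
model.compile(optimizer="adam",            # the training strategy
              loss="binary_crossentropy",  # illustrative loss choice
              metrics=["accuracy"])
model.fit(X_train, y_train, epochs=10, batch_size=16, verbose=0)
```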
