Artificial neurons and neural networks

Let's briefly go over some of the basics of machine learning (ML) and neural networks (NNs). In machine learning, our goal is to take a collection of data labeled with a particular set of classes or characteristics and use these examples to train our system to predict the labels of future, unseen data. We call a program or function that predicts the classes or labels of future data based on prior training data a classifier.

There are many types of classifiers, but here we will be focusing on NNs. The idea behind NNs is that they (allegedly) work in a way that is similar to the human brain, in that they learn and classify data using a collection of artificial neurons (ANs), all connected together to form a particular structure. Let's step back for a moment, though, and look at what an individual AN is. In mathematical terms, it is just an affine function from the linear space R^n to R, like so:
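    f(x) = w · x + b = w1x1 + w2x2 + … + wnxn + b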

We can see that this can be characterized as the dot product of a constant weight vector w and an input vector x, with an additional bias constant b added to the result. (Again, the only input into this function is x; the other values are constants!)
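As a quick sketch of this (the weight, bias, and input values here are arbitrary, chosen purely for illustration), a single AN can be computed in a couple of lines of NumPy:

    import numpy as np

    # Constant parameters of the neuron (arbitrary example values).
    w = np.array([0.5, -1.0, 2.0])   # weight vector in R^3
    b = 0.1                          # bias constant

    # The input vector x is the only true input to the function.
    x = np.array([1.0, 2.0, 3.0])

    # Affine function: dot product of w and x, plus the bias b.
    output = np.dot(w, x) + b
    print(output)                    # 0.5 - 2.0 + 6.0 + 0.1 = 4.6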

Now, a single AN by itself is fairly useless (and stupid); intelligence only emerges when it acts in cooperation with a large number of other ANs. Our first step is to stack a collection of m similar ANs on top of each other so as to form what we will call a dense layer (DL). The layer is dense because each neuron processes every single input value: each AN takes in a vector from R^n and outputs a single value in R. Since there are m neurons, their output collectively lies in the space R^m. We will notice that if we stack the weights for each neuron in our layer so as to form an m x n matrix of weights, we can then calculate the outputs of all m neurons at once with a single matrix multiplication followed by the addition of the appropriate biases:
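    y = Wx + b

Here, W is the m x n matrix of stacked weight vectors, x is the input vector in R^n, b is the vector of m bias constants, and y is the resulting output vector in R^m.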

Now, let's suppose that we want to build an NN classifier that can distinguish between k different classes; we can add a second dense layer that takes in the m values from the prior dense layer and outputs k values. Supposing that we have appropriate weight and bias values for each layer (which are certainly not trivial to find), and that we also have an appropriate activation function set up after each layer (which we will define later), this will act as a classifier over our k distinct classes, giving us the probability of x falling into each respective class based on the outputs of the final layer. Of course, we're getting way ahead of ourselves here, but that is, in a nutshell, how an NN works.
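To make this concrete, here is a minimal sketch of such a two-layer classifier in NumPy. The weights and biases below are random placeholders (finding good values is the training problem), and the ReLU and softmax activation functions are simply assumed choices here, since activation functions are only defined later:

    import numpy as np

    def softmax(z):
        # Turn the final layer's raw outputs into probabilities that sum to 1.
        e = np.exp(z - np.max(z))
        return e / e.sum()

    # Layer sizes: n inputs, m neurons in the first dense layer, k classes.
    n, m, k = 4, 3, 2

    # Placeholder weights and biases (random here; in practice these are learned).
    rng = np.random.default_rng(0)
    W1, b1 = rng.normal(size=(m, n)), rng.normal(size=m)
    W2, b2 = rng.normal(size=(k, m)), rng.normal(size=k)

    def classify(x):
        h = np.maximum(0, W1 @ x + b1)   # first dense layer with a ReLU activation
        z = W2 @ h + b2                  # second dense layer produces k raw scores
        return softmax(z)                # probabilities for each of the k classes

    x = np.array([1.0, 0.5, -0.2, 2.0])
    print(classify(x))                   # k probabilities summing to 1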

Now, it seems like we can just keep connecting dense layers to each other into longer chains to perform classification, and indeed we can; the result is what is known as a deep neural network (DNN). A layer that is not directly connected to the inputs or outputs is known as a hidden layer. The strength of a DNN is that the additional hidden layers allow the NN to capture abstractions and subtleties in the data that a shallow NN could not pick up on.
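As a rough sketch of that chaining (again with random placeholder parameters and an assumed ReLU activation on the hidden layers), a DNN's forward pass is just the same dense-layer computation repeated layer by layer, with every layer between the input and the output acting as a hidden layer:

    import numpy as np

    rng = np.random.default_rng(1)

    # Layer widths for a small DNN: 4 inputs, two hidden layers of 8 neurons, 2 outputs.
    sizes = [4, 8, 8, 2]

    # One (weights, biases) pair per dense layer, randomly initialized for illustration.
    layers = [(rng.normal(size=(m, n)), rng.normal(size=m))
              for n, m in zip(sizes[:-1], sizes[1:])]

    def forward(x):
        for W, b in layers[:-1]:
            x = np.maximum(0, W @ x + b)   # hidden dense layers (ReLU assumed)
        W, b = layers[-1]
        return W @ x + b                   # final layer's raw class scores

    print(forward(np.array([1.0, 0.5, -0.2, 2.0])))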
