Understanding the McCulloch-Pitts neuron

In 1943, Warren McCulloch and Walter Pitts published a mathematical description of neurons as they were believed to operate in the brain. A neuron receives input from other neurons through connections on its dendritic tree; these inputs are integrated to produce an output at the cell body (or soma). The output is then communicated to other neurons via a long wire (or axon), which eventually branches out to make one or more connections (at axon terminals) on the dendritic tree of other neurons.

An example neuron is shown in the following diagram:

McCulloch and Pitts described the inner workings of such a neuron as a simple logic gate that would be either on or off, depending on the input it received on its dendritic tree. Specifically, the neuron would sum up all of its inputs, and if the sum exceeded a certain threshold, an output signal would be generated and passed on by the axon.

However, today we know that real neurons are much more complicated than that. Biological neurons perform intricate nonlinear mathematical operations on thousands of inputs and can change their responsiveness dynamically depending on the context, importance, or novelty of the input signal. You can think of real neurons being as complex as computers and of the human brain being as complex as the internet.

Let's consider a simple artificial neuron that receives exactly two inputs, x0 and x1. The job of the artificial neuron is to calculate a sum of the two inputs (usually in the form of a weighted sum), and if this sum exceeds a certain threshold (often zero), the neuron will be considered active and output a one; else it will be considered silent and output a minus one (or zero). In more mathematical terms, the output, y, of this McCulloch-Pitts neuron can be described as follows:
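
y = +1 if w0x0 + w1x1 > θ
y = -1 otherwise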

In the preceding equation, w0 and w1 are weight coefficients, which, together with x0 and x1, make up the weighted sum. In textbooks, the two different scenarios where the output, y, is either +1 or -1 would often be expressed in terms of an activation function, ϕ, which could take on two different values:
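
y = ϕ(z)
ϕ(z) = +1 if z > θ
ϕ(z) = -1 otherwise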

Here, we introduce a new variable, z (the so-called network input), which is equivalent to the weighted sum: z = w0x0 + w1x1. The weighted sum is then compared to a threshold, θ, to determine the value of ϕ and subsequently the value of y. Apart from that, these two equations say exactly the same thing as the preceding one.
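
To make this concrete, here is a minimal sketch in Python (using NumPy; the inputs, weights, and threshold below are chosen purely for illustration) of how such a neuron turns its two inputs into an output:

    import numpy as np

    def mcculloch_pitts(x, w, theta):
        """Return +1 if the weighted sum of the inputs exceeds theta, else -1."""
        z = np.dot(w, x)                  # network input: z = w0*x0 + w1*x1
        return 1 if z > theta else -1

    x = np.array([0.5, -1.0])             # inputs x0 and x1 (illustrative values)
    w = np.array([0.8, 0.2])              # weight coefficients w0 and w1 (illustrative values)
    theta = 0.0                           # threshold

    print(mcculloch_pitts(x, w, theta))   # z = 0.2 > 0, so this prints 1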

If these equations look strangely familiar, you might be reminded of Chapter 1, A Taste of Machine Learning, when we were talking about linear classifiers.

And you are right, a McCulloch-Pitts neuron is essentially a linear, binary classifier!

You can think of it this way: x0 and x1 are the input features, w0 and w1 are weights to be learned, and the classification is performed by the activation function, ϕ. If we do a good job of learning the weights, which we would do with the help of a suitable training set, we could classify data as positive or negative samples. In this scenario, the set of points where z = θ would act as the decision boundary.

This might all make more sense with the help of the following diagram:

On the left, you can see the neuron's activation function, ϕ, plotted against z. Remember that z is nothing more than the weighted sum of the two inputs x0 and x1. The rule is that as long as the weighted sum is below some threshold, θ, the output of the neuron is -1; above θ, the output is +1.

On the right, you can see the decision boundary, z = θ, which splits the data into two regions: z < θ (where all data points are predicted to be negative samples) and z > θ (where all data points are predicted to be positive samples).

The decision boundary does not need to be vertical or horizontal; it can be tilted, as shown in the preceding diagram. But in the case of a single McCulloch-Pitts neuron, the decision boundary will always be a straight line.
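
You can see why by writing the boundary out in full: z = θ means w0x0 + w1x1 = θ, and solving for x1 (assuming w1 is not zero) gives x1 = (θ - w0x0)/w1, which is the equation of a straight line in the (x0, x1) plane.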

Of course, the magic lies in learning the weight coefficients, w0 and w1, such that the decision boundary comes to lie right between all positive and all negative data points.

To train a neural network, we generally need three things:

  • Training data: It is no surprise to learn that we need some data samples from which our classifier can learn and against which its effectiveness can be verified.
  • Cost function (also known as loss function): A cost function provides a measure of how good the current weight coefficients are. There is a wide range of cost functions available, which we will talk about toward the end of this chapter. One option is to count the number of misclassifications; another is to calculate the sum of squared errors (both are sketched in code right after this list).
  • Learning rule: A learning rule specifies mathematically how we have to update the weight coefficients from one iteration to the next. This learning rule usually depends on the error (measured by the cost function) we observed on the training data.
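
As a quick illustration of the two example cost functions mentioned in the list above (only a sketch, with made-up labels and predictions), both can be computed in a few lines of NumPy:

    import numpy as np

    y_true = np.array([1, -1, 1, 1, -1])    # ground-truth labels (illustrative)
    y_pred = np.array([1, 1, 1, -1, -1])    # predictions made with the current weights

    # Option 1: count the number of misclassifications
    num_errors = np.sum(y_true != y_pred)

    # Option 2: calculate the sum of squared errors
    sse = np.sum((y_true - y_pred) ** 2)

    print(num_errors, sse)                  # 2 misclassifications, sum of squared errors of 8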

This is where the work of renowned researcher Frank Rosenblatt comes in.
