Chapter 2. Training Machine Learning Algorithms for Classification

In this chapter, we will make use of two of the first algorithmically described machine learning algorithms for classification: the perceptron and adaptive linear neurons. We will start by implementing a perceptron step by step in Python and training it to classify different flower species in the Iris dataset. This will help us understand the concept of machine learning algorithms for classification and how they can be implemented efficiently in Python. Discussing the basics of optimization using adaptive linear neurons will then lay the groundwork for using more powerful classifiers via the scikit-learn machine learning library in Chapter 3, A Tour of Machine Learning Classifiers Using Scikit-learn.

The topics that we will cover in this chapter are as follows:

  • Building an intuition for machine learning algorithms
  • Using pandas, NumPy, and matplotlib to read in, process, and visualize data
  • Implementing linear classification algorithms in Python

Artificial neurons – a brief glimpse into the early history of machine learning

Before we discuss the perceptron and related algorithms in more detail, let us take a brief tour through the early beginnings of machine learning. Trying to understand how the biological brain works in order to design artificial intelligence, Warren McCulloch and Walter Pitts published the first concept of a simplified brain cell, the so-called McCulloch-Pitts (MCP) neuron, in 1943 (W. S. McCulloch and W. Pitts. A Logical Calculus of the Ideas Immanent in Nervous Activity. The Bulletin of Mathematical Biophysics, 5(4):115–133, 1943). Neurons are interconnected nerve cells in the brain that are involved in processing and transmitting chemical and electrical signals, as illustrated in the following figure:

[Figure: a biological neuron; signals arrive at the dendrites, are integrated in the cell body, and are passed on via the axon]

McCulloch and Pitts described such a nerve cell as a simple logic gate with binary outputs; multiple signals arrive at the dendrites, are then integrated in the cell body, and, if the accumulated signal exceeds a certain threshold, an output signal is generated that is passed on by the axon.

Only a few years later, Frank Rosenblatt published the first concept of the perceptron learning rule based on the MCP neuron model (F. Rosenblatt, The Perceptron, a Perceiving and Recognizing Automaton. Cornell Aeronautical Laboratory, 1957). With his perceptron rule, Rosenblatt proposed an algorithm that would automatically learn the optimal weight coefficients that are then multiplied with the input features in order to make the decision of whether a neuron fires or not. In the context of supervised learning and classification, such an algorithm could then be used to predict if a sample belonged to one class or the other.

More formally, we can pose this problem as a binary classification task where we refer to our two classes as 1 (positive class) and -1 (negative class) for simplicity. We can then define an activation function $\phi(z)$ that takes a linear combination of certain input values $\mathbf{x}$ and a corresponding weight vector $\mathbf{w}$, where $z$ is the so-called net input ($z = w_1 x_1 + \dots + w_m x_m$):

$$\mathbf{w} = \begin{bmatrix} w_1 \\ \vdots \\ w_m \end{bmatrix}, \qquad \mathbf{x} = \begin{bmatrix} x_1 \\ \vdots \\ x_m \end{bmatrix}$$

Now, if the activation of a particular sample $\mathbf{x}^{(i)}$, that is, the output of $\phi(z)$, is greater than a defined threshold $\theta$, we predict class 1, and class -1 otherwise. In the perceptron algorithm, the activation function $\phi(\cdot)$ is a simple unit step function, which is sometimes also called the Heaviside step function:

$$\phi(z) = \begin{cases} 1 & \text{if } z \geq \theta \\ -1 & \text{otherwise} \end{cases}$$

For simplicity, we can bring the threshold $\theta$ to the left side of the equation and define a weight-zero as $w_0 = -\theta$ and $x_0 = 1$, so that we write $z$ in a more compact form $z = w_0 x_0 + w_1 x_1 + \dots + w_m x_m = \mathbf{w}^T\mathbf{x}$ and:

$$\phi(z) = \begin{cases} 1 & \text{if } z \geq 0 \\ -1 & \text{otherwise} \end{cases}$$
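
To make these definitions concrete, the following snippet shows one way the net input and the unit step function could be computed with NumPy for a single sample. The variable names (w for the weight vector including $w_0$, x for a sample with the constant $x_0 = 1$ prepended) and the numeric values are illustrative choices, not the implementation developed later in this chapter.

    import numpy as np

    # Weight vector including the bias weight w0 = -theta
    w = np.array([-0.5, 0.8, -0.2])   # [w0, w1, w2]

    # A single training sample with x0 = 1 prepended for the bias
    x = np.array([1.0, 0.4, 1.1])     # [x0, x1, x2]

    # Net input z = w^T x
    z = np.dot(w, x)

    # Unit step activation: predict +1 if z >= 0, otherwise -1
    prediction = np.where(z >= 0.0, 1, -1)
    print(z, prediction)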

Note

In the following sections, we will often make use of basic notations from linear algebra. For example, we will abbreviate the sum of the products of the values in $\mathbf{x}$ and $\mathbf{w}$ using a vector dot product, whereas superscript $T$ stands for transpose, which is an operation that transforms a column vector into a row vector and vice versa:

$$z = w_0 x_0 + w_1 x_1 + \dots + w_m x_m = \sum_{j=0}^{m} w_j x_j = \mathbf{w}^T\mathbf{x}$$

For example: $\begin{bmatrix} 1 & 2 & 3 \end{bmatrix} \times \begin{bmatrix} 4 \\ 5 \\ 6 \end{bmatrix} = 1 \times 4 + 2 \times 5 + 3 \times 6 = 32$.

Furthermore, the transpose operation can also be applied to a matrix to reflect it over its diagonal, for example:

$$\begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix}^T = \begin{bmatrix} 1 & 3 & 5 \\ 2 & 4 & 6 \end{bmatrix}$$

In this book, we will only use the very basic concepts from linear algebra. However, if you need a quick refresher, please take a look at Zico Kolter's excellent Linear Algebra Review and Reference, which is freely available at http://www.cs.cmu.edu/~zkolter/course/linalg/linalg_notes.pdf.
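
If you prefer to verify these identities numerically, the following short NumPy session reproduces the dot product and transpose examples from this note; the arrays are the same small examples shown above.

    import numpy as np

    # Vector dot product: [1 2 3] . [4 5 6] = 32
    v1 = np.array([1, 2, 3])
    v2 = np.array([4, 5, 6])
    print(np.dot(v1, v2))        # 32

    # Matrix transpose: reflect a 3x2 matrix over its diagonal
    A = np.array([[1, 2],
                  [3, 4],
                  [5, 6]])
    print(A.T)                   # [[1 3 5], [2 4 6]]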

The following figure illustrates how the net input $z = \mathbf{w}^T\mathbf{x}$ is squashed into a binary output (-1 or 1) by the activation function of the perceptron (left subfigure) and how it can be used to discriminate between two linearly separable classes (right subfigure):

[Figure: the unit step activation function squashing the net input $z$ into a binary output (left subfigure) and a linear decision boundary separating two linearly separable classes (right subfigure)]

The whole idea behind the MCP neuron and Rosenblatt's thresholded perceptron model is to use a reductionist approach to mimic how a single neuron in the brain works: it either fires or it doesn't. Thus, Rosenblatt's initial perceptron rule is fairly simple and can be summarized by the following steps:

  1. Initialize the weights to 0 or small random numbers.
  2. For each training sample $\mathbf{x}^{(i)}$, perform the following steps:
    1. Compute the output value $\hat{y}$.
    2. Update the weights.

Here, the output value is the class label predicted by the unit step function that we defined earlier, and the simultaneous update of each weight $w_j$ in the weight vector $\mathbf{w}$ can be more formally written as:

$$w_j := w_j + \Delta w_j$$

The value of $\Delta w_j$, which is used to update the weight $w_j$, is calculated by the perceptron learning rule:

$$\Delta w_j = \eta \left( y^{(i)} - \hat{y}^{(i)} \right) x_j^{(i)}$$

Here, $\eta$ is the learning rate (a constant between 0.0 and 1.0), $y^{(i)}$ is the true class label of the $i$th training sample, and $\hat{y}^{(i)}$ is the predicted class label. It is important to note that all weights in the weight vector are being updated simultaneously, which means that we do not recompute $\hat{y}^{(i)}$ before all of the weights $\Delta w_j$ have been updated. Concretely, for a 2D dataset, we would write the update as follows:

$$\Delta w_0 = \eta \left( y^{(i)} - \hat{y}^{(i)} \right)$$
$$\Delta w_1 = \eta \left( y^{(i)} - \hat{y}^{(i)} \right) x_1^{(i)}$$
$$\Delta w_2 = \eta \left( y^{(i)} - \hat{y}^{(i)} \right) x_2^{(i)}$$
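
As a concrete illustration, here is a small sketch that applies this update once for a single 2D training sample. The values of eta, the initial weights, the sample, and the class label are arbitrary numbers chosen for the example.

    import numpy as np

    eta = 0.1                          # learning rate
    w = np.array([-0.5, 0.0, 0.0])     # [w0, w1, w2]

    xi = np.array([2.0, 3.0])          # one training sample [x1, x2]
    target = 1                         # true class label y^(i)

    # Predicted label using the current weights (unit step on the net input)
    net_input = w[0] + np.dot(w[1:], xi)
    predicted = np.where(net_input >= 0.0, 1, -1)

    # Perceptron learning rule: all weights are updated simultaneously
    update = eta * (target - predicted)
    w[0] += update                     # bias weight, x0 = 1
    w[1:] += update * xi
    print(w)                           # [-0.3  0.4  0.6]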

Before we implement the perceptron rule in Python, let us make a simple thought experiment to illustrate how beautifully simple this learning rule really is. In the two scenarios where the perceptron predicts the class label correctly, the weights remain unchanged:

$$\Delta w_j = \eta \left( -1 - (-1) \right) x_j^{(i)} = 0$$
$$\Delta w_j = \eta \left( 1 - 1 \right) x_j^{(i)} = 0$$

However, in the case of a wrong prediction, the weights are being pushed towards the direction of the positive or negative target class, respectively:

$$\Delta w_j = \eta \left( 1 - (-1) \right) x_j^{(i)} = \eta \, (2) \, x_j^{(i)}$$
$$\Delta w_j = \eta \left( -1 - 1 \right) x_j^{(i)} = \eta \, (-2) \, x_j^{(i)}$$

To get a better intuition for the multiplicative factor $x_j^{(i)}$, let us go through another simple example, where:

$$\hat{y}^{(i)} = -1, \quad y^{(i)} = +1, \quad \eta = 1$$

Let's assume that $x_j^{(i)} = 0.5$, and we misclassify this sample as -1. In this case, we would increase the corresponding weight by 1 so that the activation $x_j^{(i)} \times w_j$ will be more positive the next time we encounter this sample and thus will be more likely to be above the threshold of the unit step function to classify the sample as +1:

$$\Delta w_j = \left( 1 - (-1) \right) 0.5 = (2) \, 0.5 = 1$$

The weight update is proportional to the value of $x_j^{(i)}$. For example, if we have another sample $x_j^{(i)} = 2$ that is incorrectly classified as -1, we'd push the decision boundary by an even larger extent to classify this sample correctly the next time:

$$\Delta w_j = \left( 1 - (-1) \right) 2 = (2) \, 2 = 4$$
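
The same arithmetic can be checked in a couple of lines of Python; eta, the labels, and the two feature values are the numbers from the examples above.

    eta, y_true, y_pred = 1, 1, -1

    # Weight update for x_j = 0.5 and x_j = 2, both misclassified as -1
    for x_j in (0.5, 2.0):
        delta_w = eta * (y_true - y_pred) * x_j
        print(x_j, delta_w)    # 0.5 -> 1.0, 2.0 -> 4.0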

It is important to note that the convergence of the perceptron is only guaranteed if the two classes are linearly separable and the learning rate is sufficiently small. If the two classes can't be separated by a linear decision boundary, we can set a maximum number of passes over the training dataset (epochs) and/or a threshold for the number of tolerated misclassifications—the perceptron would never stop updating the weights otherwise:

[Figure: examples of two classes that are linearly separable and two classes that cannot be separated by a linear decision boundary]
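
Putting the pieces together, here is a minimal sketch of the whole learning procedure: weight initialization, the per-sample update, and a fixed number of epochs as a stopping criterion. The function name fit_perceptron and the parameters eta and n_epochs are illustrative choices; the object-oriented implementation developed in the next section is the one used throughout the rest of the chapter.

    import numpy as np

    def fit_perceptron(X, y, eta=0.1, n_epochs=10):
        """Learn perceptron weights for samples X (n_samples x n_features)
        and class labels y in {-1, 1}."""
        w = np.zeros(1 + X.shape[1])       # w[0] is the bias weight
        for _ in range(n_epochs):          # cap the passes over the data
            for xi, target in zip(X, y):
                net_input = w[0] + np.dot(w[1:], xi)
                predicted = np.where(net_input >= 0.0, 1, -1)
                update = eta * (target - predicted)
                w[0] += update
                w[1:] += update * xi
        return w

Given a linearly separable dataset and a sufficiently small eta, the updates eventually become zero for every sample; otherwise the loop simply stops after n_epochs passes over the training data.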

Tip

Downloading the example code

You can download the example code files from your account at http://www.packtpub.com for all the Packt Publishing books you have purchased. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

Now, before we jump into the implementation in the next section, let us summarize what we just learned in a simple figure that illustrates the general concept of the perceptron:

[Figure: the general concept of the perceptron; the inputs of a sample are combined with the weights to compute the net input, which is passed to the unit step function to produce the predicted class label, and the prediction error is used to update the weights]

The preceding figure illustrates how the perceptron receives the inputs of a sample $\mathbf{x}$ and combines them with the weights $\mathbf{w}$ to compute the net input. The net input is then passed on to the activation function (here: the unit step function), which generates a binary output of -1 or +1, the predicted class label of the sample. During the learning phase, this output is used to calculate the error of the prediction and update the weights.
