Let's see how to build a simple classifier using some training data.
simple_classifier.py
file that is already provided to you as reference. Assuming that you imported the numpy
and matplotlib.pyplot
packages like we did in the last chapter, let's create some sample data:X = np.array([[3,1], [2,5], [1,8], [6,4], [5,2], [3,5], [4,7], [4,-1]])
y = [0, 1, 1, 0, 0, 1, 1, 0]
y
list contains 0s and 1s. In general, if you have N classes, then the values in y
will range from 0 to N-1. Let's separate the data into classes based on the labels:class_0 = np.array([X[i] for i in range(len(X)) if y[i]==0]) class_1 = np.array([X[i] for i in range(len(X)) if y[i]==1])
plt.figure() plt.scatter(class_0[:,0], class_0[:,1], color='black', marker='s') plt.scatter(class_1[:,0], class_1[:,1], color='black', marker='x')
This is a scatterplot, where we use squares and crosses to plot the points. In this context, the marker
parameter specifies the shape you want to use. We use squares to denote points in class_0
and crosses to denote points in class_1
. If you run this code, you will see the following figure:
X
and y
to create two lists. If you were asked to inspect the datapoints visually and draw a separating line, what would you do? You will simply draw a line in between them. Let's go ahead and do this:line_x = range(10) line_y = line_x
plt.figure() plt.scatter(class_0[:,0], class_0[:,1], color='black', marker='s') plt.scatter(class_1[:,0], class_1[:,1], color='black', marker='x') plt.plot(line_x, line_y, color='black', linewidth=3) plt.show()
We built a simple classifier using the following rule: the input point (a
, b
) belongs to class_0
if a
is greater than or equal to b
; otherwise, it belongs to class_1
. If you inspect the points one by one, you will see that this is, in fact, true. This is it! You just built a linear classifier that can classify unknown data. It's a linear classifier because the separating line is a straight line. If it's a curve, then it becomes a nonlinear classifier.
This formation worked fine because there were a limited number of points, and we could visually inspect them. What if there are thousands of points? How do we generalize this process? Let's discuss that in the next recipe.
18.119.139.50