Okay, let's get started with building a real machine learning model. First, we'll introduce the machine learning problem we'll be tackling: font classification. Then, we'll review a simple algorithm for classification, called logistic regression. Finally, we'll implement logistic regression in TensorFlow.
Before we jump in, let's load all the necessary modules:
import tensorflow as tf
import numpy as np
If you're copying and pasting to IPython, make sure your autoindent property is set to OFF:

%autoindent
The tqdm module is optional; it just shows nice progress bars:
try:
    from tqdm import tqdm
except ImportError:
    def tqdm(x, *args, **kwargs):
        return x
Next, we'll set a seed of 0, just to get consistent data splitting from run to run:
# Set random seed
np.random.seed(0)
In this book, we've provided a dataset of images of characters in five fonts. For convenience, these are stored in a compressed NumPy file (data_with_labels.npz), which can be found in the download package of this book. You can easily load these into Python with numpy.load:
# Load data
data = np.load('data_with_labels.npz')
train = data['arr_0']/255.
labels = data['arr_1']
The train variable here holds the actual pixel values, scaled from 0 to 1, and labels holds the font each image came from; it will be either 0, 1, 2, 3, or 4, as there are five fonts in total. You can print out these values and look at them using the following code:
# Look at some data
print(train[0])
print(labels[0])
However, that's not very instructive, as most of the values are zeros and only the central part of the image contains the character data.
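If you want to confirm the overall dimensions of the dataset before going further, a quick shape check (our own optional addition, not part of the original listing) looks like this:

# Optional sanity check: dataset dimensions
# (each image is expected to be 36 x 36)
print(train.shape)
print(labels.shape)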
If you have Matplotlib installed, now is a good place to import it. We'll use plt.ion() to automatically bring up figures when needed:
# If you have matplotlib installed
import matplotlib.pyplot as plt
plt.ion()
Here are some example images of characters from each font:
Yeah, they're pretty flashy. In the dataset, each image is represented as a 36 x 36 two-dimensional matrix of pixel darkness values. The 0 value represents a white pixel, while 255 represents a black pixel. Everything in between is a shade of gray. Here's the code to display these fonts on your own machine:
# Let's look at a subplot of one 'A' in each font
f, plts = plt.subplots(5, sharex=True)
c = 91
for i in range(5):
    plts[i].pcolor(train[c + i * 558], cmap=plt.cm.gray_r)
If your plot appears really wide, you can easily resize the window just using your mouse. It's often much more work to resize it ahead of time in Python if you're simply plotting interactively. Our goal is to decide which font an image belongs to, given that we have many other labeled images of the fonts. To expand the dataset and help avoid overfitting, we have also jittered each character around in the 36 x 36 area, giving us nine times as many data points.
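The supplied data file already contains these jittered copies, so you don't need to generate them yourself. Purely as an illustrative sketch of the idea (the helper name and the use of np.roll are our own choices, not the book's preprocessing code), shifting a character by one pixel in each direction gives nine variants:

def jitter_image(image):
    # Produce nine shifted copies of a single 36 x 36 image by rolling it
    # one pixel in each direction (including the unshifted original).
    copies = []
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            copies.append(np.roll(np.roll(image, dy, axis=0), dx, axis=1))
    return np.array(copies)  # shape: (9, 36, 36)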
It may be helpful to come back to this after working with later models. It's important to keep the original data in mind, no matter how advanced the final model is.
If you're familiar with linear regression, you're halfway toward understanding logistic regression. Basically, we're going to assign a weight to each pixel in the image, then take the weighted sum of those pixels (beta for weights and X for pixels). This will give us a score for that image being a particular font. Every font will have its own set of weights, as each will value pixels differently. To convert these scores into proper probabilities (represented by Y), we will use what's called the softmax function, which squashes the scores into values between 0 and 1 that sum to 1, as illustrated next. Whichever probability is greatest for a particular image, we will classify that image into the associated class.
You can read more about the theory of logistic regression in most statistical modeling textbooks. Here is its formula:
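In the notation above (X for the pixel values of an image, beta_i for the weights of class i, and Y_i for the resulting probability), the standard multinomial logistic, or softmax, form of the model is:

Y_i = \frac{e^{X\beta_i}}{\sum_{j=1}^{5} e^{X\beta_j}}

The bias term we add in the implementation below simply becomes an extra component of each X\beta_i.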
One good reference that focuses on applications is William H. Greene's Econometric Analysis (Pearson, 2012).
Implementing logistic regression is pretty easy in TensorFlow and will serve as scaffolding for more complex machine learning algorithms. First, we need to convert our integer labels into a one-hot format. This means that, instead of labeling an image with font class 2, we transform the label into [0, 0, 1, 0, 0]. That is, we put a 1 in position two (note that 0-based counting is common in computer science) and a 0 for every other class. Here's the code for our to_onehot function:
def to_onehot(labels, nclasses=5):
    '''
    Convert labels to "one-hot" format.

    >>> a = [0, 1, 2, 3]
    >>> to_onehot(a, 5)
    array([[ 1.,  0.,  0.,  0.,  0.],
           [ 0.,  1.,  0.,  0.,  0.],
           [ 0.,  0.,  1.,  0.,  0.],
           [ 0.,  0.,  0.,  1.,  0.]])
    '''
    outlabels = np.zeros((len(labels), nclasses))
    for i, l in enumerate(labels):
        outlabels[i, l] = 1
    return outlabels
With this done, we can go ahead and call the function:
onehot = to_onehot(labels)
For the pixels, we don't really want a two-dimensional matrix per image in this case, so we'll flatten each 36 x 36 image into a one-dimensional vector of length 1,296; this will come a little bit later. Also, recall that we've rescaled the pixel values from 0-255 so that they fall between 0 and 1.
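As a preview of that later step (the variable name here is just illustrative), the flattening amounts to a single reshape:

# Preview: flatten each 36 x 36 image into a 1,296-element vector
# (variable name is illustrative; the actual reshape happens later)
train_flat = train.reshape([-1, 1296])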
Okay, our final piece of preparation is to split our dataset into training and validation sets. This will help us catch overfitting later on. The training set will help us determine the weights in our logistic regression model, and the validation set will just be used to confirm that those weights are reasonably correct on new data:
# Split data into training and validation
indices = np.random.permutation(train.shape[0])
valid_cnt = int(train.shape[0] * 0.1)
test_idx, training_idx = indices[:valid_cnt], indices[valid_cnt:]
test, train = train[test_idx,:], train[training_idx,:]
onehot_test, onehot_train = onehot[test_idx,:], onehot[training_idx,:]
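If you'd like to confirm the split, a quick check of the resulting shapes (our own addition, not part of the original listing) should show roughly a 90/10 division between training and validation data:

# Optional check: roughly 90% training, 10% validation
print(train.shape, onehot_train.shape)
print(test.shape, onehot_test.shape)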
Okay, let's kick off the TensorFlow code by creating an interactive session:
sess = tf.InteractiveSession()
With this, we've started our first model in TensorFlow.
We're going to use a placeholder variable for x, which represents our input images. This is just to tell TensorFlow that we will supply the value for this node via feed_dict later on:
# These will be inputs
## Input pixels, flattened
x = tf.placeholder("float", [None, 1296])
Also, note that we can specify the shape of this tensor, and here we have used None as one of the sizes. The None size allows us to send an arbitrary number of data points into the algorithm at once for batch processing. We'll use the variable y_ likewise to hold our known labels to be used for training later on:
## Known labels
y_ = tf.placeholder("float", [None, 5])
To perform logistic regression, we need a set of weights (W). In fact, we need 1,296 weights for each of the five font classes, which gives us our shape. Note that we also want to include an extra weight for each class as a bias (b). This is the same as adding an extra input variable that always takes the value 1:
# Variables
W = tf.Variable(tf.zeros([1296,5]))
b = tf.Variable(tf.zeros([5]))
With all these TensorFlow variables floating around, we need to make sure they get initialized. Let's do that now:
# Just initialize
sess.run(tf.global_variables_initializer())
Good job! You've got everything prepared. Now you can implement the softmax formula to compute probabilities. Because we set up our weights and input very carefully, TensorFlow makes this task very easy with just a call to tf.matmul and tf.nn.softmax:
# Define model
y = tf.nn.softmax(tf.matmul(x,W) + b)
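As a quick, optional sanity check of our own (not part of the original walkthrough), you can already push a few flattened images through the untrained model; with W and b still at zero, every class should come out with probability 0.2:

# Optional: run a few flattened images through the untrained model.
# With W and b still zero, each of the five classes gets probability 0.2.
sample = train[:4].reshape([-1, 1296])
print(sess.run(y, feed_dict={x: sample}))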
That's it! You've implemented an entire machine learning classifier in TensorFlow. Nice work. But where do we get the values for the weights? Let's take a look at using TensorFlow to train the model.