While there are better ways to implement purely linear models, simplifying DNNs with a varying number of layers is where TensorFlow and learn really shine.
We'll use the same input features, but now we'll build a DNN with two hidden layers, first with 10 neurons and then 5. Creating this model will only take one line of Python code; it could not be easier.
The specification is similar to our linear model. We still need SKCompat, but now it's learn.DNNClassifier. For arguments, there's one additional requirement: the number of neurons on each hidden layer, passed as a list. This one simple argument, which really captures the essence of a DNN model, puts the power of deep learning at your fingertips.

There are some optional arguments to this as well, but we'll only mention optimizer. This allows you to choose between different common optimizer routines, such as Stochastic Gradient Descent (SGD) or Adam. Very convenient!
# Dense neural net
classifier = estimator.SKCompat(learn.DNNClassifier(
            feature_columns=feature_columns,
            hidden_units=[10, 5],
            n_classes=5,
            optimizer='Adam'))
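If you'd rather experiment with plain stochastic gradient descent instead of Adam, the optimizer argument can simply be swapped. A minimal sketch, assuming this version of learn accepts the string name 'SGD':

# Hypothetical variation: same model, trained with plain SGD
# (assumes 'SGD' is an accepted optimizer string in this version of learn)
classifier_sgd = estimator.SKCompat(learn.DNNClassifier(
            feature_columns=feature_columns,
            hidden_units=[10, 5],
            n_classes=5,
            optimizer='SGD'))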
The training and evaluation occur exactly as they do with the linear model. Just for demonstration, we can also look at the confusion matrix created by this model. Note that we haven't trained much, so this model may not compete with our earlier creations using pure TensorFlow:
# Same training call
classifier.fit(train.reshape([-1, 36 * 36]),
               train_labels,
               steps=1024,
               batch_size=32)

# simple accuracy
test_probs = classifier.predict(test.reshape([-1, 36 * 36]))
sklearn.metrics.accuracy_score(test_labels,
                               test_probs['classes'])

# confusion is easy
train_probs = classifier.predict(train.reshape([-1, 36 * 36]))
conf = metrics.confusion_matrix(train_labels,
                                train_probs['classes'])
print(conf)
CNNs power some of the most successful machine learning models out there, so we'd hope that learn supports them. In fact, the library supports using arbitrary TensorFlow code! You'll find that this is a blessing and a curse. Having arbitrary code available means you can use learn to do almost anything you can do with pure TensorFlow, giving maximum flexibility. But the general interface tends to make the code more difficult to read and write.
If you find yourself fighting with the interface to make some moderately complex model work in learn, it may be time to use pure TensorFlow or switch to another API.
To demonstrate this generality, we'll build a simple CNN to attack our font classification problem. It will have one convolutional layer with four filters, followed by flattening into a hidden dense layer with five neurons, and ending with the densely connected output logistic regression.
To get started, let's do a couple more imports. We want access to generic TensorFlow, but we also need the layers module to call TensorFlow layers in a way that learn expects:
# Access general TF functions
import tensorflow as tf
import tensorflow.contrib.layers as layers
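The snippets below also lean on NumPy and scikit-learn's metrics, which were imported earlier in the chapter. If you're starting from a fresh session, a minimal sketch of those imports would be:

# Assumed to have been imported earlier in the chapter; repeated here only
# so the later snippets (np.product, metrics.confusion_matrix,
# sklearn.metrics.accuracy_score) run in a fresh session
import numpy as np
import sklearn.metrics
from sklearn import metrics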
The generic interface forces us to write a function which creates the operations for our model. You may find this tedious, but that's the price of flexibility.
Start a new function called conv_learn with three arguments. X will be the input data, y will be the output labels (not yet one-hot encoded), and mode determines whether you are training or predicting. Note that you'll never directly interact with this function; you merely pass it to a constructor that expects these arguments. So, if you wanted to vary the number or type of layers, you would need to write a new model function (or another function that would generate such a model function):
def conv_learn(X, y, mode):
As this is a convolutional model, we need to make sure our data is formatted correctly. In particular, this means reshaping the input to have not only the correct two-dimensional shape (36x36), but also 1 color channel (the last dimension). This is part of a TensorFlow computation graph, so we use tf.reshape, not np.reshape. Likewise, because this is a generic graph, we want our outputs to be one-hot encoded, and tf.one_hot provides that functionality. Note that we have to describe how many classes there are (5), what a set value should be (1), and what an unset value should be (0):
    # Ensure our images are 2d
    X = tf.reshape(X, [-1, 36, 36, 1])
    # We'll need these in one-hot format
    y = tf.one_hot(tf.cast(y, tf.int32), 5, 1, 0)
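As a quick aside, here's a minimal sketch (outside the model function, not part of the book's code) of what those three arguments to tf.one_hot do: a label of 2 with 5 classes, an on value of 1, and an off value of 0 becomes the vector [0, 0, 1, 0, 0]:

# Minimal sketch, outside the model function:
# label 2, depth 5, on_value 1, off_value 0  ->  [0 0 1 0 0]
example = tf.one_hot(tf.constant(2), 5, 1, 0)
with tf.Session() as sess:
    print(sess.run(example))   # [0 0 1 0 0]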
Now the real fun begins. To specify the convolutional layer, let's initialize a new scope, conv_layer. This will just make sure we don't clobber any variables. layers.convolution2d provides the basic machinery. It accepts our input (a TensorFlow tensor), a number of outputs (really the number of kernels or filters), and the size of the kernel, here a 5x5 window. For an activation function, let's use Rectified Linear, which we can call from the main TensorFlow module. This gives us our basic convolutional output, h1.
Max pooling actually occurs exactly as it does in regular TensorFlow, neither easier nor harder. The function tf.nn.max_pool, with the usual kernel size and strides, works as expected. Save this into p1:
    # conv layer will compute 4 kernels for each 5x5 patch
    with tf.variable_scope('conv_layer'):
        # 5x5 convolution, pad with zeros on edges
        h1 = layers.convolution2d(X, num_outputs=4,
                kernel_size=[5, 5],
                activation_fn=tf.nn.relu)
        # 2x2 Max pooling, no padding on edges
        p1 = tf.nn.max_pool(h1, ksize=[1, 2, 2, 1],
                strides=[1, 2, 2, 1], padding='VALID')
Now, to flatten the tensor at this point, we need to compute the number of elements in our would-be one-dimensional tensor. One way to do this is by multiplying all the dimension values together (except the batch_size, which occupies the first position). This particular operation can occur outside the computation graph, so we use np.product. Once supplied with the total size, we can pass it to tf.reshape to reslice the intermediate tensor in the graph:
    # Need to flatten conv output for use in dense layer
    p1_size = np.product(
            [s.value for s in p1.get_shape()[1:]])
    p1f = tf.reshape(p1, [-1, p1_size])
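To make that concrete for our model, here is a quick worked check of the sizes (a sketch that assumes the convolution's default 'SAME' padding, which keeps the 36x36 spatial size before pooling):

# Worked size check (assumes 'SAME' padding on the convolution):
#   h1 : [batch, 36, 36, 4]    # 36x36 preserved, 4 filters
#   p1 : [batch, 18, 18, 4]    # 2x2 max pool with stride 2 halves each side
#   p1_size = 18 * 18 * 4
print(18 * 18 * 4)   # 1296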
Now it's time for the densely connected layer. The layers module makes an appearance again, this time with the fully_connected function (another name for a dense layer). This takes the previous layer, the number of neurons, and the activation function, again supplied by general TensorFlow.
For demonstration purposes, let's add a dropout here as well; layers.dropout provides the interface. As expected, it needs the previous layer as well as a probability of keeping a given node's output. But it also needs the mode argument that we passed into the original conv_learn function. All this complex interface is saying is that nodes should only be dropped during training. If you can handle that, we're almost through the model!
    # densely connected layer with 5 neurons and dropout
    h_fc1 = layers.fully_connected(p1f,
            5,
            activation_fn=tf.nn.relu)
    drop = layers.dropout(h_fc1, keep_prob=0.5,
            is_training=(mode == tf.contrib.learn.ModeKeys.TRAIN))
Now for some bad news. We need to write out the final linear model, loss function, and optimization parameters manually. This is something that can change from version to version; in the past it was easier on the user in some circumstances, but that made the backend harder to maintain. Let's persevere; it's really not too arduous.
Another layers.fully_connected layer creates the final logistic regression. Note that our activation here should be None, as it is purely linear. What handles the logistic side of the equation is the loss function. Thankfully, TensorFlow supplies a softmax_cross_entropy function, so we don't need to write this out manually. Given inputs, outputs, and a loss function, we can apply an optimization routine. Again, layers.optimize_loss minimizes the pain, as well as the function in question. Pass it your loss node, optimizer (as a string), and a learning rate. Further, give it the get_global_step() parameter to ensure the optimizer handles decay properly.
Finally, our function needs to return a few things. First, it should report the predicted classes. Next, it must supply the loss node output itself. And, finally, the training node must be available to external routines to actually execute everything:
    logits = layers.fully_connected(drop, 5, activation_fn=None)
    loss = tf.losses.softmax_cross_entropy(y, logits)

    # Setup the training function manually
    train_op = layers.optimize_loss(
        loss,
        tf.contrib.framework.get_global_step(),
        optimizer='Adam',
        learning_rate=0.01)

    return tf.argmax(logits, 1), loss, train_op
While specifying the model may be cumbersome, using it is just as easy as before. Now, use learn.Estimator, the most generic routine, and pass in your model function for model_fn. And don't forget the SKCompat!
Training works exactly as before; just note that we don't need to reshape the inputs here, since that's handled inside the function.
To predict with the model, you can simply call classifier.predict, but note that the output is the first value returned by your model function. We opted to return the class, but it would also be reasonable to return the probabilities from the softmax function as well. That's all regarding the basics of the tf.contrib.learn models!
# Use generic estimator with our function
classifier = estimator.SKCompat(
        learn.Estimator(
        model_fn=conv_learn))

classifier.fit(train, train_labels,
               steps=1024,
               batch_size=32)

# simple accuracy
metrics.accuracy_score(test_labels, classifier.predict(test))
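If you'd rather have predict hand back per-class probabilities, as mentioned above, one hedged variation (not the book's code) is to return the softmax of the logits as the first value from the model function:

# Hypothetical variant of the final line of conv_learn: report softmax
# probabilities instead of the argmax class
#     return tf.nn.softmax(logits), loss, train_op
# classifier.predict(test) would then yield an array of shape
# [n_examples, 5] containing per-class probabilities.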
While training and prediction are the core uses of models, it's important to be able to study the inside of models as well. Unfortunately, this API makes it difficult to extract parameter weights. Thankfully, this section provides some simple examples of a weakly documented feature to get the weights out of the tf.contrib.learn models.
To pull out the weights of a model, we really need to get the value from certain points in the underlying TensorFlow computation graph. TensorFlow provides many ways to do this, but the first problem is just figuring out what your variable of interest is called.
A list of variable names in your learn graph is available, but it's buried under the hidden attribute, _estimator. Calling classifier._estimator.get_variable_names() returns a list of strings of the various names. Many of these will be uninteresting, such as the OptimizeLoss entries. In our case, we're looking for the conv_layer and fully_connected elements:
# See layer names
print(classifier._estimator.get_variable_names())

['OptimizeLoss/beta1_power',
 'OptimizeLoss/beta2_power',
 'OptimizeLoss/conv_layer/Conv/biases/Adam',
 'OptimizeLoss/conv_layer/Conv/biases/Adam_1',
 'OptimizeLoss/conv_layer/Conv/weights/Adam',
 'OptimizeLoss/conv_layer/Conv/weights/Adam_1',
 'OptimizeLoss/fully_connected/biases/Adam',
 'OptimizeLoss/fully_connected/biases/Adam_1',
 'OptimizeLoss/fully_connected/weights/Adam',
 'OptimizeLoss/fully_connected/weights/Adam_1',
 'OptimizeLoss/fully_connected_1/biases/Adam',
 'OptimizeLoss/fully_connected_1/biases/Adam_1',
 'OptimizeLoss/fully_connected_1/weights/Adam',
 'OptimizeLoss/fully_connected_1/weights/Adam_1',
 'OptimizeLoss/learning_rate',
 'conv_layer/Conv/biases',
 'conv_layer/Conv/weights',
 'fully_connected/biases',
 'fully_connected/weights',
 'fully_connected_1/biases',
 'fully_connected_1/weights',
 'global_step']
Figuring out which entry is the layer you're looking for can be a challenge. Here, conv_layer is obviously from our convolutional layer. However, you see two fully_connected elements: one is the dense layer after flattening, and one is the output weights. It turns out that they are named in the order they were specified. We created the dense hidden layer first, so it gets the basic fully_connected name, while the output layer came last, so it has a _1 tacked onto it. If you're unsure, you can always look at the shapes of the weight arrays, which depend on the shape of your model.
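A minimal sketch of that shape check:

# Distinguish the two dense layers by the shapes of their weight arrays:
# the hidden layer maps the flattened conv output to 5 neurons, while the
# output layer maps those 5 neurons to the 5 classes.
print(classifier._estimator.get_variable_value(
        'fully_connected/weights').shape)     # e.g. (1296, 5)
print(classifier._estimator.get_variable_value(
        'fully_connected_1/weights').shape)   # e.g. (5, 5)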
To actually get at the weights, it's another arcane call. This time, classifier._estimator.get_variable_value, supplied with the variable name string, produces a NumPy array with the relevant weights. Try it out for the convolutional weights and biases, as well as the dense layers:
# Convolutional Layer Weights
print(classifier._estimator.get_variable_value(
        'conv_layer/Conv/weights'))
print(classifier._estimator.get_variable_value(
        'conv_layer/Conv/biases'))

# Dense Layer
print(classifier._estimator.get_variable_value(
        'fully_connected/weights'))

# Logistic weights
print(classifier._estimator.get_variable_value(
        'fully_connected_1/weights'))
Now, armed with the esoteric knowledge of how to peer inside tf.contrib.learn neural networks, you're more than capable with this high-level API. While it is convenient in many situations, it can be cumbersome in others. Never be afraid to pause and consider switching to another library; use the right machine learning tool for the right machine learning job.