Training and predicting new data from a deep neural network

In this section we will learn how to train deep neural networks and use them to generate predictions on new data. The examples for this section will use the activity data we have worked with before, and the following code simply sets up the data:

## h2o must be loaded and a cluster running; if that has not been done yet:
library(h2o)
h2o.init()

## read the training and testing features
use.train.x <- read.table("UCI HAR Dataset/train/X_train.txt")
use.test.x <- read.table("UCI HAR Dataset/test/X_test.txt")

## read the training and testing outcomes
use.train.y <- read.table("UCI HAR Dataset/train/y_train.txt")[[1]]
use.test.y <- read.table("UCI HAR Dataset/test/y_test.txt")[[1]]

## combine the features with the outcome as a factor
use.train <- cbind(use.train.x, Outcome = factor(use.train.y))
use.test <- cbind(use.test.x, Outcome = factor(use.test.y))

## labels for the six activity classes
use.labels <- read.table("UCI HAR Dataset/activity_labels.txt")

## upload the training and testing data to the H2O cluster
h2oactivity.train <- as.h2o(
  use.train,
  destination_frame = "h2oactivitytrain")

h2oactivity.test <- as.h2o(
  use.test,
  destination_frame = "h2oactivitytest")

We have already learned the components of training a deep prediction model. We use the h2o.deeplearning() function as we did for the auto-encoder models, but specify variable names for both the x and y arguments. Previously, we included the testing data so that performance metrics were automatically calculated on both the training and testing data. However, to show how to generate predictions on new data, here we do not include it in the call to h2o.deeplearning(). The activation function is a linear rectifier with dropout on both the input variables (20%) and the hidden neurons (50%). This small example is a shallow network with only 50 hidden neurons and 10 training iterations (epochs). The cost (loss) function is cross-entropy:

mt1 <- h2o.deeplearning(
  x = colnames(use.train.x),
  y = "Outcome",
  training_frame = h2oactivity.train,
  activation = "RectifierWithDropout",
  hidden = c(50),
  epochs = 10,
  loss = "CrossEntropy",
  input_dropout_ratio = .2,
  hidden_dropout_ratios = c(.5),
  export_weights_and_biases = TRUE
)

We display the stored object by simply typing its name in the R console. The first piece of information is the type of model. The outcome has six discrete levels, so a multinomial model is used. The model includes a total of 28,406 weights and biases. Biases are like intercepts or constant offsets. Because this is a feedforward neural network, there are only weights between adjacent layers. Input variables do not have biases, but hidden neurons and the outcome do. The 28,406 weights and biases are made up of 561 * 50 = 28,050 weights between the input variables and the first layer of hidden neurons, 50 * 6 = 300 weights between the hidden neurons and the outcome (6 because there are six levels of the outcome), 50 biases for the hidden neurons, and 6 biases for the outcome.
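
A quick back-of-the-envelope check in R confirms this count:

## 561 * 50 input-to-hidden weights, 50 * 6 hidden-to-output weights,
## plus 50 hidden biases and 6 output biases
561 * 50 + 50 * 6 + 50 + 6
## [1] 28406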

The output also shows the number of layers and the number of units in each layer, the type of each unit, the dropout percentage, and other regularization and hyperparameter information:

mt1
Model Details:
==============

H2OMultinomialModel: deeplearning
Model ID:  DeepLearning_model_R_1451894068318_16 
Status of Neuron Layers: predicting Outcome, 6-class classification, multinomial distribution, CrossEntropy loss, 28,406 weights/biases, 406.9 KB, 73,520 training samples, mini-batch size 1
  layer units             type dropout       l1       l2 mean_rate
1     1   561            Input 20.00 %                            
2     2    50 RectifierDropout 50.00 % 0.000000 0.000000  0.001891
3     3     6          Softmax         0.000000 0.000000  0.004912
  rate_RMS momentum mean_weight weight_RMS mean_bias bias_RMS
1                                                            
2 0.002408 0.000000    0.000172   0.062088  0.347545 0.114483
3 0.015856 0.000000   -0.009241   0.755695 -0.029887 0.294392

The next set of output reports performance metrics on the training data, including the mean squared error (lower is better), R2 (higher is better), and the log loss (lower is better):

H2OMultinomialMetrics: deeplearning
** Reported on training data. **
Description: Metrics reported on temporary (load-balanced) training frame

Training Set Metrics: 
=====================
Metrics reported on temporary (load-balanced) training frame 

MSE: (Extract with `h2o.mse`) 0.023
R^2: (Extract with `h2o.r2`) 0.99
Logloss: (Extract with `h2o.logloss`) 0.082
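
As the output notes, each of these metrics can also be extracted from the fitted model object directly:

## extract the training metrics from the model
h2o.mse(mt1)
h2o.r2(mt1)
h2o.logloss(mt1)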

Finally, a confusion matrix is printed, which shows the actual outcome against the predicted outcome. The observed outcome is shown on the rows, and the predicted outcome is shown on the columns. The diagonal indicates correct classification, and the error rate by outcome level is shown:

Confusion Matrix: Extract with `h2o.confusionMatrix(<model>,train = TRUE)`)
=====================================================================
         X1   X2  X3   X4   X5   X6  Error        Rate
1      1216   10   0    0    0    0 0.0082  10 / 1,226
2         3 1070   0    0    0    0 0.0028   3 / 1,073
3         2   11 973    0    0    0 0.0132    13 / 986
4         0    1   0 1236   40    9 0.0389  50 / 1,286
5         0    0   0  146 1228    0 0.1063 146 / 1,374
6         0    0   0    0    0 1407 0.0000   0 / 1,407
Totals 1221 1092 973 1382 1268 1416 0.0302 222 / 7,352

Hit Ratio Table: Extract with `h2o.hit_ratio_table(<model>,train = TRUE)`
=====================================================================
Top-6 Hit Ratios: 
  k hit_ratio
1 1  0.969804
2 2  0.999728
3 3  1.000000
4 4  1.000000
5 5  1.000000
6 6  1.000000
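
Both tables can likewise be extracted using the functions named in the output:

## extract the training confusion matrix and hit ratio table
h2o.confusionMatrix(mt1, train = TRUE)
h2o.hit_ratio_table(mt1, train = TRUE)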

We can extract and look at the features of the model using the h2o.deepfeatures() function, specifying the model, data, and layer we want to extract. The following code extracts features and looks at the first few rows. The outcome is also included by default. Note the zeros in the features; these are there because we used a linear rectifier, so values below zero are censored at zero:

f <- as.data.frame(h2o.deepfeatures(mt1, h2oactivity.train, 1))
f[1:10, 1:5]

   Outcome DF.L1.C1 DF.L1.C2 DF.L1.C3 DF.L1.C4
1        5     0.00      5.9    0.091      2.1
2        5     0.00      4.7    0.000      1.7
3        5     0.00      4.4    0.102      1.5
4        5     0.00      4.9    0.000      1.9
5        5     0.00      5.0    0.000      1.8
6        5     0.00      4.9    0.000      2.0
7        5     0.00      4.9    0.000      1.6
8        5     0.00      4.6    0.000      1.8
9        5     0.00      5.0    0.000      1.6
10       5     0.13      5.1    0.000      1.3

Just as we extracted the features, we can extract weights from each layer. The following code extracts weights and makes a heatmap so we can see if there are any clear patterns of certain input variables having higher weights to particular hidden neurons:

## packages used for reshaping and plotting (may already be loaded)
library(reshape2)
library(ggplot2)

## extract the weights between the inputs and the first hidden layer
w1 <- as.matrix(h2o.weights(mt1, 1))

## reshape the weight matrix to long format for a heatmap:
## one row per (input variable, hidden neuron) pair
tmp <- as.data.frame(t(w1))
tmp$Row <- 1:nrow(tmp)
tmp <- melt(tmp, id.vars = c("Row"))

p.heat <- ggplot(tmp,
       aes(variable, Row, fill = value)) +
  geom_tile() +
  scale_fill_gradientn(colours = c("black", "white", "blue")) +
  theme_classic() +
  theme(axis.text = element_blank()) +
  xlab("Hidden Neuron") +
  ylab("Input Variable") +
  ggtitle("Heatmap of Weights for Layer 1")
print(p.heat)

As seen in Figure 5.3, there does not seem to be any particularly clear pattern of individual hidden neurons being driven predominantly by just a few inputs:

Figure 5.3: Heatmap of weights for layer 1
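
As a rough numeric complement to the heatmap (an optional check, not part of the original example), we can summarize the average absolute weight per input variable and per hidden neuron:

## w1 has one row per hidden neuron and one column per input variable
summary(colMeans(abs(w1)))  ## average absolute weight for each input variable
summary(rowMeans(abs(w1)))  ## average absolute weight for each hidden neuron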

For all their complexity, once they are trained, feedforward neural networks are straightforward to score and to use to generate predictions on new data. There are built-in functions to do this (a brief example follows), but to get a better understanding of the model we will work through one example manually.
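
For instance, the held-out test frame we created at the start can be scored with a call or two (a minimal sketch using the built-in h2o.predict() and h2o.performance() functions):

## predicted classes and class probabilities for the test data
yhat.test <- as.data.frame(h2o.predict(mt1, newdata = h2oactivity.test))
head(yhat.test)

## performance metrics (MSE, log loss, confusion matrix) on the test data
h2o.performance(mt1, newdata = h2oactivity.test)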

As noted earlier, feedforward networks are constructed by layering functions together. We have already extracted the weights for the first layer. However, in order to construct the neurons of hidden layer 1, we also need the input data and the biases. The biases are stored as a vector, with one bias for each hidden neuron, but because we need to add the same constant to every row of a column when constructing the deep features, we replicate the biases into a matrix with one row per observation in the input data:

## input data
d <- as.matrix(use.train[, -562])

## biases for hidden layer 1 neurons
b1 <- as.matrix(h2o.biases(mt1, 1))
b12 <- do.call(rbind, rep(list(t(b1)), nrow(d)))

Now we can construct the features for layer 1, the hidden neurons. First, we need to standardize each column of the input data, which we can do by applying the scale() function in R to the data by columns (the second dimension of a matrix):

d.scaled <- apply(d, 2, scale)

Next, we post-multiply the scaled data by the transpose of the weight matrix we extracted earlier, and then add the bias matrix:

d.weighted <- d.scaled %*% t(w1) + b12

Because we included dropout on the hidden layer, we need to apply a correction. This is simply a multiplicative scaling by the proportion of hidden units retained at any iteration, that is, 1 minus the dropout proportion:

d.weighted <- d.weighted * (1 - .5)

Finally, for each column, we only want to take values that are zero or higher, because we used a linear rectifier. We accomplish this in R by applying the pmax() function to the weighted data by columns:

d.weighted.rectifier <- apply(d.weighted, 2, pmax, 0)

We can check whether our work was correct by comparing it to the features extracted by H2O. We use the all.equal() function for comparison with some tolerance for slight numerical differences due to floating point arithmetic:

all.equal(
  as.numeric(f[, 2]),
  d.weighted.rectifier[, 1],
  check.attributes = FALSE,
  use.names = FALSE,
  tolerance = 1e-04)

In a similar fashion, we can extract the weights and biases for the next layer, which is the output layer. We create the predicted outcome just like we created the predicted hidden neurons, by multiplying by the weights and adding the biases. However, these operations are not applied to the raw data, but rather to the features we constructed in the first stage. As before, we need to expand the biases to the appropriate dimensions:

w2 <- as.matrix(h2o.weights(mt1, 2))

b2 <- as.matrix(h2o.biases(mt1, 2))
b22 <- do.call(rbind, rep(list(t(b2)), nrow(d)))

yhat <- d.weighted.rectifier %*% t(w2) + b22

To construct the hidden neurons, we used a linear rectifier activation function. For the outputs, a softmax function is used, which normalizes all the predictions to be within [0, 1] and ensures that they sum to one, like a predicted probability. We know to use the softmax function both because it is common and because, earlier in the model output, H2O indicated that softmax was the function linking to the output layer. The softmax function is defined for each case, and is the exponentiated predictions divided by the sum of the exponentiated predictions for that case:
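
In symbols, for a single case with linear predictions z_1, ..., z_6 (one per outcome level), the predicted probability of class j is:

\[
\mathrm{softmax}(z)_j = \frac{\exp(z_j)}{\sum_{k=1}^{6} \exp(z_k)}, \qquad j = 1, \dots, 6
\]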

yhat <- exp(yhat)
normalizer <- do.call(cbind, rep(list(rowSums(yhat)), ncol(yhat)))
yhat <- yhat / normalizer

Finally, we can derive a predicted classification by choosing the output column with the highest predicted probability, using the which.max() function, and append this to our prediction dataset:

yhat <- cbind(Outcome = apply(yhat, 1, which.max), yhat)

We can also obtain predictions using the built-in h2o.predict() function, and compare these with the predictions we generated manually:

yhat.h2o <- as.data.frame(h2o.predict(mt1, newdata = h2oactivity.train))

xtabs(~ yhat[, 1] + yhat.h2o[, 1])

         yhat.h2o[, 1]
yhat[, 1]    1    2    3    4    5    6
        1 1216    0    0    0    0    0
        2    0 1122    0    0    0    0
        3    0    0  948    0    0    0
        4    0    0    0 1316    0    0
        5    0    0    0    0 1344    0
        6    0    0    0    0    0 1406

Our manual process matches that of H2O exactly. Of course, in practice one would not re-implement the prediction function manually, and the code shown here is not particularly computationally efficient. However, working through an example like this can help to clarify exactly what pieces go into the model and how they are used. If we had many hidden layers of neurons, the process would be very similar: we would repeat the steps to generate features for each layer, always building on the results from the previous layer, as sketched below.
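
The following is a minimal sketch of that generalization (the forward.pass() helper is a hypothetical name, not an H2O function); it assumes every hidden layer uses the rectifier with the same dropout ratio, and a softmax output, as in the model above:

forward.pass <- function(model, x, n.hidden, dropout = .5) {
  a <- apply(as.matrix(x), 2, scale)                 ## standardize the inputs
  for (i in seq_len(n.hidden)) {
    w <- as.matrix(h2o.weights(model, i))            ## weights into layer i
    b <- as.numeric(as.matrix(h2o.biases(model, i))) ## biases for layer i
    a <- sweep(a %*% t(w), 2, b, "+") * (1 - dropout)
    a <- pmax(a, 0)                                  ## linear rectifier
  }
  ## output layer: weights and biases, then softmax
  w <- as.matrix(h2o.weights(model, n.hidden + 1))
  b <- as.numeric(as.matrix(h2o.biases(model, n.hidden + 1)))
  yhat <- exp(sweep(a %*% t(w), 2, b, "+"))
  yhat / rowSums(yhat)                               ## normalize to probabilities
}

## with one hidden layer this reproduces the manual calculations above
p <- forward.pass(mt1, use.train[, -562], n.hidden = 1)

For the one-hidden-layer model fit here, the result should match the yhat probabilities we computed step by step above.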
