Introduction to feedforward neural networks with R

The h2o package comes from the H2O.ai company. With it, you can train several machine learning models, including feedforward neural networks. Although the commands are issued from the R console, the heavy lifting is done in Java: the h2o package for R is actually an interface to the H2O Java engine.

That said, having Java installed is a requirement.
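If you are not sure whether Java is available, here is a quick check from the R console (a minimal sketch, assuming the java executable is on your system PATH):

# Print the installed Java version; an error here means Java is
# missing or not on the PATH
system('java -version')

With Java in place, make sure the h2o package is installed too: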

install.packages('h2o')

Now we need to load the h2o library and initialize the package:

library(h2o)
h2o.init(nthreads=-1, max_mem_size='2G')

The h2o.init() function initializes and connects to H2O. The nthreads=-1 argument asks for all available CPU cores to be used, which makes training much faster. With max_mem_size, we set the maximum amount of RAM that H2O is allowed to use; the usual recommendation is about four times the size of the dataset, so rest assured that 2 GB will be more than enough here.
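Once the cluster is up, you can check how many cores and how much memory it actually got. A minimal sketch using the package's own helper:

# Print the local H2O cluster's version, number of CPUs in use,
# and allowed memory
h2o.clusterInfo()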

Although different people define it differently, neural networks with more than two hidden layers (the layers between the input and output nodes) are usually called deep learning models.

To train our neural network model, we use the h2o.deeplearning() function. Datasets can be manipulated either by h2o or by R; nonetheless, they have to be loaded into h2o before they can be used for training or prediction. The as.h2o() function handles this conversion:

time0_nn <- Sys.time()

nn <- h2o.deeplearning(x = 1:7,
                       y = 8,
                       training_frame = as.h2o(dt_Chile[-i_out, 1:8]),
                       validation_frame = as.h2o(dt_Chile[val, 1:8]),
                       hidden = c(6, 6),
                       standardize = TRUE,
                       activation = 'Tanh',
                       l2 = .0025,
                       epochs = 50,
                       reproducible = TRUE,
                       seed = 10)

time1_nn <- Sys.time()

There are many parameters to talk about:

  • x: The indexes of the columns used as input (features)
  • y: The index of the column used as output (target variable)
  • training_frame: The dataset used to train the model (notice how as.h2o() was called)
  • validation_frame: The dataset used to validate the model (as.h2o() again)
  • hidden: A vector giving the number of nodes in each hidden layer (the number of hidden layers is given by its length)
  • standardize: A Boolean saying whether or not to standardize the data
  • activation: A string naming the activation function to be used ('Tanh' stands for hyperbolic tangent)
  • l2: The L2 regularization parameter; regularization is a technique used to avoid overfitting
  • epochs: Sets the number of epochs (passes over the training data) for the backpropagation algorithm
  • reproducible: A Boolean demanding that the training process be reproducible
  • seed: An integer; it must always be set if reproducible = TRUE
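Once the model is trained, these choices can be sanity-checked against the metrics h2o computed on the validation frame. A minimal sketch using the package's built-in helpers, applied to the nn model fitted above:

# Retrieve the metrics computed on the validation_frame
h2o.performance(nn, valid = TRUE)

# Confusion matrix for the validation frame
h2o.confusionMatrix(nn, valid = TRUE)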

There are many hyperparameters to think about when designing neural nets; we will discuss them further in Chapter 8, Neural Networks and Deep Learning. For the moment, let's see how the training has gone. Simply call plot(nn) to get the following diagram:

Figure 6.15: Training scoring history

Here we see how the scoring evolved during training. Thanks to the validation_frame argument, we have a validation line to compare against the training one. If the validation line went up while the training one kept going down, that would be a sign of overfitting.
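If you prefer the raw numbers behind this plot, h2o stores the scoring history as a table; a minimal sketch (the exact column names vary with the problem type):

# One row per scoring event, with training and validation
# metrics side by side
sh <- as.data.frame(h2o.scoreHistory(nn))
head(sh)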

To evaluate the hit rate on the test sample as we've done before, we need to extract predictions for the test sample and convert the results into an R object (such as a matrix) before comparing them:

nn_pred <- h2o.predict(nn, newdata = as.h2o(dt_Chile[test,1:8]))
nn_pred <- as.matrix(nn_pred)
mean(nn_pred[,1] == dt_Chile[test,'vote'])
# [1] 0.6849315
time1_nn - time0_nn
# Time difference of 2.173553 secs

The h2o.predict() function is used to gather the predictions. Again, as.h2o() had to be called on the data frame. The first line creates the object holding the predictions; at this point, nn_pred is an H2OFrame type of object. Nesting it into as.matrix() turns it back into a more natural R object. The first column of nn_pred holds the predicted class, while the remaining columns hold the predicted probabilities for each class.
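Beyond the overall hit rate, it can be useful to see which classes get confused with which. A minimal sketch using base R's table() on the objects created above:

# Cross-tabulate predicted class (first column of nn_pred)
# against the actual vote in the test sample
table(predicted = nn_pred[, 1], actual = dt_Chile[test, 'vote'])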

The accuracy achieved in the test sample (68.49%) was very close to the one displayed by SVMs (69.31%), while using far less time: less than 3 seconds compared to a little more than 1 minute to train the SVM model. Multithreading might have something to do with the time it took (for me) to train both models.

Building functions from scratch to train/fit a certain model is a great way to master that model. You are not likely to program it in an optimal way at first, but if you do, consider sharing it with the community.

Still, there is no way to say that SVMs are absolutely better than neural nets, nor that neural nets outperform SVMs across the board. Both are cutting-edge models that are likely to overcome difficult problems if you have enough data. With this, we wrap up this chapter about machine learning. In the next chapter, we will talk about predictive analytics.
