Use case – build and apply a neural network

To close out the chapter, we will work through a more realistic use case for neural networks. We will use a public dataset by Anguita, D., Ghio, A., Oneto, L., Parra, X., and Reyes-Ortiz, J. L. (2013) that uses smartphones to track physical activity. The data can be downloaded from http://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones. The smartphones had an accelerometer and a gyroscope, from whose signals 561 features were derived in both the time and frequency domains.

The smartphones were worn during walking, walking upstairs, walking downstairs, standing, sitting, and lying down. Although this data came from phones, similar measures could be derived from other devices designed to track activity such as various fitness tracking watches or bands. So this data can be useful if we want to sell devices and have them automatically track how many of these different activities the wearer engages in.

This data has already been normalized to range from -1 to +1; with raw data, we would usually need to perform this kind of normalization ourselves (a short sketch follows Figure 2.7). After downloading the data, unzip the files and either place them in the working directory or modify the paths in the following code to point to the correct location. We can then read in the training and testing data, as well as the labels, and take a quick look at the distribution of the outcome (Figure 2.7):

## 561 features and the activity labels for the training and test sets
use.train.x <- read.table("UCI HAR Dataset/train/X_train.txt")
use.train.y <- read.table("UCI HAR Dataset/train/y_train.txt")[[1]]

use.test.x <- read.table("UCI HAR Dataset/test/X_test.txt")
use.test.y <- read.table("UCI HAR Dataset/test/y_test.txt")[[1]]

## mapping from the numeric activity codes (1 to 6) to activity names
use.labels <- read.table("UCI HAR Dataset/activity_labels.txt")

## distribution of the outcome classes (Figure 2.7)
barplot(table(use.train.y))
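Because use.train.y codes the activities simply as the numbers 1 through 6, the bars are labeled numerically. For readable labels we can borrow the activity names from use.labels; this assumes, as in the UCI archive files, that the second column (V2) holds the activity name:

## optional variant of the plot above with activity names on the bars
## (use.labels$V2 is assumed to hold the name column)
barplot(table(use.train.y), names.arg = use.labels$V2,
        las = 2, cex.names = 0.7)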

Figure 2.7: Bar plot of the distribution of the activity classes in the training data
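Since this data came pre-scaled, no normalization was needed here. For raw sensor data, a minimal min-max rescaling to [-1, +1] might look like the following sketch, where raw.data is a hypothetical data frame of raw feature columns:

## rescale a numeric vector to [-1, +1]; constant columns would need guarding
range.scale <- function(x) 2 * (x - min(x)) / (max(x) - min(x)) - 1
## raw.scaled <- as.data.frame(lapply(raw.data, range.scale))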

We are going to evaluate a variety of tuning parameters to show how we might experiment with different approaches to get the best possible model. Because the models can take some time to train, and each one as written uses only a single core, we can train the models with different tuning parameters simultaneously using parallel processing. First, we need to add some additional packages to our checkpoint.R file and re-run it:

## Chapter 2 ##
library(parallel)
library(foreach)
library(doSNOW)

Now we can pick our tuning parameters and set up a local cluster as the backend for the foreach R package for parallel for loops. Note that, if you do this on a machine with fewer than five cores, you should change makeCluster(5) to a lower number (a defensive alternative is sketched after the setup code):

## choose tuning parameters: five models varying the number of hidden
## neurons (size), the maximum training iterations (maxit), whether
## patterns are shuffled between iterations (shuffle), and, for the
## fifth model only, explicit Rprop learning function parameters
tuning <- list(
  size = c(40, 20, 20, 50, 50),
  maxit = c(60, 100, 100, 100, 100),
  shuffle = c(FALSE, FALSE, TRUE, FALSE, FALSE),
  params = list(FALSE, FALSE, FALSE, FALSE, c(0.1, 20, 3)))

## setup cluster using 5 cores
## load packages, export required data and variables
## and register as a backend for use with the foreach package
cl <- makeCluster(5)
clusterEvalQ(cl, {
  library(RSNNS)
})
clusterExport(cl,
  c("tuning", "use.train.x", "use.train.y",
    "use.test.x", "use.test.y")
  )
registerDoSNOW(cl)
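If you are not sure how many cores a machine has, a more defensive alternative to the makeCluster(5) call above (a sketch using detectCores() from the already-loaded parallel package) is:

## use at most five workers and leave one core free for the system
n.workers <- max(1, min(5, detectCores() - 1))
cl <- makeCluster(n.workers)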

Now we are ready to train all the models. The following code shows a parallel for loop, using code that is similar to what we have already seen, but this time setting some of the arguments based on the tuning parameters we previously stored in the list:

use.models <- foreach(i = 1:5, .combine = 'c') %dopar% {
  ## tuning$params[[i]][1] is FALSE when no learnFuncParams were given
  ## and a non-zero number otherwise, so it doubles as a logical flag
  if (tuning$params[[i]][1]) {
    set.seed(1234)
    list(Model = mlp(
      as.matrix(use.train.x),
      decodeClassLabels(use.train.y),
      size = tuning$size[[i]],
      learnFunc = "Rprop",
      shufflePatterns = tuning$shuffle[[i]],
      learnFuncParams = tuning$params[[i]],
      maxit = tuning$maxit[[i]]
      ))
  } else {
    ## no explicit learnFuncParams for this model; mlp() uses its defaults
    set.seed(1234)
    list(Model = mlp(
      as.matrix(use.train.x),
      decodeClassLabels(use.train.y),
      size = tuning$size[[i]],
      learnFunc = "Rprop",
      shufflePatterns = tuning$shuffle[[i]],
      maxit = tuning$maxit[[i]]
    ))
  }
}
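The two branches duplicate everything except the learnFuncParams argument. As an optional rewrite (a sketch; it assumes mlp()'s first two arguments are named x and y, and should behave the same), we could build the argument list once and call mlp() via do.call():

use.models <- foreach(i = 1:5, .combine = 'c') %dopar% {
  set.seed(1234)
  args <- list(
    x = as.matrix(use.train.x),
    y = decodeClassLabels(use.train.y),
    size = tuning$size[[i]],
    learnFunc = "Rprop",
    shufflePatterns = tuning$shuffle[[i]],
    maxit = tuning$maxit[[i]])
  ## only pass learnFuncParams when they were actually specified
  if (tuning$params[[i]][1]) args$learnFuncParams <- tuning$params[[i]]
  list(Model = do.call(mlp, args))
}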

Because generating out-of-sample predictions can also take some time, we will do that in parallel as well. However, first we need to export the model results to each of the workers on our cluster, and then we can calculate the predictions:

clusterExport(cl, "use.models")
use.yhat <- foreach(i = 1:5, .combine = 'c') %dopar% {
  ## encodeClassLabels() turns each row of class probabilities into a
  ## single predicted class (1 to 6), or 0 when there is no clear winner
  list(list(
    Insample = encodeClassLabels(fitted.values(use.models[[i]])),
    Outsample = encodeClassLabels(predict(use.models[[i]],
                                          newdata = as.matrix(use.test.x)))
    ))
}
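As a quick sanity check before the fuller analysis below, we can compute raw out-of-sample accuracy per model; this rough sketch counts the 0 ("uncertain") codes as misclassifications rather than dropping them, so it will differ slightly from the confusion-matrix accuracies:

## proportion of test observations whose predicted class matches the truth
sapply(use.yhat, function(p) mean(p$Outsample == use.test.y))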

Finally, we can merge the actual and fitted or predicted values together into a dataset, calculate performance measures on each one, and store the overall results together for examination and comparison. Nearly identical code generates the out-of-sample performance measures; that code is not shown in the book, but a sketch follows the in-sample code and the full version is available in the code bundle provided with the book. Some additional data management is required here: a model may not predict every possible response level, which would make for non-symmetrical frequency cross tabs unless we convert the variables to factors and specify the levels. We also drop 0 values, which indicate the model was uncertain how to classify an observation:

use.insample <- cbind(Y = use.train.y,
  do.call(cbind.data.frame, lapply(use.yhat, `[[`, "Insample")))
colnames(use.insample) <- c("Y", paste0("Yhat", 1:5))

performance.insample <- do.call(rbind, lapply(1:5, function(i) {
  ## programmatically build a cross-tabulation formula such as ~ Y + Yhat1
  f <- substitute(~ Y + x, list(x = as.name(paste0("Yhat", i))))
  ## drop observations the model could not classify (coded 0)
  use.dat <- use.insample[use.insample[, paste0("Yhat", i)] != 0, ]
  ## force both variables to factors with all six levels so the
  ## cross tab is always 6 x 6, even when a level is never predicted
  use.dat$Y <- factor(use.dat$Y, levels = 1:6)
  use.dat[, paste0("Yhat", i)] <- factor(use.dat[, paste0("Yhat", i)], levels = 1:6)
  res <- caret::confusionMatrix(xtabs(f, data = use.dat))

  ## pair the tuning parameters with the overall accuracy measures
  cbind(Size = tuning$size[[i]],
        Maxit = tuning$maxit[[i]],
        Shuffle = tuning$shuffle[[i]],
        as.data.frame(t(res$overall[c("AccuracyNull", "Accuracy",
                                      "AccuracyLower", "AccuracyUpper")])))
}))
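The out-of-sample measures follow exactly the same pattern, swapping in the test labels and the Outsample predictions. The canonical code is in the book's code bundle; the first step would look like this:

use.outsample <- cbind(Y = use.test.y,
  do.call(cbind.data.frame, lapply(use.yhat, `[[`, "Outsample")))
colnames(use.outsample) <- c("Y", paste0("Yhat", 1:5))
## performance.outsample is then computed exactly as performance.insample,
## substituting use.outsample for use.insample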

If we print the in-sample and out-of-sample performance, we can see how each of our models did and the effect of varying some of the tuning parameters. The fourth column (the null accuracy) is dropped from each as it is not as important for this comparison:

performance.insample[,-4]

  Size Maxit Shuffle Accuracy AccuracyLower AccuracyUpper
1   40    60   FALSE     0.99          0.98          0.99
2   20   100   FALSE     0.99          0.99          0.99
3   20   100    TRUE     0.99          0.99          0.99
4   50   100   FALSE     0.99          0.99          1.00
5   50   100   FALSE     1.00          1.00          1.00
 
performance.outsample[,-4]
  Size Maxit Shuffle Accuracy AccuracyLower AccuracyUpper
1   40    60   FALSE     0.93          0.92          0.94
2   20   100   FALSE     0.92          0.91          0.93
3   20   100    TRUE     0.92          0.91          0.93
4   50   100   FALSE     0.91          0.90          0.92
5   50   100   FALSE     0.92          0.91          0.93

First of all, these results show that we can classify the types of activity people are engaged in quite accurately based on the data from their smartphones. It also seems from the in-sample data that the more complex models do better. However, examining the out-of-sample performance measures, the reverse is actually true! Thus, not only are the in-sample performance measures biased estimates of the models' actual out-of-sample performance, they do not even provide a reliable way to rank-order models when choosing the best one. We will get into ways to combat this overfitting in the next chapter, as we prepare to move into deep neural networks with multiple hidden layers.

Despite the slightly worse out-of-sample performance, the models still do well, far better than chance alone, and for our example use case we could pick the best model (number 1) and be quite confident that it will provide a good classification of a user's activities.
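One last housekeeping step that the excerpt leaves implicit: once we are finished with the parallel backend, it is good practice to shut the workers down:

## release the worker processes started with makeCluster()
stopCluster(cl)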
