Predicting the energy efficiency of buildings

In this section, we will investigate how neural networks can be used to solve a real-world regression problem. Once again, we turn to the UCI Machine Learning Repository for our data set. We've chosen to try out the energy efficiency data set available at http://archive.ics.uci.edu/ml/datasets/Energy+efficiency. The prediction task is to use various building characteristics, such as surface area and roof area, in order to predict the energy efficiency of a building, which is expressed in the form of two different metrics—heating load and cooling load.

This is a good example for us to try out as we can demonstrate how neural networks can be used to predict two different outputs with a single network. The full attribute description of the data set is given in the following table:

Column name      Type       Definition
relCompactness   Numerical  Relative compactness
surfArea         Numerical  Surface area
wallArea         Numerical  Wall area
roofArea         Numerical  Roof area
height           Numerical  Overall height
orientation      Numerical  Building orientation (factor)
glazArea         Numerical  Glazing area
glazAreaDist     Numerical  Glazing area distribution (factor)
heatLoad         Numerical  Heating load (first output)
coolLoad         Numerical  Cooling load (second output)

The data was generated using a simulator called Ecotect. Each observation in the data set corresponds to a simulated building. All the buildings have the same volume, but other attributes that impact their energy efficiency, such as their glazing area, are modified.

Note

This data set is described in the paper Accurate quantitative estimation of energy performance of residential buildings using statistical machine learning tools, Athanasios Tsanas and Angeliki Xifara, published in Energy and Buildings, Vol. 49, in 2012.

The data on the website comes in Microsoft Excel format. To load this into R, we can use the R package xlsx, which can read and understand Microsoft Excel files:

> library(xlsx)
> eneff <- read.xlsx2("ENB2012_data.xlsx", sheetIndex = 1, 
                      colClasses = rep("numeric", 10))
> names(eneff) <- c("relCompactness", "surfArea", "wallArea", "roofArea", 
                    "height", "orientation", "glazArea", "glazAreaDist", 
                    "heatLoad", "coolLoad")
> eneff <- eneff[complete.cases(eneff),]

The import adds a number of empty observations at the end of the data frame, so the last line removes these. Now, by referring to the paper in which the data set was presented, we discover that two of our attributes are actually factors. In order for our neural network to work with these, we will need to convert them into dummy variables. To do this, we will use the dummyVars() function from the caret package:

> library(caret)
> eneff$orientation <- factor(eneff$orientation)
> eneff$glazAreaDist <- factor(eneff$glazAreaDist)
> dummies <- dummyVars(heatLoad + coolLoad ~ ., data = eneff)
> eneff_data <- cbind(as.data.frame(predict(dummies, newdata =  
                                            eneff)), eneff[,9:10])
> dim(eneff_data)
[1] 768  18
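The 18 columns break down as the six attributes that remain numerical, ten binary dummy columns (four levels of orientation and six levels of glazAreaDist), and the two outputs. We can verify the level counts directly; the dummy column names returned by the last line follow dummyVars()'s level-naming convention:

> nlevels(eneff$orientation)     # 4 levels -> 4 dummy columns
> nlevels(eneff$glazAreaDist)    # 6 levels -> 6 dummy columns
> grep("orientation|glazAreaDist", names(eneff_data), value = TRUE)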

The dummyVars() function takes in a formula and a data frame. From these, it identifies the input features and performs dummy encoding on those that are factors in order to produce new binary columns. There are as many columns created for a factor as there are levels in that factor. Just as with the preProcess() function that we've been using, we obtain the encoded columns themselves only when we call the predict() function. Next, we'll do an 80-20 split between the training and test data:

> set.seed(474576)
> eneff_sampling_vector <- createDataPartition(eneff_data$heatLoad, 
                           p = 0.80, list = FALSE)
> eneff_train <- eneff_data[eneff_sampling_vector, 1:16]
> eneff_train_outputs <- eneff_data[eneff_sampling_vector, 17:18]
> eneff_test <- eneff_data[-eneff_sampling_vector, 1:16]
> eneff_test_outputs <- eneff_data[-eneff_sampling_vector, 17:18]

One of the most important preprocessing steps to perform when training neural networks is to scale the input features and outputs. One good reason to scale the inputs is to avoid saturation, which occurs when the optimization procedure reaches a point where the gradient of the error function is very small in absolute value. This is usually the result of very large or very small inputs to the nonlinear neuron activation functions. Saturation causes the optimization procedure to terminate prematurely, as if it had converged.
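We can see why saturation stalls training by looking at the gradient of the logistic function, which shrinks toward zero as its input grows in magnitude (a quick illustration, separate from the model code):

> logistic <- function(x) 1 / (1 + exp(-x))
> # The derivative of the logistic function is f(x) * (1 - f(x))
> logistic_grad <- function(x) logistic(x) * (1 - logistic(x))
> logistic_grad(0)     # 0.25, the largest possible gradient
> logistic_grad(10)    # ~0.0000454, a saturated neuron barely learns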

Depending on the particular neural network implementation, for regression tasks it may also make sense to scale the outputs, as some implementations of linear neurons are designed to produce an output in the interval [-1,1]. Scaling can also help convergence. Consequently, we will use caret to scale all our data dimensions to the unit interval, noting that this has no effect on the binary columns produced earlier, as they already lie in [0,1]:

> eneff_pp <- preProcess(eneff_train, method = c("range"))
> eneff_train_pp <- predict(eneff_pp, eneff_train)
> eneff_test_pp <- predict(eneff_pp, eneff_test)

> eneff_train_out_pp <- preProcess(eneff_train_outputs, method = 
                        c("range"))
> eneff_train_outputs_pp <- 
  predict(eneff_train_out_pp, eneff_train_outputs)
> eneff_test_outputs_pp <- 
  predict(eneff_train_out_pp, eneff_test_outputs)
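For reference, the "range" method applies the min-max transform (x - min) / (max - min) to each column, using minima and maxima computed from the training data. A minimal sketch of the same computation, which we will need to invert later when evaluating the model:

> range_scale <- function(v) (v - min(v)) / (max(v) - min(v))
> # For example, this maps the training heating loads onto [0, 1]
> summary(range_scale(eneff_train_outputs$heatLoad))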

Several different packages implement neural networks in R, each with its own strengths. For this reason, it helps to be familiar with more than one package, and in this chapter we will investigate three of them, the first being neuralnet:

> library("neuralnet")
> n <- names(eneff_data)
> f <- as.formula(paste("heatLoad + coolLoad ~", paste(n[!n %in% 
                  c("heatLoad", "coolLoad")], collapse = " + ")))
> eneff_model <- neuralnet(f, 
  data = cbind(eneff_train_pp, eneff_train_outputs_pp), hidden = 10)
> eneff_model
Call: neuralnet(formula = f, data = cbind(eneff_train_pp, eneff_train_outputs_pp),     hidden = 10)

1 repetition was calculated.

         Error Reached Threshold Steps
1 0.3339635783    0.009307995429  9998

The neuralnet() function trains a neural network based on the information provided in its arguments. The first argument we provide is a formula and the format is similar to the formulae that we've seen with the lm() and glm() functions in previous chapters. One interesting difference here is that we have specified two outputs, heatLoad and coolLoad. Another difference is that currently we are unable to use the dot (.) notation to imply that all the remaining columns in our data frame can be used as features, so we need to specify them explicitly.

Note that with the formula, we have effectively defined the input and output layers of the neural network and so what remains to be specified is the structure of the hidden layers. This is specified with the hidden parameter, which either takes in a scalar for a single layer, or a vector of scalars that specify the number of hidden units in each layer, starting from the layer just after the input layer, and ending with the layer just before the output layer.

In the example above, we used a single hidden layer with 10 nodes. We can visualize our neural network, as the package provides the ability to plot the model directly (the numbered circles are dummy bias neurons).
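The plot is produced by calling the generic plot() function on the fitted model object:

> plot(eneff_model)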

[Figure: Plot of the trained neural network, with a single hidden layer of 10 neurons between the 16 inputs and the two outputs]

The call to neuralnet() also allows us to specify the type of activation function to use for our neurons through the act.fct parameter, which takes a string. By default, this is set to the logistic activation function, and we have not changed this. Another very important parameter is linear.output, which can be either TRUE or FALSE and specifies whether the activation function should be applied to the neurons in the output layer. The default value of TRUE, which we used, means that the activation function is not applied, so the output neurons are linear. For regression problems, this is what is appropriate: if we were to apply a logistic activation function, our output would be bounded in the interval [0,1]. Finally, we can specify a differentiable error function through the err.fct parameter to use as part of our optimization strategy. As we are doing regression, we use the default value of sse, which corresponds to the sum of squared errors.

As there is a random component in neural network training, namely the initialization of the weights, we may want to retrain the same model a number of times and pick the best model obtained (using a criterion such as the SSE to rank them). This can be done by specifying an integer value for the rep parameter. Let's rewrite our original call to explicitly show the default values we are using:

> eneff_model <- neuralnet(f, 
    data = cbind(eneff_train_pp, eneff_train_outputs_pp), 
    hidden = 10, act.fct = "logistic", linear.output = TRUE, 
    err.fct = "sse", rep = 1)

The model's output provides us with some information about the performance of the neural network, and what is shown depends on its configuration. As we have specified the SSE as our error metric, the error shown is the SSE that was obtained. The threshold figure is the value reached by the partial derivatives of the error function when the model stopped training. Essentially, instead of terminating only when the gradient is exactly 0, we specify a very small value below which the error gradient must fall before the algorithm terminates. The default value for this is 0.01 and can be changed by supplying a number for the threshold parameter of the neuralnet() function. Reducing this value will generally result in longer training times. The model output also shows us the number of training steps that were performed. Finally, if we had used the rep parameter to repeat this process multiple times, we would see a row for each model trained. Our output shows us that we trained only one model.
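To illustrate the rep parameter, the following sketch trains five repetitions and ranks them by SSE; per the package's documentation, the result.matrix element of the fitted object holds one column per repetition, with the SSE in its "error" row:

> eneff_model_reps <- neuralnet(f, 
    data = cbind(eneff_train_pp, eneff_train_outputs_pp), 
    hidden = 10, rep = 5)
> # SSE achieved by each repetition
> eneff_model_reps$result.matrix["error", ]
> # Index of the best repetition; compute() accepts a rep argument
> which.min(eneff_model_reps$result.matrix["error", ])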

Tip

As the neural network contains a random component in the form of the initialization of weight vectors, reproducing our code will likely not give the exact same results. If, when running the examples, R outputs a message that the model has not converged, try running the code again.

Evaluating multilayer perceptrons for regression

The package neuralnet provides us with a neat way to use our model to perform predictions through the compute() function. Essentially, it provides not only the predicted outputs for a data frame of observations, but also the output values of all the neurons in the model's architecture. To evaluate the performance of the model, we are interested in the outputs of the neural network on our test set:

> test_predictions <- compute(eneff_model, eneff_test_pp)
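The object returned by compute() is a list; per the package's documentation, it contains a neurons element with the per-layer neuron outputs alongside the net.result predictions:

> names(test_predictions)            # "neurons" and "net.result"
> str(test_predictions$neurons, max.level = 1)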

We can access the predicted outputs of the neural network using the net.result attribute of the test_predictions object as follows:

> head(test_predictions$net.result)
            [,1]          [,2]
7  0.38996108769 0.39770348145
8  0.38508402576 0.46726904682
14 0.29555228848 0.24157156896
21 0.49912349400 0.51244876337
23 0.50036257800 0.47436990729
29 0.01133684342 0.01815294595

As this is a regression problem, we would like to be able to use the MSE in order to evaluate the performance of our model on both target outputs. In order to do that, we need to transform our predicted outputs back onto their original scale for a fair assessment to be made. The scaling constants we used on our data are stored in the ranges attribute of the eneff_train_out_pp object:

> eneff_train_out_pp$ranges
     heatLoad coolLoad
[1,]     6.01    10.90
[2,]    42.96    48.03

The first row contains the minimum values of the original data, and the second row contains the maximum values. We'll now write a function that will take in a scaled vector and another vector that contains the original minimum and maximum values, and will return the original unscaled vector:

# Map a vector scaled to [0, 1] back onto its original scale, given
# ranges = c(min, max) from the training data
reverse_range_scale <- function(v, ranges) {
  return( (ranges[2] - ranges[1]) * v + ranges[1] )
}

Next, we'll use this to obtain the unscaled predicted outputs for our test set:

> output_ranges <- eneff_train_out_pp$ranges
> test_predictions_unscaled <- sapply(1:2, function(x) 
  reverse_range_scale(test_predictions$net.result[,x], 
                      output_ranges[,x]))

We can also define a simple function to compute the MSE and use it to check the performance on our two tasks:

# Mean squared error between predictions y_p and true values y
mse <- function(y_p, y) {
  return(mean((y - y_p) ^ 2))
}

> mse(test_predictions_unscaled[,1], eneff_test_outputs[,1])
[1] 0.2940468477
> mse(test_predictions_unscaled[,2], eneff_test_outputs[,2])
[1] 1.440127075

These values are very low, indicating that we have very good prediction accuracy. We can also investigate the correlation which, being scale-independent, we could equally have computed on the scaled outputs:

> cor(test_predictions_unscaled[,1], eneff_test_outputs[,1])
[1] 0.9986655316
> cor(test_predictions_unscaled[,2], eneff_test_outputs[,2])
[1] 0.9926735348

These values are extremely high, indicating that we have near-perfect performance, something that is very rare to see with real-world data. If the accuracy were not this high, we would experiment with making the architecture more complicated. We could, for example, build a model with an additional layer by setting hidden = c(10, 5), so that we would have an additional layer of five neurons before the output layer.
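A sketch of that two-hidden-layer call, reusing the formula and scaled training data from earlier (the model name is illustrative):

> eneff_model_deep <- neuralnet(f, 
    data = cbind(eneff_train_pp, eneff_train_outputs_pp), 
    hidden = c(10, 5))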
