Summary

In this chapter, we saw neural networks as a nonlinear method capable of solving both regression and classification problems. Motivated by the analogy with biological neurons, we first introduced the simplest neural network, the perceptron. The perceptron can solve binary classification problems only when the two classes are linearly separable, a condition that rarely holds in practice.
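To make the perceptron concrete, here is a minimal sketch of the classic perceptron learning rule applied to a tiny linearly separable problem (logical AND). This toy code is illustrative only and does not come from any package discussed in the chapter; labels are coded as -1/+1.

```r
# Threshold prediction: +1 if the weighted sum is positive, -1 otherwise
predict_perceptron <- function(X, w, b) {
  ifelse(X %*% w + b > 0, 1, -1)
}

# Perceptron learning rule: update weights only when a point is misclassified
train_perceptron <- function(X, y, lr = 0.1, epochs = 50) {
  w <- rep(0, ncol(X)); b <- 0
  for (epoch in seq_len(epochs)) {
    for (i in seq_len(nrow(X))) {
      pred <- predict_perceptron(X[i, , drop = FALSE], w, b)
      if (pred != y[i]) {
        w <- w + lr * y[i] * X[i, ]
        b <- b + lr * y[i]
      }
    }
  }
  list(w = w, b = b)
}

X <- matrix(c(0, 0,
              0, 1,
              1, 0,
              1, 1), ncol = 2, byrow = TRUE)
y <- c(-1, -1, -1, 1)                # AND is linearly separable
fit <- train_perceptron(X, y)
predict_perceptron(X, fit$w, fit$b)  # matches y on all four points
```

Because the two classes are linearly separable here, the perceptron convergence theorem guarantees that this loop finds a separating boundary; on non-separable data it would cycle without converging.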

By changing the function that transforms the linear weighted combination of inputs, namely the activation function, we discovered how to create different types of individual neurons. A linear activation function creates a neuron that performs linear regression, whereas the logistic activation function creates a neuron that performs logistic regression. By organizing and connecting neurons into layers, we can create multilayer neural networks that are powerful models for solving nonlinear problems.
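The role of the activation function can be sketched in a few lines of base R: the same weighted-sum neuron behaves like a linear regression unit or a logistic regression unit depending solely on the function applied to its output. The weights and inputs below are arbitrary illustrations.

```r
# A single artificial neuron: output = f(w . x + b),
# where f is the activation function
neuron <- function(x, w, b, activation) {
  activation(sum(w * x) + b)
}

linear   <- function(z) z                  # yields a linear regression neuron
logistic <- function(z) 1 / (1 + exp(-z))  # yields a logistic regression neuron

x <- c(0.5, -1.2, 3.0)  # input features (illustrative values)
w <- c(0.8, 0.1, -0.4)  # weights
b <- 0.2                # bias

neuron(x, w, b, linear)    # -0.72, an unbounded regression output
neuron(x, w, b, logistic)  # ~0.327, interpretable as a class probability
```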

The idea behind having hidden layers of neurons is that each hidden layer learns a new set of features from its inputs. As the most common type of multilayer neural network, we introduced the multilayer perceptron and saw that it can naturally learn multiple outputs with the same network. In addition, we experimented with real-world data sets for both regression and classification tasks, including a multiclass classification problem, which is also handled naturally. R has a number of packages for implementing neural networks, including neuralnet, nnet, and RSNNS, and we experimented with each of these in turn. Each has its advantages and disadvantages, and no single package is the best choice in every circumstance.
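These two ideas, hidden layers as feature learners and a single network with multiple outputs, can be seen in a minimal forward pass written in base R. All the weight values below are arbitrary illustrations, not trained parameters.

```r
logistic <- function(z) 1 / (1 + exp(-z))

# Forward pass through a one-hidden-layer MLP
forward <- function(x, W1, b1, W2, b2) {
  h <- logistic(W1 %*% x + b1)  # hidden layer: new features learned from x
  W2 %*% h + b2                 # output layer: several outputs at once
}

x  <- c(1.0, -0.5)                          # 2 input features
W1 <- matrix(c(0.3, -0.2, 0.5, 0.7), 2, 2)  # weights into 2 hidden neurons
b1 <- c(0.1, -0.1)
W2 <- matrix(c(0.4, 0.6, -0.3, 0.2), 2, 2)  # weights into 2 output neurons
b2 <- c(0, 0)

forward(x, W1, b1, W2, b2)  # one network, a vector of 2 outputs
```

The hidden vector `h` is exactly the "new set of features" referred to above, and stacking more rows in `W2` is all it takes to add further outputs to the same network.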

An important benefit of working with neural networks is that they can be very powerful in solving highly complex nonlinear problems of regression and classification alike without making any significant assumptions about the relationships between the input features. On the other hand, neural networks can often be quite tricky to train. Scaling the input features is important, and so is awareness of the various parameters affecting the convergence of the model, such as the learning rate and the error gradient tolerance. Another crucial decision is the number of hidden layers and the number of neurons in each. As the complexity of the network, the number of input features, or the size of the training data increases, the training time often becomes quite long compared to other supervised learning methods.
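A common way to perform the scaling step mentioned above is min-max normalization, which maps every input feature into [0, 1] before training. The base-R sketch below is one possible implementation; the data frame and column names are purely illustrative.

```r
# Min-max scale every column of a data frame into [0, 1]
min_max_scale <- function(df) {
  as.data.frame(lapply(df, function(col) {
    (col - min(col)) / (max(col) - min(col))
  }))
}

# Illustrative raw features on very different scales
raw <- data.frame(age = c(21, 35, 58), income = c(24000, 51000, 98000))
scaled <- min_max_scale(raw)
# Every scaled column now spans exactly [0, 1], so no single feature
# dominates the weighted sums during training
```

Note that the same minimum and maximum computed on the training data must be reused when scaling new observations at prediction time; rescaling test data independently would distort the inputs.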

We also saw in our regression example that because of the flexibility and power of neural networks, they can be prone to overfitting the data, leading us to overestimate the model's accuracy on unseen data. Regularization approaches, such as weight decay, exist to mitigate this problem to a certain extent. Finally, one clear disadvantage that deserves mention is that the neural weights have no direct interpretation, unlike regression coefficients, and even though the hidden layers of the network may learn features, these are difficult to explain or interpret.
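As one concrete illustration, the nnet package exposes weight decay directly through its decay argument, which penalizes large weights during training. The sketch below fits a small network to the built-in iris data; the size, decay, and maxit values are illustrative rather than tuned.

```r
library(nnet)  # nnet ships with R as a recommended package
set.seed(7)

# size = neurons in the single hidden layer; decay = weight decay penalty
fit <- nnet(Species ~ ., data = iris, size = 3, decay = 0.01,
            maxit = 200, trace = FALSE)

# In-sample accuracy; this is optimistic, so overfitting should be
# assessed on held-out data rather than on the training set
mean(predict(fit, iris, type = "class") == iris$Species)
```

Comparing models fit with decay = 0 against small positive values such as 0.01 or 0.1 on a validation set is a simple way to see how much the penalty curbs overfitting in practice.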

Our next chapter continues our foray into the world of supervised learning and presents support vector machines, our third nonlinear modeling tool, which is primarily used for dealing with classification problems.
