We saw in the previous section that adding hidden units to a neural network lets it approximate the target function more closely. However, we have not yet applied this to a classification problem. To do so, we will generate data whose classes are not linearly separable and look at how the decision surface changes once we add hidden units to our architecture. Let's see the universal approximation theorem at work! First, let's generate some non-linearly separable data with two features, set up four neural network architectures, and see how the decision boundary changes with each architecture:
%matplotlib inline
from itertools import product

import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sknn.mlp import Classifier, Layer

# Two interleaving half-moons: two features, not linearly separable
X, y = datasets.make_moons(n_samples=500, noise=.2, random_state=222)

# No hidden layer: the softmax output layer alone
net1 = Classifier(
    layers=[Layer("Softmax")],
    random_state=222, learning_rate=0.01, n_iter=100)

# One hidden layer with four rectifier units
net2 = Classifier(
    layers=[Layer("Rectifier", units=4),
            Layer("Softmax")],
    random_state=12, learning_rate=0.01, n_iter=100)

# Two hidden layers
net3 = Classifier(
    layers=[Layer("Rectifier", units=4),
            Layer("Rectifier", units=4),
            Layer("Softmax")],
    random_state=22, learning_rate=0.01, n_iter=100)

# Six hidden layers
net4 = Classifier(
    layers=[Layer("Rectifier", units=4),
            Layer("Rectifier", units=4),
            Layer("Rectifier", units=4),
            Layer("Rectifier", units=4),
            Layer("Rectifier", units=4),
            Layer("Rectifier", units=4),
            Layer("Softmax")],
    random_state=62, learning_rate=0.01, n_iter=100)

net1.fit(X, y)
net2.fit(X, y)
net3.fit(X, y)
net4.fit(X, y)

# Plotting decision regions: predict on a grid covering the data
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.1),
                     np.arange(y_min, y_max, 0.1))

f, axarr = plt.subplots(2, 2, sharey='row', sharex='col', figsize=(8, 8))
plt.suptitle('Neural Network - Decision Boundary')

for idx, clf, ti in zip(product([0, 1], [0, 1]),
                        [net1, net2, net3, net4],
                        ['0 hidden layers', '1 hidden layer',
                         '2 hidden layers', '6 hidden layers']):
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    axarr[idx[0], idx[1]].contourf(xx, yy, Z, alpha=0.5)
    axarr[idx[0], idx[1]].scatter(X[:, 0], X[:, 1], c=y, alpha=0.5)
    axarr[idx[0], idx[1]].set_title(ti)

plt.show()
In the resulting plot, we can see that, as we add hidden layers to the neural network, it can learn increasingly complex decision boundaries. Interestingly, the network with two layers produced the most accurate predictions.
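To put numbers on the comparison rather than judging the plots by eye, we can score each architecture on held-out data. The following is only a minimal sketch under several assumptions that are not part of the example above: it swaps in scikit-learn's `MLPClassifier` for the `sknn` classifier, uses `LogisticRegression` as a stand-in for the network with no hidden layer (a softmax output layer alone is a linear model), and introduces a 70/30 train/test split, a hypothetical `make_mlp` helper, and its own seeds and iteration count. The accuracies it prints will therefore not match the networks trained above.

```python
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Same synthetic data as in the example above
X, y = make_moons(n_samples=500, noise=0.2, random_state=222)

# Assumption: hold out 30% of the points so accuracy is measured on unseen data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)


def make_mlp(n_hidden_layers):
    """Hypothetical helper: an MLP with n hidden layers of four ReLU units each."""
    return MLPClassifier(hidden_layer_sizes=(4,) * n_hidden_layers,
                         activation="relu", max_iter=2000, random_state=0)


models = {
    # Stand-in for the softmax-only network: a purely linear classifier
    "0 hidden layers": LogisticRegression(),
    "1 hidden layer": make_mlp(1),
    "2 hidden layers": make_mlp(2),
    "6 hidden layers": make_mlp(6),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # score() returns the mean accuracy on the given data
    scores[name] = model.score(X_test, y_test)
    print("{:<16} test accuracy: {:.3f}".format(name, scores[name]))
```

Because small networks are sensitive to their random initialization, it is worth rerunning the sketch with a few different `random_state` values before drawing conclusions about which depth works best.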