Implementing neural networks

Here, we use the sigmoid function as the activation function as an example. We first need to define the sigmoid function and its derivative:

>>> import numpy as np
>>> def sigmoid(z):
...     return 1.0 / (1 + np.exp(-z))
>>> def sigmoid_derivative(z):
...     return sigmoid(z) * (1.0 - sigmoid(z))

You can derive the derivative yourself if you want to verify it.
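If you would rather verify it numerically than algebraically, a quick finite-difference check (a small illustrative snippet, not part of the main workflow) looks like this:

>>> z = np.linspace(-5, 5, 11)
>>> eps = 1e-6
>>> approx = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
>>> np.allclose(approx, sigmoid_derivative(z))   # should be True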

We then define the training function, which takes in the training dataset, the number of units in the hidden layer (we use only one hidden layer as an example), the learning rate, and the number of iterations:

>>> def train(X, y, n_hidden, learning_rate, n_iter):
...     m, n_input = X.shape
...     # randomly initialize weights, zero-initialize biases
...     W1 = np.random.randn(n_input, n_hidden)
...     b1 = np.zeros((1, n_hidden))
...     W2 = np.random.randn(n_hidden, 1)
...     b2 = np.zeros((1, 1))
...     for i in range(1, n_iter+1):
...         # forward pass
...         Z2 = np.matmul(X, W1) + b1
...         A2 = sigmoid(Z2)
...         Z3 = np.matmul(A2, W2) + b2
...         A3 = Z3
...         # backpropagation
...         dZ3 = A3 - y
...         dW2 = np.matmul(A2.T, dZ3)
...         db2 = np.sum(dZ3, axis=0, keepdims=True)
...         dZ2 = np.matmul(dZ3, W2.T) * sigmoid_derivative(Z2)
...         dW1 = np.matmul(X.T, dZ2)
...         db1 = np.sum(dZ2, axis=0)
...         # gradient descent updates
...         W2 = W2 - learning_rate * dW2 / m
...         b2 = b2 - learning_rate * db2 / m
...         W1 = W1 - learning_rate * dW1 / m
...         b1 = b1 - learning_rate * db1 / m
...         if i % 100 == 0:
...             cost = np.mean((y - A3) ** 2)
...             print('Iteration %i, training loss: %f' % (i, cost))
...     model = {'W1': W1, 'b1': b1, 'W2': W2, 'b2': b2}
...     return model

Note that besides the weights, W, we also employ biases, b. Before training, we randomly initialize the weights and biases. In each iteration, we feed all layers of the network with the latest weights and biases, calculate the gradients using the backpropagation algorithm, and finally update the weights and biases with the resulting gradients. To inspect training performance, we print out the training loss, which is the MSE, every 100 iterations.
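Behind the scenes, each update in the loop is plain gradient descent. Strictly speaking, the gradients computed above correspond to the cost with a factor of 1/2, rather than the printed MSE; the missing factor of 2 is simply absorbed into the learning rate $\alpha$:

$$J = \frac{1}{2m}\sum_{i=1}^{m}\left(\hat{y}^{(i)} - y^{(i)}\right)^{2}, \qquad W \leftarrow W - \alpha\,\frac{\partial J}{\partial W}, \qquad b \leftarrow b - \alpha\,\frac{\partial J}{\partial b}$$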

Again, we use the Boston house prices dataset as the toy dataset. As a reminder, data normalization is usually recommended whenever gradient descent is used. Hence, we standardize the input data by removing the mean and scaling to unit variance:

>>> from sklearn import datasets
>>> boston = datasets.load_boston()
>>> num_test = 10 # the last 10 samples as testing set
>>> from sklearn import preprocessing
>>> scaler = preprocessing.StandardScaler()
>>> X_train = boston.data[:-num_test, :]
>>> X_train = scaler.fit_transform(X_train)
>>> y_train = boston.target[:-num_test].reshape(-1, 1)
>>> X_test = boston.data[-num_test:, :]
>>> X_test = scaler.transform(X_test)
>>> y_test = boston.target[-num_test:]
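If you want to double-check the effect of standardization, the per-feature mean and standard deviation of the scaled training data should be roughly 0 and 1, respectively; a quick optional check:

>>> print(X_train.mean(axis=0).round(2))   # approximately all zeros
>>> print(X_train.std(axis=0).round(2))    # approximately all ones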

With the scaled dataset, we can now train a neural network with one hidden layer of 20 units, a 0.1 learning rate, and 2000 iterations:

>>> n_hidden = 20
>>> learning_rate = 0.1
>>> n_iter = 2000
>>> model = train(X_train, y_train, n_hidden, learning_rate, n_iter)
Iteration 100, training loss: 13.500649
Iteration 200, training loss: 9.721267
Iteration 300, training loss: 8.309366
Iteration 400, training loss: 7.417523
Iteration 500, training loss: 6.720618
Iteration 600, training loss: 6.172355
Iteration 700, training loss: 5.748484
Iteration 800, training loss: 5.397459
Iteration 900, training loss: 5.069072
Iteration 1000, training loss: 4.787303
Iteration 1100, training loss: 4.544623
Iteration 1200, training loss: 4.330923
Iteration 1300, training loss: 4.141120
Iteration 1400, training loss: 3.970357
Iteration 1500, training loss: 3.814482
Iteration 1600, training loss: 3.673037
Iteration 1700, training loss: 3.547397
Iteration 1800, training loss: 3.437391
Iteration 1900, training loss: 3.341110
Iteration 2000, training loss: 3.255750

Then, we define a prediction function, which takes in a model and produces regression results:

>>> def predict(x, model):
...     W1 = model['W1']
...     b1 = model['b1']
...     W2 = model['W2']
...     b2 = model['b2']
...     A2 = sigmoid(np.matmul(x, W1) + b1)
...     A3 = np.matmul(A2, W2) + b2
...     return A3

Finally, we apply the trained model on the testing set:

>>> predictions = predict(X_test, model)
>>> print(predictions)
[[16.28103034]
[19.98591039]
[22.17811179]
[19.37515137]
[20.5675095 ]
[24.90457042]
[22.92777643]
[26.03651277]
[25.35493394]
[23.38112184]]
>>> print(y_test)
[19.7 18.3 21.2 17.5 16.8 22.4 20.6 23.9 22. 11.9]
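To quantify how far off these predictions are, we can optionally compute the MSE on the testing set ourselves (this step isn't part of the original run, so no output is shown):

>>> test_mse = np.mean((predictions - y_test.reshape(-1, 1)) ** 2)
>>> print('Testing MSE: %f' % test_mse)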

After successfully building a neural network model from scratch, we move on to the implementation with scikit-learn. We utilize the MLPRegressor class (MLP stands for multi-layer perceptron, a nickname for neural networks):

>>> from sklearn.neural_network import MLPRegressor
>>> nn_scikit = MLPRegressor(hidden_layer_sizes=(20, 8),
...                          activation='logistic', solver='lbfgs',
...                          learning_rate_init=0.1, random_state=42,
...                          max_iter=2000)

The hidden_layer_sizes hyperparameter represents the number of hidden neurons in each hidden layer. In this example, the network contains two hidden layers with 20 and 8 nodes respectively.
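For instance, if we wanted a single hidden layer of 20 units, mirroring the network we built from scratch, we would pass a one-element tuple (shown here only to illustrate the hyperparameter; we don't train this model):

>>> nn_single = MLPRegressor(hidden_layer_sizes=(20,),
...                          activation='logistic', solver='lbfgs',
...                          random_state=42, max_iter=2000)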

We fit the neural network model on the training set and predict on the testing data:

>>> nn_scikit.fit(X_train, y_train)
>>> predictions = nn_scikit.predict(X_test)
>>> print(predictions)
[14.73064216 19.77077071 19.77422245 18.95256283 19.73320899 24.15010593 19.78909311 28.36477319 24.17612634 19.80954273]
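As an optional sanity check, scikit-learn's mean_squared_error can score these predictions against y_test (output omitted since it wasn't part of the original run):

>>> from sklearn.metrics import mean_squared_error
>>> print(mean_squared_error(y_test, predictions))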

Neural networks are often implemented with TensorFlow, which is one of the most popular deep learning (multilayer neural network) frameworks.

First, we specify parameters of the model, including two hidden layers with 20 and 8 nodes respectively, 2000 iterations, and a 0.1 learning rate:

>>> n_features = int(X_train.shape[1])
>>> n_hidden_1 = 20
>>> n_hidden_2 = 8
>>> learning_rate = 0.1
>>> n_iter = 2000

Then, we import TensorFlow, define placeholders, and construct the network from the input layer through the hidden layers to the output layer:

>>> import tensorflow as tf
>>> x = tf.placeholder(tf.float32, shape=[None, n_features])
>>> y = tf.placeholder(tf.float32, shape=[None, 1])
>>> layer_1 = tf.nn.sigmoid(tf.layers.dense(x, n_hidden_1))
>>> layer_2 = tf.nn.sigmoid(tf.layers.dense(layer_1, n_hidden_2))
>>> pred = tf.layers.dense(layer_2, 1)

After assembling the components for the model, we define the loss function (the MSE) and a gradient descent optimizer that searches for the best coefficients by minimizing the loss:

>>> cost = tf.losses.mean_squared_error(labels=y, predictions=pred)
>>> optimizer = tf.train.GradientDescentOptimizer(
...     learning_rate).minimize(cost)

Now we can initialize the variables and start a TensorFlow session:

>>> init_vars = tf.global_variables_initializer()
>>> sess = tf.Session()
>>> sess.run(init_vars)

Finally, we start the training process and print out the loss after every 100 iterations:

>>> for i in range(1, n_iter+1):
...     _, c = sess.run([optimizer, cost],
...                     feed_dict={x: X_train, y: y_train})
...     if i % 100 == 0:
...         print('Iteration %i, training loss: %f' % (i, c))
Iteration 100, training loss: 12.995015
Iteration 200, training loss: 8.587905
Iteration 300, training loss: 6.319847
Iteration 400, training loss: 5.524787
Iteration 500, training loss: 5.200356
Iteration 600, training loss: 4.217351
Iteration 700, training loss: 4.070641
Iteration 800, training loss: 3.825407
Iteration 900, training loss: 3.301410
Iteration 1000, training loss: 3.124229
Iteration 1100, training loss: 3.220546
Iteration 1200, training loss: 2.895406
Iteration 1300, training loss: 2.680367
Iteration 1400, training loss: 2.504926
Iteration 1500, training loss: 2.362953
Iteration 1600, training loss: 2.257992
Iteration 1700, training loss: 2.154428
Iteration 1800, training loss: 2.170816
Iteration 1900, training loss: 2.052284
Iteration 2000, training loss: 1.971042

We apply the trained model on the testing set:

>>> predictions = sess.run(pred, feed_dict={x: X_test})
>>> print(predictions)
[[16.431433]
[17.861343]
[20.286907]
[17.6935 ]
[18.380125]
[22.405527]
[19.216259]
[24.333553]
[23.02146 ]
[18.86538 ]]

As a bonus, we also implement the model in Keras (https://keras.io/), another popular package for neural networks. Keras is a high-level API written on top of TensorFlow and two other deep learning frameworks. It was developed for fast prototyping and experimenting with neural network models. We can install Keras from PyPI:

pip install keras

We import the necessary modules after installation as follows:

>>> from keras import models
>>> from keras import layers

Then, we initialize a Sequential model of Keras:

>>> model = models.Sequential()

We add the layers one by one, from the first hidden layer (20 units), to the second hidden layer (8 units), and then the output layer:

>>> model.add(layers.Dense(n_hidden_1, activation="sigmoid",
...                        input_shape=(n_features, )))
>>> model.add(layers.Dense(n_hidden_2, activation="sigmoid"))
>>> model.add(layers.Dense(1))

It's quite similar to building with LEGO bricks. We also need an optimizer, which we define as follows, with a 0.01 learning rate:

>>> from keras import optimizers
>>> sgd = optimizers.SGD(lr=0.01)

Now we can compile the model by specifying the loss function and optimizer:

>>> model.compile(loss='mean_squared_error', optimizer=sgd)
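Optionally, before training, we can inspect the stacked layers and their parameter counts with Keras's built-in summary method (output omitted here):

>>> model.summary()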

Finally, we fit the model on the training set for 100 epochs, and validate the performance on the testing set:

>>> model.fit(
... X_train, y_train,
... epochs=100,
... validation_data=(X_test, y_test)
... )
Train on 496 samples, validate on 10 samples
Epoch 1/100
496/496 [==============================] - 0s 356us/step - loss: 255.7313 - val_loss: 10.7765
Epoch 2/100
496/496 [==============================] - 0s 24us/step - loss: 83.0557 - val_loss: 21.5385
Epoch 3/100
496/496 [==============================] - 0s 25us/step - loss: 70.7806 - val_loss: 22.5854
Epoch 4/100
496/496 [==============================] - 0s 24us/step - loss: 58.7843 - val_loss: 25.0963
Epoch 5/100
496/496 [==============================] - 0s 27us/step - loss: 51.1305 - val_loss: 20.6070
……
……
Epoch 96/100
496/496 [==============================] - 0s 21us/step - loss: 6.4766 - val_loss: 18.2094
Epoch 97/100
496/496 [==============================] - 0s 21us/step - loss: 6.2356 - val_loss: 13.1832
Epoch 98/100
496/496 [==============================] - 0s 21us/step - loss: 6.0728 - val_loss: 13.2538
Epoch 99/100
496/496 [==============================] - 0s 21us/step - loss: 6.0512 - val_loss: 14.1940
Epoch 100/100
496/496 [==============================] - 0s 23us/step - loss: 6.2514 - val_loss: 13.1176

In each epoch, the training loss and validation loss are displayed.

As usual, we obtain the predictions on the testing set using the trained model:

>>> predictions = model.predict(X_test)
>>> print(predictions)
[[16.521835]
[18.425688]
[19.65961 ]
[19.23118 ]
[18.676624]
[21.917233]
[21.794016]
[25.537102]
[24.175468]
[22.05365 ]]
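Because the compiled loss is the MSE, we can also score the testing set directly with Keras's evaluate method (an optional extra step; output not reproduced here):

>>> test_mse = model.evaluate(X_test, y_test)
>>> print('Testing MSE: %f' % test_mse)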