Implementing linear regression

Now that we understand how gradient descent-based linear regression works, we'll implement it from scratch.

We start by defining the function that computes the prediction with the current weights:

>>> import numpy as np
>>> def compute_prediction(X, weights):
...     """ Compute the prediction y_hat based on current weights
...     Args:
...         X (numpy.ndarray)
...         weights (numpy.ndarray)
...     Returns:
...         numpy.ndarray, y_hat of X under weights
...     """
...     predictions = np.dot(X, weights)
...     return predictions

Then, we continue with the function that updates the weights w by one gradient descent step, as follows:

>>> def update_weights_gd(X_train, y_train, weights, learning_rate):
...     """ Update weights by one step
...     Args:
...         X_train, y_train (numpy.ndarray, training data set)
...         weights (numpy.ndarray)
...         learning_rate (float)
...     Returns:
...         numpy.ndarray, updated weights
...     """
...     predictions = compute_prediction(X_train, weights)
...     weights_delta = np.dot(X_train.T, y_train - predictions)
...     m = y_train.shape[0]
...     weights += learning_rate / float(m) * weights_delta
...     return weights
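To make the update rule concrete, here is a minimal sketch of a single step worked out on three samples. The helper name gd_step and the tiny dataset are assumptions made for illustration. Unlike update_weights_gd, which uses += and therefore modifies the array passed in, this variant returns a new array so that the starting weights can still be inspected afterward.

```python
import numpy as np

def gd_step(X, y, weights, learning_rate):
    """One batch gradient descent step; returns a new array (hypothetical helper)."""
    predictions = np.dot(X, weights)
    # Sum over samples of (y - y_hat) * x, the same quantity as weights_delta above
    weights_delta = np.dot(X.T, y - predictions)
    m = y.shape[0]
    return weights + learning_rate / float(m) * weights_delta

# Three samples generated by y = 2x, so the ideal weight is 2
X = np.array([[1.0], [2.0], [3.0]])
y = np.array([2.0, 4.0, 6.0])
w_start = np.zeros(1)

w_next = gd_step(X, y, w_start, learning_rate=0.1)
print(w_next)   # 0.1 / 3 * (1*2 + 2*4 + 3*6) = 0.9333...
print(w_start)  # still [0.], because gd_step does not modify its input
```

Starting from zero, one step moves the weight about halfway toward 2; repeated steps would bring it closer.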

Next, we add the function that calculates the cost J(w):

>>> def compute_cost(X, y, weights):
...     """ Compute the cost J(w)
...     Args:
...         X, y (numpy.ndarray, data set)
...         weights (numpy.ndarray)
...     Returns:
...         float
...     """
...     predictions = compute_prediction(X, weights)
...     cost = np.mean((predictions - y) ** 2 / 2.0)
...     return cost
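The update step and the cost function are tied together: the step should move the weights along the negative gradient of J(w). The following sketch checks this numerically by comparing the analytic gradient with a finite-difference estimate. The helper names and the random test data are assumptions made for illustration.

```python
import numpy as np

def cost(X, y, w):
    # Same formula as compute_cost: mean of squared errors divided by 2
    return np.mean((np.dot(X, w) - y) ** 2 / 2.0)

def analytic_gradient(X, y, w):
    # Gradient of J(w); update_weights_gd adds learning_rate times its negative
    return -np.dot(X.T, y - np.dot(X, w)) / X.shape[0]

def numerical_gradient(X, y, w, eps=1e-6):
    # Central finite differences, one coordinate at a time
    grad = np.zeros_like(w)
    for j in range(w.size):
        step = np.zeros_like(w)
        step[j] = eps
        grad[j] = (cost(X, y, w + step) - cost(X, y, w - step)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = rng.normal(size=20)
w = rng.normal(size=3)

gradients_match = np.allclose(
    analytic_gradient(X, y, w), numerical_gradient(X, y, w), atol=1e-6)
print(gradients_match)
```

A check like this is a cheap way to catch sign or scaling mistakes before running a long training loop.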

Now, put all the functions together in a model training function that performs the following tasks:

Update the weight vector in each iteration
Print out the current cost every 100 iterations (or any other interval) to make sure that the cost is decreasing and things are on the right track

Let's see how it's done by executing the following commands:

>>> def train_linear_regression(X_train, y_train, max_iter,
...                             learning_rate, fit_intercept=False):
...     """ Train a linear regression model with gradient descent
...     Args:
...         X_train, y_train (numpy.ndarray, training data set)
...         max_iter (int, number of iterations)
...         learning_rate (float)
...         fit_intercept (bool, with an intercept w0 or not)
...     Returns:
...         numpy.ndarray, learned weights
...     """
...     if fit_intercept:
...         intercept = np.ones((X_train.shape[0], 1))
...         X_train = np.hstack((intercept, X_train))
...     weights = np.zeros(X_train.shape[1])
...     for iteration in range(max_iter):
...         weights = update_weights_gd(
...             X_train, y_train, weights, learning_rate)
...         # Check the cost for every 100 (for example) iterations
...         if iteration % 100 == 0:
...             print(compute_cost(X_train, y_train, weights))
...     return weights

Finally, predict the results of new input values using the trained model as follows:

>>> def predict(X, weights):
...     if X.shape[1] == weights.shape[0] - 1:
...         intercept = np.ones((X.shape[0], 1))
...         X = np.hstack((intercept, X))
...     return compute_prediction(X, weights)
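The shape check in predict decides whether an intercept column is needed: if the weight vector has exactly one more entry than the input has features, the first weight is treated as the intercept w0. The short sketch below illustrates both cases with hand-picked weights; the input values are assumptions made for illustration, and compute_prediction is inlined to keep the example self-contained.

```python
import numpy as np

def predict(X, weights):
    # Same logic as predict above, with compute_prediction inlined
    if X.shape[1] == weights.shape[0] - 1:
        intercept = np.ones((X.shape[0], 1))
        X = np.hstack((intercept, X))
    return np.dot(X, weights)

X_new = np.array([[0.0], [1.0], [3.0]])

# Two weights for one feature: treated as [w0, w1], that is, y = 1 + 2x
with_intercept = predict(X_new, np.array([1.0, 2.0]))
print(with_intercept)     # [1. 3. 7.]

# One weight for one feature: no intercept column is added, that is, y = 2x
without_intercept = predict(X_new, np.array([2.0]))
print(without_intercept)  # [0. 2. 6.]
```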

As we can see, implementing linear regression is very similar to implementing logistic regression. Let's try it out on a small example:

>>> X_train = np.array([[6], [2], [3], [4], [1],
...                     [5], [2], [6], [4], [7]])
>>> y_train = np.array([5.5, 1.6, 2.2, 3.7, 0.8,
...                     5.2, 1.5, 5.3, 4.4, 6.8])

Train a linear regression model for 100 iterations at a learning rate of 0.01, with an intercept included in the weights:

>>> weights = train_linear_regression(X_train, y_train,
...     max_iter=100, learning_rate=0.01, fit_intercept=True)
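The learning rate matters more than it may seem. The sketch below reruns the same kind of training on this small dataset with two learning rates and compares the final cost with the cost at zero weights. The helper names and the second rate (0.2) are assumptions chosen for illustration. For a quadratic cost like this one, batch gradient descent stays stable only while the learning rate is below 2 divided by the largest eigenvalue of X^T X / m.

```python
import numpy as np

X_raw = np.array([[6], [2], [3], [4], [1], [5], [2], [6], [4], [7]], dtype=float)
y = np.array([5.5, 1.6, 2.2, 3.7, 0.8, 5.2, 1.5, 5.3, 4.4, 6.8])
X = np.hstack((np.ones((X_raw.shape[0], 1)), X_raw))  # prepend intercept column
m = X.shape[0]

def cost(w):
    return np.mean((np.dot(X, w) - y) ** 2 / 2.0)

def run_gd(learning_rate, max_iter=100):
    """Run batch gradient descent from zero weights and return the final cost."""
    w = np.zeros(X.shape[1])
    for _ in range(max_iter):
        w = w + learning_rate / m * np.dot(X.T, y - np.dot(X, w))
    return cost(w)

# Stability limit: 2 / (largest eigenvalue of X^T X / m)
largest_eigenvalue = np.linalg.eigvalsh(np.dot(X.T, X) / m).max()
stability_limit = 2.0 / largest_eigenvalue

initial_cost = cost(np.zeros(X.shape[1]))
cost_small_rate = run_gd(0.01)  # below the limit: the cost shrinks
cost_large_rate = run_gd(0.2)   # above the limit: the cost blows up
print(stability_limit, initial_cost, cost_small_rate, cost_large_rate)
```

The limit depends on the scale of the features: datasets whose features have small magnitudes tolerate much larger learning rates than this one.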

Check the model's performance on new samples as follows:

>>> X_test = np.array([[1.3], [3.5], [5.2], [2.8]])
>>> predictions = predict(X_test, weights)
>>> import matplotlib.pyplot as plt
>>> plt.scatter(X_train[:, 0], y_train, marker='o', c='b')
>>> plt.scatter(X_test[:, 0], predictions, marker='*', c='k')
>>> plt.xlabel('x')
>>> plt.ylabel('y')
>>> plt.show()

Refer to the following screenshot for the end result:

The predictions for the new samples (depicted by the stars) fall along the linear trend of the training data.
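One way to check the fit numerically is to compare the gradient descent weights with the closed-form least squares solution, which NumPy provides through np.linalg.lstsq. The sketch below is a sanity check under its own assumptions: it uses a learning rate of 0.05 and 5,000 iterations (not the settings used above) so that gradient descent has time to converge fully.

```python
import numpy as np

X_raw = np.array([[6], [2], [3], [4], [1], [5], [2], [6], [4], [7]], dtype=float)
y = np.array([5.5, 1.6, 2.2, 3.7, 0.8, 5.2, 1.5, 5.3, 4.4, 6.8])
X = np.hstack((np.ones((X_raw.shape[0], 1)), X_raw))  # prepend intercept column
m = X.shape[0]

# Closed-form least squares solution: [intercept, slope]
w_exact, _, _, _ = np.linalg.lstsq(X, y, rcond=None)

# Batch gradient descent run long enough to converge (assumed settings)
w_gd = np.zeros(X.shape[1])
for _ in range(5000):
    w_gd = w_gd + 0.05 / m * np.dot(X.T, y - np.dot(X, w_gd))

print(w_exact)
print(w_gd)
```

When the two weight vectors agree, the gradient descent implementation is converging to the same line that ordinary least squares finds.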

Let's try it on another dataset, the diabetes dataset from scikit-learn:

>>> from sklearn import datasets
>>> diabetes = datasets.load_diabetes()
>>> print(diabetes.data.shape)
(442, 10)
>>> num_test = 30 
>>> X_train = diabetes.data[:-num_test, :]
>>> y_train = diabetes.target[:-num_test]

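Train a linear regression model for 5,000 iterations at a learning rate of 1, with an intercept included in the weights. The output below lists the cost at every 500 iterations, which corresponds to changing the check in train_linear_regression to iteration % 500 == 0; as defined above, the function prints the cost every 100 iterations: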

>>> weights = train_linear_regression(X_train, y_train,
...     max_iter=5000, learning_rate=1, fit_intercept=True)
2960.1229915
1539.55080927
1487.02495658
1480.27644342
1479.01567047
1478.57496091
1478.29639883
1478.06282572
1477.84756968
1477.64304737
>>> X_test = diabetes.data[-num_test:, :]
>>> y_test = diabetes.target[-num_test:]
>>> predictions = predict(X_test, weights)
>>> print(predictions)
[ 232.22305668 123.87481969 166.12805033 170.23901231 
  228.12868839 154.95746522 101.09058779 87.33631249 
  143.68332296 190.29353122 198.00676871 149.63039042 
   169.56066651 109.01983998 161.98477191 133.00870377 
   260.1831988 101.52551082 115.76677836 120.7338523
   219.62602446 62.21227353 136.29989073 122.27908721 
   55.14492975 191.50339388 105.685612 126.25915035 
   208.99755875 47.66517424]
>>> print(y_test)
[ 261. 113. 131. 174. 257. 55. 84. 42. 146. 212. 233. 
  91. 111. 152. 120. 67. 310. 94. 183. 66. 173. 72. 
  49. 64. 48. 178. 104. 132. 220. 57.]

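The estimates track the ground truth in broad terms, although individual predictions can miss by a wide margin (for example, 154.96 predicted versus 55 observed for the sixth sample).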
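To put a number on how close the estimates are, we can compute standard error metrics from the values printed above. The sketch below copies the 30 predictions and targets from that output and reports the mean absolute error and root mean squared error, along with the error of a naive baseline that always predicts the mean of the test targets. The baseline is an assumption added for comparison; it is not part of the original experiment.

```python
import numpy as np

# Values copied from the printed output above
predictions = np.array([
    232.22305668, 123.87481969, 166.12805033, 170.23901231,
    228.12868839, 154.95746522, 101.09058779, 87.33631249,
    143.68332296, 190.29353122, 198.00676871, 149.63039042,
    169.56066651, 109.01983998, 161.98477191, 133.00870377,
    260.1831988, 101.52551082, 115.76677836, 120.7338523,
    219.62602446, 62.21227353, 136.29989073, 122.27908721,
    55.14492975, 191.50339388, 105.685612, 126.25915035,
    208.99755875, 47.66517424])
y_test = np.array([
    261., 113., 131., 174., 257., 55., 84., 42., 146., 212., 233.,
    91., 111., 152., 120., 67., 310., 94., 183., 66., 173., 72.,
    49., 64., 48., 178., 104., 132., 220., 57.])

errors = predictions - y_test
mae = np.mean(np.abs(errors))          # mean absolute error
rmse = np.sqrt(np.mean(errors ** 2))   # root mean squared error

# Naive baseline: always predict the mean of the test targets
baseline_rmse = np.sqrt(np.mean((y_test - y_test.mean()) ** 2))

print('MAE: %.2f, RMSE: %.2f, baseline RMSE: %.2f' % (mae, rmse, baseline_rmse))
```

Comparing the model's RMSE with the baseline shows how much of the variation in the targets the linear model actually explains.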

So far, we have been using gradient descent for weight optimization. As with logistic regression, linear regression can also be trained with stochastic gradient descent (SGD). To do so, we can simply replace the update_weights_gd function with the update_weights_sgd function that we created in Chapter 7, Predicting Online Ads Click-through with Logistic Regression.
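The update_weights_sgd function from Chapter 7 is not reproduced in this section, so the following is only a sketch of what a per-sample update could look like for linear regression. The name update_weights_sgd_sketch, the fixed sample order, and the training settings are all assumptions made for illustration; SGD implementations usually shuffle the samples in each pass, which is omitted here to keep the result deterministic.

```python
import numpy as np

def update_weights_sgd_sketch(X_train, y_train, weights, learning_rate):
    """One pass over the data, updating the weights after every sample
    (hypothetical stand-in for update_weights_sgd, which is not shown here)."""
    weights = weights.copy()
    for x_i, y_i in zip(X_train, y_train):
        prediction = np.dot(x_i, weights)
        # Gradient step based on a single sample instead of the whole set
        weights += learning_rate * (y_i - prediction) * x_i
    return weights

X_raw = np.array([[6], [2], [3], [4], [1], [5], [2], [6], [4], [7]], dtype=float)
y = np.array([5.5, 1.6, 2.2, 3.7, 0.8, 5.2, 1.5, 5.3, 4.4, 6.8])
X = np.hstack((np.ones((X_raw.shape[0], 1)), X_raw))  # prepend intercept column

def cost(w):
    return np.mean((np.dot(X, w) - y) ** 2 / 2.0)

weights = np.zeros(X.shape[1])
initial_cost = cost(weights)
for _ in range(500):  # 500 passes over the 10 samples (assumed setting)
    weights = update_weights_sgd_sketch(X, y, weights, learning_rate=0.01)
final_cost = cost(weights)
print(initial_cost, final_cost)
```

Because the weights change after every sample, the cost does not settle at an exact minimum with a constant learning rate, but it ends up far below its starting value.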

We can also directly use the SGD-based regression algorithm, SGDRegressor, from scikit-learn:

>>> from sklearn.linear_model import SGDRegressor
>>> regressor = SGDRegressor(loss='squared_loss', penalty='l2',
...     alpha=0.0001, learning_rate='constant', eta0=0.01, n_iter=1000)

Here, 'squared_loss' for the loss parameter indicates that the cost function is MSE. penalty is the regularization term used to reduce overfitting; it can be None, l1, or l2, similar to SGDClassifier in Chapter 7, Predicting Online Ads Click-through with Logistic Regression. n_iter is the number of iterations, and the remaining two parameters mean that the learning rate is 0.01 and stays unchanged during the course of training. Note that these argument names depend on the scikit-learn version: later releases replaced n_iter with max_iter and renamed 'squared_loss' to 'squared_error', so the call may need adjusting for the installed version. Train the model and output the predictions on the testing set as follows:

>>> regressor.fit(X_train, y_train)
>>> predictions = regressor.predict(X_test)
>>> print(predictions)
[ 231.03333725 124.94418254 168.20510142 170.7056729 
  226.52019503 154.85011364 103.82492496 89.376184 
  145.69862538 190.89270871 197.0996725 151.46200981 
  170.12673917 108.50103463 164.35815989 134.10002755 
  259.29203744 103.09764563 117.6254098 122.24330421
  219.0996765 65.40121381 137.46448687 123.25363156 
  57.34965405 191.0600674 109.21594994 128.29546226 
  207.09606669 51.10475455]

The model can also be implemented in TensorFlow. The code that follows uses the TensorFlow 1.x graph API (tf.placeholder and tf.Session). First, we import TensorFlow and specify the parameters of the model, including 1,000 iterations for the training process and a learning rate of 0.5:

>>> import tensorflow as tf
>>> n_features = int(X_train.shape[1])
>>> learning_rate = 0.5
>>> n_iter = 1000

Then, we define the placeholders and variables, including the weights and bias of the model, as follows:

>>> x = tf.placeholder(tf.float32, shape=[None, n_features])
>>> y = tf.placeholder(tf.float32, shape=[None])
>>> W = tf.Variable(tf.ones([n_features, 1]))
>>> b = tf.Variable(tf.zeros([1]))

Construct the model by computing the prediction as follows:

>>> pred = tf.add(tf.matmul(x, W), b)[:, 0]

After assembling the graph for the model, we define the loss function, the MSE, and a gradient descent optimizer that searches for the best coefficients by minimizing the loss:

>>> cost = tf.losses.mean_squared_error(labels=y, predictions=pred)
>>> optimizer = tf.train.GradientDescentOptimizer(
...                 learning_rate).minimize(cost)

Now we can initialize the variables and start a TensorFlow session:

>>> init_vars = tf.global_variables_initializer()
>>> sess = tf.Session()
>>> sess.run(init_vars)

Finally, we start the training process and print out the loss after every 100 iterations as follows:

>>> for i in range(1, n_iter+1):
...     _, c = sess.run([optimizer, cost],
...                     feed_dict={x: X_train, y: y_train})
...     if i % 100 == 0:
...         print('Iteration %i, training loss: %f' % (i, c))
Iteration 100, training loss: 3984.505859
Iteration 200, training loss: 3465.406494
Iteration 300, training loss: 3258.358398
Iteration 400, training loss: 3147.374023
Iteration 500, training loss: 3080.261475
Iteration 600, training loss: 3037.964111
Iteration 700, training loss: 3010.845947
Iteration 800, training loss: 2993.270752
Iteration 900, training loss: 2981.771240
Iteration 1000, training loss: 2974.175049

Apply the trained model to the testing set:

>>> predictions = sess.run(pred, feed_dict={x: X_test})
>>> print(predictions)
[230.2237 124.89581 170.9626 170.43433 224.11993 153.07018
 105.98048 90.66377 149.22597 191.74197 194.04721 153.0992
 170.85931 104.24113 169.2757 135.45589 260.55713 102.38674
 118.585556 123.41965 219.20732 67.479996 138.3001 122.41016
  57.012245 189.88608 114.48331 131.13383 202.2418 53.08335 ]
