Linear regression in OpenCV

Before trying out linear regression on a real-life dataset, let's understand how we can use the cv2.fitLine function to fit a line to a 2D or 3D point set:

  1. Let's start by generating some points. We will generate them by adding noise to points lying on the line y = 5x + 5:
In [1]: import cv2
... import numpy as np
... import matplotlib.pyplot as plt
... from sklearn import linear_model
... from sklearn.model_selection import train_test_split
... plt.style.use('ggplot')
... %matplotlib inline
In [2]: x = np.linspace(0,10,100)
... y_hat = x*5+5
... np.random.seed(42)
... y = x*5 + 20*(np.random.rand(x.size) - 0.5)+5
  2. We can also visualize these points using the following code:
In [3]: plt.figure(figsize=(10, 6))
... plt.plot(x, y_hat, linewidth=4)
... plt.plot(x,y,'x')
... plt.xlabel('x')
... plt.ylabel('y')

This gives us the following diagram, where the red line is the true function:

  3. Next, we will split the points into training and testing sets. Here, we will split the data in a 70:30 ratio, meaning that 70% of the points will be used for training and 30% for testing:
In [4]: x_train, x_test, y_train, y_test = train_test_split(x,y,test_size=0.3,random_state=42)
  4. Now, let's use cv2.fitLine to fit a line to this 2D point set. This function takes the following arguments:
    • points: This is the set of points to which a line has to be fit.
    • distType: This is the distance used by the M-estimator.
    • param: This is the numerical parameter (C), which is used in some types of distances. We will keep it at 0 so that an optimal value can be chosen.
    • reps: This is the sufficient accuracy for the radius (the distance between the coordinate origin and the line). 0.01 is a good default value for reps.
    • aeps: This is the sufficient accuracy for the angle. 0.01 is a good default value for aeps.
For more information, have a look at the documentation.
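To make the return value concrete, here is a minimal, self-contained sketch (the point set and expected values here are made up purely for illustration): for a 2D point set, fitLine returns a normalized direction vector (vx, vy) and a point (x0, y0) on the fitted line, from which the slope and intercept follow:

import cv2
import numpy as np

# Three collinear points on y = 2x + 1 (illustrative data)
pts = np.array([[0, 1], [1, 3], [2, 5]], dtype=np.float32)
vx, vy, x0, y0 = cv2.fitLine(pts, cv2.DIST_L2, 0, 0.01, 0.01)

slope = vy / vx                # should come out close to 2
intercept = y0 - slope * x0    # should come out close to 1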
  5. Let's see what kind of results we get using different distance type options:
In [5]: distTypeOptions = [cv2.DIST_L2,
... cv2.DIST_L1,
... cv2.DIST_L12,
... cv2.DIST_FAIR,
... cv2.DIST_WELSCH,
... cv2.DIST_HUBER]

In [6]: distTypeLabels = ['DIST_L2',
... 'DIST_L1',
... 'DIST_L12',
... 'DIST_FAIR',
... 'DIST_WELSCH',
... 'DIST_HUBER']

In [7]: colors = ['g','c','m','y','k','b']
In [8]: points = np.array([(xi,yi) for xi,yi in zip(x_train,y_train)])
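Equivalently (this alternative is not in the original listing), the same (N, 2) array can be built without an explicit loop:

points = np.column_stack((x_train, y_train))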
  6. We will also use scikit-learn's LinearRegression to fit the training points, and then use the predict function to predict the y-values over the whole range of x:
In [9]: linreg = linear_model.LinearRegression()
In [10]: linreg.fit(x_train.reshape(-1,1),y_train.reshape(-1,1))
Out[10]: LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)
In [11]: y_sklearn = linreg.predict(x.reshape(-1,1))
In [12]: y_sklearn = list(y_sklearn.reshape(1,-1)[0])
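As a quick sanity check (not part of the original listing), we can inspect the fitted parameters, which should land close to the true slope and intercept of the generating line y = 5x + 5:

print(linreg.coef_)       # expect roughly [[5.]]
print(linreg.intercept_)  # expect roughly [5.]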
  7. We use reshape(-1,1) and reshape(1,-1) to convert the NumPy arrays into a column vector and then back into a row vector.
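If the effect of these calls is not obvious, here is a tiny illustrative sketch (not from the original listing):

import numpy as np

a = np.arange(3)           # shape (3,)  -- a flat array
col = a.reshape(-1, 1)     # shape (3, 1) -- the column vector sklearn expects
row = col.reshape(1, -1)   # shape (1, 3) -- row[0] recovers the flat values

Finally, let's plot everything in a single figure so that we can compare the different fits: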
In [13]: plt.figure(figsize=(10, 6))
... plt.plot(x, y_hat, linewidth=2, label='Ideal')
... plt.plot(x, y, 'x', label='Data')

... for i in range(len(colors)):
...     distType = distTypeOptions[i]
...     distTypeLabel = distTypeLabels[i]
...     c = colors[i]

...     # Fit a line with the current distance type, then turn the returned
...     # direction vector (vxl, vyl) and point (xl, yl) into y-values over x
...     [vxl, vyl, xl, yl] = cv2.fitLine(np.array(points, dtype=np.int32), distType, 0, 0.01, 0.01)
...     y_cv = [vyl[0]/vxl[0] * (xi - xl[0]) + yl[0] for xi in x]
...     plt.plot(x, y_cv, c=c, linewidth=2, label=distTypeLabel)

... plt.plot(x, list(y_sklearn), c='0.5', linewidth=2, label='Scikit-Learn API')
... plt.xlabel('x')
... plt.ylabel('y')
... plt.legend(loc='upper left')

The only purpose of the preceding (and lengthy) code was to create a plot that we can use to compare the results obtained using the different distance measures.

Let's have a look at the plot:

As we can clearly see, scikit-learn's LinearRegression model performs much better than OpenCV's fitLine function. Now, let's use scikit-learn's API to predict Boston housing prices.
