Linear regression in OpenCV

Before trying out linear regression on a real-life dataset, let's understand how we can use the cv2.fitLine function to fit a line to a 2D or 3D point set:

  1. Let's start by generating some points. We will generate them by adding noise to points lying on the line y = 5x + 5:
In [1]: import cv2
... import numpy as np
... import matplotlib.pyplot as plt
... from sklearn import linear_model
... from sklearn.model_selection import train_test_split
... plt.style.use('ggplot')
... %matplotlib inline
In [2]: x = np.linspace(0,10,100)
... y_hat = x*5+5
... np.random.seed(42)
... y = x*5 + 20*(np.random.rand(x.size) - 0.5)+5
  2. We can also visualize these points using the following code:
In [3]: plt.figure(figsize=(10, 6))
... plt.plot(x, y_hat, linewidth=4)
... plt.plot(x,y,'x')
... plt.xlabel('x')
... plt.ylabel('y')

This gives us the following diagram, where the red line is the true function:

  3. Next, we will split the points into training and testing sets. Here, we will split the data in a 70:30 ratio, meaning that 70% of the points will be used for training and 30% for testing:
In [4]: x_train, x_test, y_train, y_test = train_test_split(x,y,test_size=0.3,random_state=42)
  4. Now, let's use cv2.fitLine to fit a line to this 2D point set. This function takes the following arguments:
    • points: This is the set of points to which a line has to be fit.
    • distType: This is the distance used by the M-estimator.
    • param: This is the numerical parameter (C), which is used in some types of distances. We will keep it at 0 so that an optimal value can be chosen.
    • reps: This is the sufficient accuracy for the radius (the distance between the coordinate origin and the line). 0.01 is a good default value for reps.
    • aeps: This is the sufficient accuracy for the angle. 0.01 is a good default value for aeps.
For more information, have a look at the documentation.
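To make the return value concrete, here is a minimal, self-contained sketch (the point set and expected values here are made up purely for illustration): for a 2D point set, fitLine returns a normalized direction vector (vx, vy) and a point (x0, y0) on the fitted line, from which the slope and intercept follow:

import cv2
import numpy as np

# Three collinear points on y = 2x + 1 (illustrative data)
pts = np.array([[0, 1], [1, 3], [2, 5]], dtype=np.float32)
vx, vy, x0, y0 = cv2.fitLine(pts, cv2.DIST_L2, 0, 0.01, 0.01)

slope = vy / vx                # should come out close to 2
intercept = y0 - slope * x0    # should come out close to 1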
  5. Let's see what kind of results we get using different distance type options:
In [5]: distTypeOptions = [cv2.DIST_L2,
... cv2.DIST_L1,
... cv2.DIST_L12,
... cv2.DIST_FAIR,
... cv2.DIST_WELSCH,
... cv2.DIST_HUBER]

In [6]: distTypeLabels = ['DIST_L2',
... 'DIST_L1',
... 'DIST_L12',
... 'DIST_FAIR',
... 'DIST_WELSCH',
... 'DIST_HUBER']

In [7]: colors = ['g','c','m','y','k','b']
In [8]: points = np.array([(xi,yi) for xi,yi in zip(x_train,y_train)])
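Equivalently (this alternative is not in the original listing), the same (N, 2) array can be built without an explicit loop:

points = np.column_stack((x_train, y_train))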
  6. We will also use scikit-learn's LinearRegression to fit the training points, and then use the predict function to predict the y-values over the whole range of x:
In [9]: linreg = linear_model.LinearRegression()
In [10]: linreg.fit(x_train.reshape(-1,1),y_train.reshape(-1,1))
Out[10]: LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)
In [11]: y_sklearn = linreg.predict(x.reshape(-1,1))
In [12]: y_sklearn = list(y_sklearn.reshape(1,-1)[0])
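As a quick sanity check (not part of the original listing), we can inspect the fitted parameters, which should land close to the true slope and intercept of the generating line y = 5x + 5:

print(linreg.coef_)       # expect roughly [[5.]]
print(linreg.intercept_)  # expect roughly [5.]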
  7. We use reshape(-1,1) and reshape(1,-1) to convert the NumPy arrays into a column vector and then back into a row vector.
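If the effect of these calls is not obvious, here is a tiny illustrative sketch (not from the original listing):

import numpy as np

a = np.arange(3)           # shape (3,)  -- a flat array
col = a.reshape(-1, 1)     # shape (3, 1) -- the column vector sklearn expects
row = col.reshape(1, -1)   # shape (1, 3) -- row[0] recovers the flat values

Finally, let's plot everything in a single figure so that we can compare the different fits: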
In [13]: plt.figure(figsize=(10, 6))
... plt.plot(x, y_hat, linewidth=2, label='Ideal')
... plt.plot(x, y, 'x', label='Data')

... for i in range(len(colors)):
...     distType = distTypeOptions[i]
...     distTypeLabel = distTypeLabels[i]
...     c = colors[i]

...     # Fit a line with the current distance type, then turn the returned
...     # direction vector (vxl, vyl) and point (xl, yl) into y-values over x
...     [vxl, vyl, xl, yl] = cv2.fitLine(np.array(points, dtype=np.int32), distType, 0, 0.01, 0.01)
...     y_cv = [vyl[0]/vxl[0] * (xi - xl[0]) + yl[0] for xi in x]
...     plt.plot(x, y_cv, c=c, linewidth=2, label=distTypeLabel)

... plt.plot(x, list(y_sklearn), c='0.5', linewidth=2, label='Scikit-Learn API')
... plt.xlabel('x')
... plt.ylabel('y')
... plt.legend(loc='upper left')

The only purpose of the preceding (and lengthy) code was to create a plot that we can use to compare the results obtained using the different distance measures.

Let's have a look at the plot:

As we can clearly see, scikit-learn's LinearRegression model performs much better than OpenCV's fitLine function. Now, let's use scikit-learn's API to predict Boston housing prices.
