Camera calibration

So far, we have worked with whatever image came straight out of our webcam, without questioning how it was taken. However, every camera lens has unique parameters, such as focal length, principal point, and lens distortion. When a camera takes a picture, this is what happens behind the covers: light passes through a lens and then through an aperture before falling on the surface of a light sensor. This process can be approximated with the pinhole camera model. Estimating the parameters of a real-world lens so that it fits the pinhole camera model is called camera calibration (or camera resectioning; it should not be confused with photometric camera calibration).

The pinhole camera model

The pinhole camera model is a simplification of a real camera in which there is no lens and the camera aperture is approximated by a single point (the pinhole). When viewing a real-world 3D scene (such as a tree), light rays pass through the point-sized aperture and fall on a 2D image plane inside the camera, as seen in the following diagram:

The pinhole camera model

In this model, a 3D point with coordinates (X,Y,Z) is mapped to a 2D point with coordinates (x,y) that lies on the image plane. Note that this leads to the tree appearing upside down on the image plane. The line that is perpendicular to the image plane, and passes through the pinhole is called the principal ray, and its length is called the focal length. The focal length is a part of the internal camera parameters, as it may vary depending on the camera being used.

Hartley and Zisserman found a mathematical formula to describe how a 2D point with coordinates (x,y) can be inferred from a 3D point with coordinates (X,Y,Z) and the camera's intrinsic parameters, as follows:

      [x]   [fx  0  cx] [X]
    s [y] = [ 0  fy cy] [Y]
      [1]   [ 0  0   1] [Z]

For now, let's focus on the 3 x 3 matrix in the preceding formula, which is the intrinsic camera matrix—a matrix that compactly describes all internal camera parameters. The matrix comprises focal lengths (fx and fy) and optical centers (cx and cy) expressed in pixel coordinates. As mentioned earlier, the focal length is the distance between the pinhole and the image plane. A true pinhole camera has only one focal length, in which case fx = fy = f. However, in reality, these two values might differ, maybe due to flaws in the digital camera sensor. The point at which the principal ray intersects the image plane is called the principal point, and its relative position on the image plane is captured by the optical center (or principal point offset).
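To make the role of these parameters concrete, here is a minimal NumPy sketch that projects a 3D point onto the image plane using an intrinsic camera matrix. The values of fx, fy, cx, and cy are made up purely for illustration:

```python
import numpy as np

# hypothetical intrinsic parameters (in pixel units)
fx, fy = 1000.0, 1000.0
cx, cy = 320.0, 240.0
K = np.array([[fx, 0, cx],
              [0, fy, cy],
              [0,  0,  1]])

# a 3D point (X, Y, Z) in camera coordinates, Z pointing away from the camera
point_3d = np.array([0.5, -0.25, 2.0])

# apply K, then divide by the third coordinate (perspective division)
uvw = K.dot(point_3d)
x, y = uvw[0] / uvw[2], uvw[1] / uvw[2]
print("x=%.1f, y=%.1f" % (x, y))
```

Note how points farther away (larger Z) end up closer to the optical center (cx, cy), which is exactly the perspective foreshortening the pinhole model describes.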

In addition, a camera might be subject to radial or tangential distortion, leading to a fish-eye effect. This is because of hardware imperfections and lens misalignments. These distortions can be described with a list of the distortion coefficients. Sometimes, radial distortions are actually a desired artistic effect. At other times, they need to be corrected.
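As a rough sketch of what radial distortion does, the standard polynomial model displaces each point as a function of its distance from the optical center. The coefficients k1 and k2 below are made up; real values come out of the calibration procedure described later in this chapter:

```python
def apply_radial_distortion(x, y, k1, k2):
    """Radially distort normalized image coordinates (x, y).

    Uses the first two terms of the polynomial model:
    x_d = x * (1 + k1*r^2 + k2*r^4), and likewise for y.
    """
    r2 = x ** 2 + y ** 2
    factor = 1 + k1 * r2 + k2 * r2 ** 2
    return x * factor, y * factor

# made-up coefficients: a negative k1 yields a barrel (fish-eye-like) effect
xd, yd = apply_radial_distortion(0.5, 0.5, k1=-0.2, k2=0.0)
print(xd, yd)
```

Points near the optical center barely move (r is small), while points near the image border are displaced noticeably, which is why straight lines appear curved at the edges of a distorted image.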

Note

For more information on the pinhole camera model, there are many good tutorials out there on the Web, such as http://ksimek.github.io/2013/08/13/intrinsic.

Because these parameters are specific to the camera hardware (hence the name intrinsic), we need to calculate them only once in the lifetime of a camera. This is called camera calibration.

Estimating the intrinsic camera parameters

In OpenCV, camera calibration is fairly straightforward. The official documentation provides a good overview of the topic and some sample C++ scripts at http://docs.opencv.org/doc/tutorials/calib3d/camera_calibration/camera_calibration.html.

For educational purposes, we will develop our own calibration script in Python. We will need to present a special pattern image, with a known geometry (chessboard plate or black circles on a white background), to the camera we wish to calibrate. Because we know the geometry of the pattern image, we can use feature detection to study the properties of the internal camera matrix. For example, if the camera suffers from undesired radial distortion, the different corners of the chessboard pattern will appear distorted in the image and not lie on a rectangular grid. By taking about 10 to 20 snapshots of the chessboard pattern from different points of view, we can collect enough information to correctly infer the camera matrix and the distortion coefficients.

For this, we will use the calibrate.py script. Analogous to previous chapters, we will use a simple layout (CameraCalibration) based on BaseLayout that embeds a webcam video stream. The main function of the script will generate the GUI and execute the main loop of the app:

import cv2
import numpy as np
import wx

from gui import BaseLayout


def main():
    capture = cv2.VideoCapture(0)
    if not capture.isOpened():
        capture.open(0)

    capture.set(cv2.cv.CV_CAP_PROP_FRAME_WIDTH, 640)
    capture.set(cv2.cv.CV_CAP_PROP_FRAME_HEIGHT, 480)

    # start graphical user interface
    app = wx.App()
    layout = CameraCalibration(None, -1, 'Camera Calibration', capture)
    layout.Show(True)
    app.MainLoop()


if __name__ == '__main__':
    main()

Note

If you are using OpenCV 3, the constants that you are looking for are called cv2.CAP_PROP_FRAME_WIDTH and cv2.CAP_PROP_FRAME_HEIGHT.

The camera calibration GUI

The GUI is a customized version of the generic BaseLayout:

class CameraCalibration(BaseLayout):

The layout consists of only the current camera frame and a single button below it. This button allows us to start the calibration process:

def _create_custom_layout(self):
    """Creates a horizontal layout with a single button"""
    pnl = wx.Panel(self, -1)
    self.button_calibrate = wx.Button(pnl, label='Calibrate Camera')
    self.Bind(wx.EVT_BUTTON, self._on_button_calibrate,
              self.button_calibrate)
    hbox = wx.BoxSizer(wx.HORIZONTAL)
    hbox.Add(self.button_calibrate)
    pnl.SetSizer(hbox)

For these changes to take effect, pnl needs to be added to the list of existing panels:

self.panels_vertical.Add(pnl, flag=wx.EXPAND | wx.BOTTOM | wx.TOP, border=1)

The rest of the visualization pipeline is handled by the BaseLayout class. We only need to make sure that we provide the _init_custom_layout and _process_frame methods.

Initializing the algorithm

In order to perform the calibration process, we need to do some bookkeeping. For now, let's focus on a single 10 x 7 chessboard. The algorithm will detect all the 9 x 6 inner corners of the chessboard (referred to as object points) and store the detected image points of these corners in a list. So, let's first initialize the chessboard size to the number of inner corners:

def _init_custom_layout(self):
    """Initializes camera calibration"""
    # setting chessboard size
    self.chessboard_size = (9, 6)

Next, we need to enumerate all the object points and assign them object point coordinates so that the first point has coordinates (0,0), the second one (top row) has coordinates (1,0), and the last one has coordinates (8,5):

# prepare object points
self.objp = np.zeros((np.prod(self.chessboard_size), 3), dtype=np.float32)
self.objp[:, :2] = np.mgrid[0:self.chessboard_size[0],
                            0:self.chessboard_size[1]].T.reshape(-1, 2)
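To see what this produces, here is a standalone snippet (with the chessboard size hard-coded to the same (9, 6) used above) that prints the first, second, and last object points:

```python
import numpy as np

chessboard_size = (9, 6)
objp = np.zeros((np.prod(chessboard_size), 3), dtype=np.float32)
objp[:, :2] = np.mgrid[0:chessboard_size[0],
                       0:chessboard_size[1]].T.reshape(-1, 2)

print(objp[0])   # first corner: (0, 0, 0)
print(objp[1])   # second corner in the top row: (1, 0, 0)
print(objp[-1])  # last corner: (8, 5, 0)
```

All object points have Z = 0 because the chessboard is planar; the calibration procedure will recover the board's pose relative to the camera for each snapshot.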

We also need to keep track of whether we are currently recording the object and image points or not. We will initiate this process once the user clicks on the self.button_calibrate button. After that, the algorithm will try to detect a chessboard in all subsequent frames until a number of self.record_min_num_frames chessboards have been detected:

    # prepare recording
    self.recording = False
    self.record_min_num_frames = 20
    self._reset_recording()

Whenever the self.button_calibrate button is clicked on, we reset all the bookkeeping variables, disable the button, and start recording:

def _on_button_calibrate(self, event):
    self.button_calibrate.Disable()
    self.recording = True
    self._reset_recording()

Resetting the bookkeeping variables involves clearing the lists of recorded object and image points (self.obj_points and self.img_points) as well as resetting the number of detected chessboards (self.record_cnt) to 0:

def _reset_recording(self):
    self.record_cnt = 0
    self.obj_points = []
    self.img_points = []

Collecting image and object points

The _process_frame method is responsible for doing the hard work of the calibration technique. After the self.button_calibrate button has been clicked on, this method starts collecting data until a total of self.record_min_num_frames chessboards are detected:

def _process_frame(self, frame):
    """Processes each frame"""

    # if we are not recording, just display the frame
    if not self.recording:
        return frame

    # else we're recording
    img_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.uint8)

    if self.record_cnt < self.record_min_num_frames:
        ret, corners = cv2.findChessboardCorners(img_gray, self.chessboard_size, None)

The cv2.findChessboardCorners function will parse a grayscale image (img_gray) to find a chessboard of size self.chessboard_size. If the image indeed contains a chessboard, the function will return True (ret) as well as a list of chessboard corners (corners).

Then, drawing the chessboard is straightforward:

if ret:
    cv2.drawChessboardCorners(frame, self.chessboard_size, corners, ret)

The result looks like this (the chessboard corners are drawn in color for effect):

Collecting image and object points

We could now simply store the list of detected corners and move on to the next frame. However, in order to make the calibration as accurate as possible, OpenCV provides a function to refine the corner point measurement:

criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER,
            30, 0.01)
cv2.cornerSubPix(img_gray, corners, (9, 9), (-1, -1), criteria)

This will refine the coordinates of the detected corners to subpixel precision. Now we are ready to append the object and image points to the list and advance the frame counter:

self.obj_points.append(self.objp)
self.img_points.append(corners)
self.record_cnt += 1

Finding the camera matrix

Once we have collected enough data (that is, once self.record_cnt reaches the value of self.record_min_num_frames), the algorithm is ready to perform the calibration. This process can be performed with a single call to cv2.calibrateCamera:

else:
    print "Calibrating..."
    ret, K, dist, rvecs, tvecs = cv2.calibrateCamera(
        self.obj_points, self.img_points,
        (self.imgWidth, self.imgHeight), None, None)

The function returns the overall RMS reprojection error (ret), the intrinsic camera matrix (K), the distortion coefficients (dist), as well as lists of rotation and translation vectors (rvecs and tvecs), one pair per calibration image. For now, we are mainly interested in the camera matrix and the distortion coefficients, because these will allow us to compensate for any imperfections of the internal camera hardware. We will simply print them to the console for easy inspection:

print "K=", K
print "dist=", dist

For example, the calibration of my laptop's webcam recovered the following values:

K= [[ 3.36696445e+03 0.00000000e+00 2.99109943e+02]
    [ 0.00000000e+00 3.29683922e+03 2.69436829e+02]
    [ 0.00000000e+00 0.00000000e+00 1.00000000e+00]]
dist= [[ 9.87991355e-01 -3.18446968e+02 9.56790602e-02 -3.42530800e-02 4.87489304e+03]]

This tells us that the focal lengths of my webcam are fx=3366.9644 pixels and fy=3296.8392 pixels, with the optical center at cx=299.1099 pixels and cy=269.4368 pixels.

A good idea might be to double-check the accuracy of the calibration process. This can be done by projecting the object points onto the image using the recovered camera parameters so that we can compare them with the list of image points we collected with the cv2.findChessboardCorners function. If the two sets of points are roughly the same, we know that the calibration was successful. Even better, we can calculate the mean error of the reconstruction by projecting every object point in the list:

mean_error = 0
for i in xrange(len(self.obj_points)):
    img_points2, _ = cv2.projectPoints(self.obj_points[i], rvecs[i], tvecs[i], K, dist)
    error = cv2.norm(self.img_points[i], img_points2, cv2.NORM_L2)/len(img_points2)
    mean_error += error

print "mean error= {} pixels".format(mean_error / len(self.obj_points))

Performing this check on my laptop's webcam resulted in a mean error of 0.95 pixels, which is fairly close to zero.

With the internal camera parameters recovered, we can now set out to take beautiful, undistorted pictures of the world, possibly from different viewpoints so that we can extract some structure from motion.
