2.3. Camera Calibration with 3D Objects

The traditional way to calibrate a camera is to use a 3D reference object such as those shown in Figure 2.3. Figure 2.3a shows the calibration apparatus used at INRIA [8], which consists of two orthogonal planes, each printed with a checker pattern. A 3D coordinate system is attached to this apparatus, and the coordinates of the checker corners are known very accurately in this coordinate system. A similar calibration apparatus is a cube with a checker pattern painted on each face; three faces are generally visible to the camera at once. Figure 2.3b illustrates the device used in Tsai's technique [33], which uses a single plane with a checker pattern, but the plane needs to be displaced at least once with known motion. This is equivalent to knowing the 3D coordinates of the checker corners.

Figure 2.3. 3D apparatus for calibrating cameras.


A popular technique in this category consists of four steps [8]:

1. Detect the corners of the checker pattern in each image;

2. Estimate the camera projection matrix P using linear least squares;

3. Recover the intrinsic and extrinsic parameters A, R, and t from P;

4. Refine A, R, and t through a nonlinear optimization.

It is also possible to first refine P through a nonlinear optimization, and then determine A, R, and t from the refined P.

It is worth noting that using corners is not the only possibility. We can avoid corner detection by working directly in the image. In [25], calibration is realized by maximizing the gradients around a set of control points that define the calibration object. Figure 2.4 illustrates the control points used in that work.

Figure 2.4. Control points used in a gradient-based calibration technique.


2.3.1. Feature extraction

If we use a generic corner detector, such as the Harris corner detector, to find the corners of the checker pattern in the image, the result is usually not good because the detected corners have poor accuracy (about one pixel). A better solution is to leverage the known pattern structure: first estimate a line for each side of a square, then compute each corner as the intersection of the fitted lines. There are two common techniques to estimate the lines. The first is to detect edges and then fit a line to the edges along each side of the square. The second is to fit a line directly to each side of a square in the image such that the gradient along the line is maximized. One possibility is to represent the line by an elongated Gaussian and estimate its parameters by maximizing the total gradient covered by the Gaussian. Note that if the lens distortion is not severe, a better solution is to fit a single line to all the collinear sides. This allows a much more accurate estimation of the positions of the checker corners.
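To make the line-intersection idea concrete, here is a minimal sketch in Python/NumPy (the function names and the synthetic data are illustrative, not from [8]): a total-least-squares line is fit to the edge points of each side of a square, and the corner is recovered as the intersection of the two lines in homogeneous coordinates.

```python
import numpy as np

def fit_line(points):
    """Fit a 2D line a*x + b*y + c = 0 to edge points by total least squares.

    Returns the line as a homogeneous 3-vector (a, b, c) with a^2 + b^2 = 1.
    """
    pts = np.asarray(points, dtype=float)
    centroid = pts.mean(axis=0)
    # The line direction is the principal direction of the centered points.
    _, _, vt = np.linalg.svd(pts - centroid)
    direction = vt[0]                              # unit vector along the line
    normal = np.array([-direction[1], direction[0]])
    return np.array([normal[0], normal[1], -normal @ centroid])

def intersect_lines(l1, l2):
    """Intersect two lines given as homogeneous 3-vectors; returns (x, y)."""
    p = np.cross(l1, l2)                           # homogeneous intersection point
    return p[:2] / p[2]

# Toy example: noisy edge points along two sides meeting at (10, 5).
rng = np.random.default_rng(0)
side_a = np.stack([np.linspace(0, 10, 50), 5 + rng.normal(0, 0.05, 50)], axis=1)
side_b = np.stack([10 + rng.normal(0, 0.05, 50), np.linspace(5, 15, 50)], axis=1)
corner = intersect_lines(fit_line(side_a), fit_line(side_b))
print(corner)   # close to (10, 5), with sub-pixel accuracy
```

Because each line is supported by many edge points, the intersection is far less sensitive to noise than a corner response computed from a single local neighborhood.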

2.3.2. Linear estimation of the camera projection matrix

Once we have extracted the corner points in the image, we can easily establish their correspondences with the points in 3D space because the pattern is known. Based on the projection equation (2.1), we can now estimate the camera parameters. However, the problem is quite nonlinear if we try to estimate A, R, and t directly. If, on the other hand, we estimate the camera projection matrix P, a linear solution is possible.

Given each 2D-3D correspondence mi = (ui, vi) ↔ Mi = (Xi, Yi, Zi), we can write down two equations based on (2.1):

$$\begin{bmatrix} X_i & Y_i & Z_i & 1 & 0 & 0 & 0 & 0 & -u_iX_i & -u_iY_i & -u_iZ_i & -u_i \\ 0 & 0 & 0 & 0 & X_i & Y_i & Z_i & 1 & -v_iX_i & -v_iY_i & -v_iZ_i & -v_i \end{bmatrix}\mathbf{p} = \mathbf{0},$$
where p = [p11, p12, · · ·, p34]T and 0 = [0, 0]T.

For n point matches, we can stack all equations together:

$$\mathbf{G}\mathbf{p} = \mathbf{0}.$$
Matrix G is a 2n × 12 matrix. The projection matrix can now be solved by

$$\min_{\mathbf{p}} \|\mathbf{G}\mathbf{p}\|^2 \quad \text{subject to} \quad \|\mathbf{p}\| = 1.$$

The solution is the eigenvector of G^T G associated with the smallest eigenvalue.
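As an illustration, a minimal NumPy sketch of this linear estimation (the names are mine): it stacks the two equations per correspondence into G and extracts the right singular vector of G associated with the smallest singular value, which coincides with the eigenvector of G^T G with the smallest eigenvalue.

```python
import numpy as np

def estimate_projection_matrix(points_3d, points_2d):
    """Linear (DLT-style) estimate of the 3x4 projection matrix P.

    points_3d: (n, 3) array of 3D points M_i = (X_i, Y_i, Z_i)
    points_2d: (n, 2) array of image points m_i = (u_i, v_i)
    """
    rows = []
    for (X, Y, Z), (u, v) in zip(points_3d, points_2d):
        rows.append([X, Y, Z, 1, 0, 0, 0, 0, -u*X, -u*Y, -u*Z, -u])
        rows.append([0, 0, 0, 0, X, Y, Z, 1, -v*X, -v*Y, -v*Z, -v])
    G = np.asarray(rows, dtype=float)              # 2n x 12 matrix
    # Minimize ||G p||^2 subject to ||p|| = 1: take the right singular vector
    # of G associated with the smallest singular value.
    _, _, vt = np.linalg.svd(G)
    return vt[-1].reshape(3, 4)
```

At least six non-coplanar points are needed for the 2n × 12 system to determine p up to scale.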

In the above, in order to avoid the trivial solution p = 0 and considering that p is defined up to a scale factor, we have set ||p|| = 1. Other normalizations are possible. In [1], p34 = 1, which, however, introduces a singularity when the correct value of p34 is close to zero. In [10], the constraint p31² + p32² + p33² = 1 was used, which is singularity free.

However, the above linear technique minimizes an algebraic distance and yields a biased estimate when the data are noisy. We present an unbiased solution later.

2.3.3. Recover intrinsic and extrinsic parameters from P

Once the camera projection matrix P is known, we can uniquely recover the intrinsic and extrinsic parameters of the camera. Let us denote the first 3 × 3 submatrix of P by B and the last column of P by b, that is, P ≡ [B b]. Since P equals A[R t] up to a scale factor κ, we have

Equation 2.5

$$\mathbf{B} = \kappa \mathbf{A}\mathbf{R}$$

Equation 2.6

$$\mathbf{b} = \kappa \mathbf{A}\mathbf{t}$$

From (2.5), we have

$$\mathbf{K} \equiv \mathbf{B}\mathbf{B}^T = \kappa^2 \mathbf{A}\mathbf{A}^T = \kappa^2 \begin{bmatrix} \alpha^2 + \gamma^2 + u_0^2 & \gamma\beta + u_0 v_0 & u_0 \\ \gamma\beta + u_0 v_0 & \beta^2 + v_0^2 & v_0 \\ u_0 & v_0 & 1 \end{bmatrix}.$$
Because P is defined up to a scale factor, the last element of K = BB^T is usually not equal to 1, so we have to normalize it such that K33 (the last element) = 1. After that, we immediately obtain

Equation 2.7

$$u_0 = K_{13}$$

Equation 2.8

$$v_0 = K_{23}$$

Equation 2.9

$$\beta = \sqrt{K_{22} - v_0^2}$$

Equation 2.10

$$\gamma = \frac{K_{12} - u_0 v_0}{\beta}$$

Equation 2.11

$$\alpha = \sqrt{K_{11} - \gamma^2 - u_0^2}$$

The solution is unambiguous because α > 0 and β > 0.

Once the intrinsic parameters, or equivalently matrix A, are known, the extrinsic parameters can be determined from (2.5) and (2.6) as

Equation 2.12

$$\mathbf{R} = \frac{1}{\kappa}\mathbf{A}^{-1}\mathbf{B}$$

Equation 2.13

$$\mathbf{t} = \frac{1}{\kappa}\mathbf{A}^{-1}\mathbf{b}$$

where the scale factor κ is recovered, up to sign, as the norm of the third row of A⁻¹B (that row equals κ times the third row of R, which is a unit vector); the sign is chosen so that the reference object lies in front of the camera (positive third component of t).
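A sketch of this decomposition in NumPy, assuming the conventions above (upper-triangular A with parameters α, β, γ, u0, v0); the scale and sign handling follow the reasoning just given, and the function name is illustrative.

```python
import numpy as np

def decompose_projection_matrix(P):
    """Recover A, R, t from P = [B b], following (2.5)-(2.13)."""
    B, b = P[:, :3], P[:, 3]
    K = B @ B.T
    K = K / K[2, 2]                               # normalize so that K33 = 1
    u0, v0 = K[0, 2], K[1, 2]                     # (2.7), (2.8)
    beta = np.sqrt(K[1, 1] - v0**2)               # (2.9)
    gamma = (K[0, 1] - u0 * v0) / beta            # (2.10)
    alpha = np.sqrt(K[0, 0] - gamma**2 - u0**2)   # (2.11)
    A = np.array([[alpha, gamma, u0],
                  [0.0,   beta,  v0],
                  [0.0,   0.0,  1.0]])
    AinvB = np.linalg.solve(A, B)                 # equals kappa * R
    kappa = np.linalg.norm(AinvB[2])              # third row of R has unit norm
    t = np.linalg.solve(A, b) / kappa             # (2.13)
    if t[2] < 0:                                  # object must lie in front of the camera
        kappa, t = -kappa, -t
    R = AinvB / kappa                             # (2.12)
    return A, R, t
```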
2.3.4. Refine calibration parameters through a nonlinear optimization

The above solution is obtained through minimizing an algebraic distance, which is not physically meaningful. We can refine it through maximum likelihood inference.

We are given n 2D-3D correspondences mi = (ui, vi) ↔ Mi = (Xi, Yi, Zi). Assume that the image points are corrupted by independent and identically distributed Gaussian noise. The maximum likelihood estimate is then obtained by minimizing the distances between the image points and their predicted positions:

Equation 2.14

$$\min_{\mathbf{P}} \sum_{i=1}^{n} \|\mathbf{m}_i - \phi(\mathbf{P}, \mathbf{M}_i)\|^2$$
where φ(P, Mi) is the projection of Mi onto the image according to (2.1).

This is a nonlinear minimization problem, which can be solved with the Levenberg-Marquardt algorithm as implemented in Minpack [23]. It requires an initial guess of P, which can be obtained using the linear technique described earlier. Note that since P is defined up to a scale factor, we can fix the element having the largest initial value at 1 during the minimization.

Alternatively, instead of estimating P as in (2.14), we can directly estimate the intrinsic and extrinsic parameters A, R, and t using the same criterion. The rotation matrix can be parameterized with three variables, such as Euler angles or a scaled rotation vector.
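For instance, a sketch of the refinement of (2.14) using scipy.optimize.least_squares, a Levenberg-Marquardt-type solver standing in for the Minpack routine mentioned above (function names are mine). The element of P with the largest initial magnitude is frozen to remove the scale ambiguity, and the remaining 11 entries are optimized.

```python
import numpy as np
from scipy.optimize import least_squares

def project(P, points_3d):
    """phi(P, M): pinhole projection of 3D points with a 3x4 matrix P."""
    M_h = np.hstack([points_3d, np.ones((len(points_3d), 1))])
    m_h = M_h @ P.T
    return m_h[:, :2] / m_h[:, 2:3]

def refine_projection_matrix(P0, points_3d, points_2d):
    """Refine P by minimizing (2.14) with a Levenberg-Marquardt solver."""
    p0 = P0.ravel() / np.abs(P0).max()          # largest-magnitude entry becomes +-1
    fixed = np.argmax(np.abs(p0))               # this entry stays frozen
    free = [i for i in range(12) if i != fixed]

    def residuals(x):
        p = p0.copy()
        p[free] = x
        return (project(p.reshape(3, 4), points_3d) - points_2d).ravel()

    sol = least_squares(residuals, p0[free], method="lm")
    p = p0.copy()
    p[free] = sol.x
    return p.reshape(3, 4)
```

The refined P can then be decomposed into A, R, and t exactly as in Section 2.3.3, or the decomposition itself can be used as the starting point when A, R, and t are optimized directly.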

2.3.5. Lens distortion

Up to this point, we have used the pinhole model to describe a camera. It says that a point in 3D space, its corresponding point in the image, and the camera's optical center are collinear. This linear projective equation is sometimes not sufficient, especially for low-end cameras (such as webcams) or wide-angle cameras, for which lens distortion has to be considered.

According to [33], there are four steps in camera projection, including lens distortion:

Step 1. Rigid transformation from the world coordinate system (Xw, Yw, Zw) to the camera coordinate system (X, Y, Z):

$$[X, Y, Z]^T = \mathbf{R}\,[X_w, Y_w, Z_w]^T + \mathbf{t}$$
Step 2. Perspective projection from 3D camera coordinates (X, Y, Z) to ideal image coordinates (x, y) under the pinhole camera model:

$$x = f\frac{X}{Z}, \qquad y = f\frac{Y}{Z}$$

where f is the effective focal length.

Step 3. Lens distortion[2]:

[2] The lens distortion described here is different from Tsai's treatment. Here, we go from ideal to real image coordinates, similar to [36].

$$\breve{x} = x + \delta_x, \qquad \breve{y} = y + \delta_y$$

where (x̆, y̆) are the distorted or true image coordinates, and (δx, δy) are the distortions applied to (x, y).

Step 4. Affine transformation from real image coordinates (x̆, y̆) to frame buffer (pixel) image coordinates (ŭ, v̆):

$$\breve{u} = \breve{x}/d_x + u_0, \qquad \breve{v} = \breve{y}/d_y + v_0$$

where (u0, v0) are the coordinates of the principal point, and dx and dy are the distances between adjacent pixels in the horizontal and vertical directions, respectively.

There are two types of distortions:

Radial distortion: It is symmetric; ideal image points are distorted along radial directions from the distortion center. This is caused by imperfect lens shape.

Decentering distortion: This is usually caused by improper lens assembly; ideal image points are distorted in both radial and tangential directions.

Refer to [29, 3, 6, 37] for more details.

The distortion can be expressed as a power series in the radial distance r = √(x² + y²):

$$\delta_x = x(k_1 r^2 + k_2 r^4 + \cdots) + \left[p_1(r^2 + 2x^2) + 2p_2 xy\right](1 + p_3 r^2 + \cdots)$$

$$\delta_y = y(k_1 r^2 + k_2 r^4 + \cdots) + \left[2p_1 xy + p_2(r^2 + 2y^2)\right](1 + p_3 r^2 + \cdots)$$

where k1, k2, . . . are coefficients of radial distortion and p1, p2, . . . are coefficients of decentering distortion.
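Putting the four steps together, here is a sketch of the forward projection of a world point to distorted pixel coordinates (illustrative names; only the leading radial terms k1, k2 and decentering terms p1, p2 of the series above are kept, which is usually sufficient as discussed next).

```python
import numpy as np

def project_with_distortion(M_w, R, t, f, dx, dy, u0, v0, k1, k2, p1, p2):
    """Map a world point M_w to distorted pixel coordinates via Steps 1-4."""
    # Step 1: rigid transformation from world to camera coordinates.
    X, Y, Z = R @ M_w + t
    # Step 2: perspective projection to ideal image coordinates.
    x, y = f * X / Z, f * Y / Z
    # Step 3: lens distortion, keeping the first radial (k1, k2) and
    # decentering (p1, p2) terms of the power series.
    r2 = x * x + y * y
    radial = k1 * r2 + k2 * r2**2
    delta_x = x * radial + p1 * (r2 + 2 * x * x) + 2 * p2 * x * y
    delta_y = y * radial + 2 * p1 * x * y + p2 * (r2 + 2 * y * y)
    x_d, y_d = x + delta_x, y + delta_y
    # Step 4: affine transformation to frame buffer (pixel) coordinates.
    return np.array([x_d / dx + u0, y_d / dy + v0])
```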

Based on reports in the literature [3, 33, 36], the distortion function is likely to be dominated by the radial components, and especially by the first term. It has also been found that more elaborate modeling not only does not help (its effect is negligible compared with sensor quantization), but can also cause numerical instability [33, 36].

Denote the ideal pixel image coordinates by u = x/dx + u0 and v = y/dy + v0. By combining Step 3 and Step 4 and using only the first two radial distortion terms, we obtain the following relationship between (ŭ, v̆) and (u, v):

Equation 2.15

$$\breve{u} = u + (u - u_0)\left[k_1(x^2 + y^2) + k_2(x^2 + y^2)^2\right]$$

Equation 2.16

$$\breve{v} = v + (v - v_0)\left[k_1(x^2 + y^2) + k_2(x^2 + y^2)^2\right]$$
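In code, (2.15) and (2.16) amount to a small correction applied to the ideal pixel coordinates; a minimal sketch (names are mine), where (x, y) are the ideal image coordinates from Step 2:

```python
def distort_pixels(u, v, x, y, u0, v0, k1, k2):
    """Apply (2.15)-(2.16): map ideal pixel coordinates (u, v) to observed
    (distorted) coordinates, given the ideal image coordinates (x, y)."""
    factor = k1 * (x * x + y * y) + k2 * (x * x + y * y) ** 2
    u_breve = u + (u - u0) * factor        # (2.15)
    v_breve = v + (v - v0) * factor        # (2.16)
    return u_breve, v_breve
```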
Following the same reasoning as in (2.14), camera calibration including lens distortion can be performed by minimizing the distances between the image points and their predicted positions:

Equation 2.17

$$\min_{\mathbf{A},\mathbf{R},\mathbf{t},k_1,k_2} \sum_{i=1}^{n} \|\mathbf{m}_i - \breve{\mathbf{m}}(\mathbf{A},\mathbf{R},\mathbf{t},k_1,k_2,\mathbf{M}_i)\|^2$$

where m̆(A, R, t, k1, k2, Mi) is the projection of Mi onto the image according to (2.1), followed by distortion according to (2.15) and (2.16).
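A sketch of this joint refinement with SciPy (illustrative names): the rotation is parameterized by a rotation vector via scipy.spatial.transform.Rotation, the focal length is absorbed into α and β, and the distortion of (2.15)-(2.16) is applied inside the residual. A0, R0, t0 would come from the distortion-free solution of the previous subsections.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def calibrate_with_distortion(points_3d, points_2d, A0, R0, t0):
    """Refine alpha, beta, gamma, u0, v0, rotation vector, t, k1, k2
    by minimizing (2.17), starting from a distortion-free estimate."""
    x0 = np.concatenate([
        [A0[0, 0], A0[1, 1], A0[0, 1], A0[0, 2], A0[1, 2]],   # alpha, beta, gamma, u0, v0
        Rotation.from_matrix(R0).as_rotvec(), t0, [0.0, 0.0]])  # r, t, k1, k2

    def residuals(x):
        alpha, beta, gamma, u0, v0 = x[:5]
        R = Rotation.from_rotvec(x[5:8]).as_matrix()
        t, (k1, k2) = x[8:11], x[11:13]
        M_cam = points_3d @ R.T + t                   # Step 1: world -> camera
        xy = M_cam[:, :2] / M_cam[:, 2:3]             # Step 2 (f absorbed in alpha, beta)
        r2 = (xy ** 2).sum(axis=1)
        u = alpha * xy[:, 0] + gamma * xy[:, 1] + u0  # ideal pixel coordinates
        v = beta * xy[:, 1] + v0
        factor = k1 * r2 + k2 * r2 ** 2
        pred = np.stack([u + (u - u0) * factor,       # (2.15)
                         v + (v - v0) * factor],      # (2.16)
                        axis=1)
        return (pred - points_2d).ravel()

    return least_squares(residuals, x0, method="lm")
```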

2.3.6. An example

Figure 2.5 displays an image of a 3D reference object taken by a camera to be calibrated at INRIA. Each square has four corners, and there are in total 128 points used for calibration.

Figure 2.5. Camera calibration with a 3D apparatus.


Without considering lens distortion, the estimated camera projection matrix is


From P, we can calculate the intrinsic parameters: α = 1380.12, β = 2032.57, γ ≈ 0, u0 = 246.52, and v0 = 243.68. So, the angle between the two image axes is 90°, and the aspect ratio of the pixels is α/β = 0.679. For the extrinsic parameters, the translation vector t = [–211.28, –106.06, 1583.75]T (in mm); that is, the calibration object is about 1.5m away from the camera; the rotation axis is [-0.08573, –0.99438, 0.0621]T (i.e., almost vertical), and the rotation angle is 47.7°.

Other notable works in this category include [27, 38, 36, 18].
