196 8. Computer Vision in VR
Figure 8.10. Perspective distortion using points in a different plane results in a
different correction.
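of them, say $h_{22} = 1$. Thus for points $(X_i, Y_i)$, $0 \le i \le 3$, and their
matches $(X_{mi}, Y_{mi})$, $0 \le i \le 3$, we get the linear equations

\[
\begin{bmatrix}
X_0 & Y_0 & 1 & 0 & 0 & 0 & -X_0 X_{m0} & -Y_0 X_{m0} \\
0 & 0 & 0 & X_0 & Y_0 & 1 & -X_0 Y_{m0} & -Y_0 Y_{m0} \\
X_1 & Y_1 & 1 & 0 & 0 & 0 & -X_1 X_{m1} & -Y_1 X_{m1} \\
0 & 0 & 0 & X_1 & Y_1 & 1 & -X_1 Y_{m1} & -Y_1 Y_{m1} \\
X_2 & Y_2 & 1 & 0 & 0 & 0 & -X_2 X_{m2} & -Y_2 X_{m2} \\
0 & 0 & 0 & X_2 & Y_2 & 1 & -X_2 Y_{m2} & -Y_2 Y_{m2} \\
X_3 & Y_3 & 1 & 0 & 0 & 0 & -X_3 X_{m3} & -Y_3 X_{m3} \\
0 & 0 & 0 & X_3 & Y_3 & 1 & -X_3 Y_{m3} & -Y_3 Y_{m3}
\end{bmatrix}
\begin{bmatrix}
h_{00} \\ h_{01} \\ h_{02} \\ h_{10} \\ h_{11} \\ h_{12} \\ h_{20} \\ h_{21}
\end{bmatrix}
=
\begin{bmatrix}
X_{m0} \\ Y_{m0} \\ X_{m1} \\ Y_{m1} \\ X_{m2} \\ Y_{m2} \\ X_{m3} \\ Y_{m3}
\end{bmatrix}.
\tag{8.4}
\]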
This simple-looking, first-guess method of determining H, and with it the
ability to correct for perspective distortion, is flawed in practice because
we cannot locate the four points with sufficient accuracy. The discrete
pixels used to record the image, and the resulting numerical problems,
introduce quantization noise, which confounds the numerical solution
of Equation (8.4). To obtain a robust and accurate solution, it is necessary
to use many more point correspondences and apply one of the estimation
algorithms outlined in Section 8.3.4.
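
As a minimal sketch of the four-point method, the 8 × 8 system of Equation (8.4) can be assembled and solved directly with NumPy. The function names and the sample coordinates below are illustrative assumptions, not part of the text; the matrix layout assumes that H maps each point to its match and that $h_{22} = 1$.

```python
import numpy as np

def homography_from_four_points(src, dst):
    """Solve the 8x8 system of Equation (8.4), with h22 fixed to 1.

    src, dst: four (X, Y) pairs each; dst[i] is the match of src[i].
    """
    A = np.zeros((8, 8))
    b = np.zeros(8)
    for i, ((X, Y), (Xm, Ym)) in enumerate(zip(src, dst)):
        A[2 * i] = [X, Y, 1, 0, 0, 0, -X * Xm, -Y * Xm]
        A[2 * i + 1] = [0, 0, 0, X, Y, 1, -X * Ym, -Y * Ym]
        b[2 * i] = Xm
        b[2 * i + 1] = Ym
    h = np.linalg.solve(A, b)               # the eight unknowns h00..h21
    return np.append(h, 1.0).reshape(3, 3)  # append h22 = 1

def apply_homography(H, pt):
    """Map a 2D point through H and divide by the third coordinate."""
    X, Y, W = H @ np.array([pt[0], pt[1], 1.0])
    return np.array([X / W, Y / W])

# Made-up example: a unit square and a skewed quadrilateral.
src = [(0, 0), (1, 0), (1, 1), (0, 1)]
dst = [(10, 20), (110, 30), (100, 140), (5, 120)]
H = homography_from_four_points(src, dst)
```

With exact input the four matches are reproduced exactly; perturbing any `dst` entry by a fraction of a pixel changes every coefficient of `H`, which is the sensitivity described above.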
8.3. A Brief Look at Some Advanced Ideas in Computer Vision 197
8.3.3 Recovering the Camera Matrix
In the previous section, it was shown that it is in theory possible to correct
for the perspective distortion that arises when taking photographs. In an
analogous way, if we can match several known points in the 3D world scene
with their corresponding image points on the projective plane (see
Figure 8.11), then we should be able to obtain the camera matrix P. Why
would this be so important? In Sections 8.2.1 and 8.2.2, we saw that we
can estimate the position of a real-world point in the image scene if we know
the camera matrix. If we cannot assume that the camera we are using is ideal
(i.e., that it follows the pinhole camera model), then it is very difficult to
determine what the parameters of the camera matrix are. This takes us into
the realm of camera calibration, where we need to develop a mathematical
model that allows us to transform real-world points to image coordinates.
This mathematical model is, of course, the camera model, and it is based on
three sets of parameters:
1. The extrinsic parameters of the camera which describe the relationship
between the camera frame and the world frame; that is, the translation
and rotation data needed to align both reference frames.
2. The intrinsic parameters of the camera which describe its characteristics,
such as lens focal length, pixel scale factors and location of the
image center.
3. Distortion parameters which describe the geometric nonlinearities of
the camera. As we described in Section 8.3.1, this distortion can be removed
separately and does not need to be included in the camera matrix. This
is the method we prefer to use.

Figure 8.11. Matching image points on the projection plane to points in the 3D world.
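
One common way to see how the first two parameter sets combine is the standard pinhole factorization P = K [R | t], where K holds the intrinsic parameters and R, t the extrinsic ones. The text above does not state this factorization, so treat the following as a sketch under that assumption; all numeric values are hypothetical and zero skew is assumed.

```python
import numpy as np

# Hypothetical intrinsic parameters: focal lengths in pixel units
# (focal length times pixel scale factor) and the image center.
fx, fy = 800.0, 800.0
cx, cy = 320.0, 240.0
K = np.array([[fx, 0.0, cx],
              [0.0, fy, cy],
              [0.0, 0.0, 1.0]])

# Hypothetical extrinsic parameters: a 10 degree rotation about the
# y axis and a translation aligning the world and camera frames.
theta = np.radians(10.0)
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0, 1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])
t = np.array([[0.1], [0.0], [2.0]])

# The 3 x 4 camera matrix combines both sets of parameters.
P = K @ np.hstack([R, t])
```

Lens distortion is deliberately absent here, in line with the preference stated in item 3 for removing it separately.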
All these parameters are encompassed in the 3 × 4 matrix we identified
before. That is
\[
\begin{bmatrix} X \\ Y \\ W \end{bmatrix}
= P
\begin{bmatrix} x \\ y \\ z \\ w \end{bmatrix},
\]
or in full we can write
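\[
\begin{bmatrix} X \\ Y \\ W \end{bmatrix}
=
\begin{bmatrix}
p_{00} & p_{01} & p_{02} & p_{03} \\
p_{10} & p_{11} & p_{12} & p_{13} \\
p_{20} & p_{21} & p_{22} & p_{23}
\end{bmatrix}
\begin{bmatrix} x \\ y \\ z \\ w \end{bmatrix}.
\]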
If we multiply out the terms of this matrix and decide, as is normal, that
W = 1, then we have
\begin{align*}
X &= p_{00} x + p_{01} y + p_{02} z + p_{03} w, \\
Y &= p_{10} x + p_{11} y + p_{12} z + p_{13} w, \\
1 &= p_{20} x + p_{21} y + p_{22} z + p_{23} w.
\end{align*}
The third equation in this sequence is interesting because if its value needs to
be equal to 1 then the term $p_{23}$ must be dependent on the values of $p_{20}$,
$p_{21}$ and $p_{22}$. This means that the complete camera matrix contains 11
independent values.
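
A small numerical sketch may help with the count of 11 independent values. It projects a world point with a 3 × 4 matrix, divides by W to normalize, and shows that scaling every coefficient of P by the same factor leaves the image point unchanged, so one of the 12 coefficients is not independent. The matrix entries, the test point and the function name `project` are made up for illustration.

```python
import numpy as np

def project(P, point):
    """Project a 3D world point (x, y, z), with w = 1, to image coordinates."""
    x, y, z = point
    X, Y, W = P @ np.array([x, y, z, 1.0])
    return np.array([X / W, Y / W])   # normalize so that W becomes 1

# Hypothetical camera matrix and world point.
P = np.array([[800.0, 0.0, 320.0, 100.0],
              [0.0, 800.0, 240.0, 50.0],
              [0.0, 0.0, 1.0, 2.0]])
point = (0.5, -0.25, 3.0)

image_pt = project(P, point)         # -> [292. 114.]
scaled_pt = project(2.5 * P, point)  # same image point as image_pt
```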
As we know already, every point correspondence affords two equations
(one for X and one for Y). With 11 unknowns, 5½ point correspondences
(five complete correspondences plus one coordinate of a sixth) are therefore
needed between the real-world coordinates and the image coordinates in
order to derive the camera matrix, rather than deriving it directly from
camera calibration methods. If 5½ correspondences are used then the
solution is exact.
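
The two equations per correspondence can be made explicit by dividing the expressions for X and Y by the third equation and clearing the denominator. This rearrangement is a standard step that the text does not spell out, so it is offered as a sketch. For a world point $(x, y, z, w)$ observed at image point $(X, Y)$:

```latex
\begin{align*}
p_{00}x + p_{01}y + p_{02}z + p_{03}w
  - X\,(p_{20}x + p_{21}y + p_{22}z + p_{23}w) &= 0, \\
p_{10}x + p_{11}y + p_{12}z + p_{13}w
  - Y\,(p_{20}x + p_{21}y + p_{22}z + p_{23}w) &= 0.
\end{align*}
```

Both equations are linear in the coefficients of P. Since only 11 of the 12 coefficients are independent, 11 such equations are required, which is where the figure of 5½ correspondences comes from.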
The problem with this method is that if six or more correspondences are
used, we don't find that some correspondences simply drop out of the
calculation because they are linear combinations of others. What we find is
that we obtain quite wide variations in the numerical coefficients of P
depending on which 5½ are chosen for the calculation. This is due to pixel
quantization noise in the image and errors in matching the correspondences.
Resolving this difficulty is a major problem in computer vision, one that is
the subject of ongoing research.
Intuitively, it would seem evident that if the effects of noise and errors are
to be minimized, then the more points taken into account, the more accurately
the coefficients of the camera matrix can be determined. This is equally
true for the coefficients of a homography between planes in $\mathbb{P}^2$ and
for the techniques that use two and three views to recover 3D structure. A
number of very useful algorithms have been developed that make the best use
of the point correspondences in determining P and similar quantities. These
are generally referred to as optimization algorithms, but many experts who
specialize in computer vision prefer the term estimation algorithms.
8.3.4 Estimation
So, the practicalities of determining the parameters of the camera matrix are
confounded by the presence of noise and error. This also holds true for
perspective correction and all the other procedures that are used in computer
vision. It is a major topic in itself to devise, characterize and study algorithms
that work robustly. By robustly, we mean algorithms that are not too
sensitive to changes in noise patterns or the number of points used, and that
can make an intelligent guess at which points are obviously in gross error and
should be ignored (outliers).
We will consider here only the briefest of outlines of a couple of algorithms
that might be employed in these circumstances. The direct linear
transform (DLT) [7, pp. 88–93] algorithm is the simplest; it makes no
attempt to account for outliers. The random sample consensus (RANSAC) [5]
and the least median of squares (LMS) algorithms use statistical and iterative
procedures to identify and then ignore outliers.
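
To give a flavor of how an outlier-rejecting procedure can work, here is a minimal RANSAC-style sketch for a homography. It is an illustration under stated assumptions, not the algorithm as specified in [5]: the function names, the fixed iteration count, the one-pixel threshold and the synthetic data are all made up for the example. Each iteration fits H exactly to four random correspondences and counts how many of the remaining points agree with it.

```python
import numpy as np

def fit_homography(src, dst):
    """Exact fit from four correspondences, with h22 fixed to 1."""
    A, b = [], []
    for (x, y), (xm, ym) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -x * xm, -y * xm])
        A.append([0, 0, 0, x, y, 1, -x * ym, -y * ym])
        b += [xm, ym]
    h = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)

def transfer_error(H, src, dst):
    """Distance between each mapped source point and its match."""
    with np.errstate(all="ignore"):   # degenerate fits may divide by zero
        pts = np.column_stack([src, np.ones(len(src))]) @ H.T
        return np.linalg.norm(pts[:, :2] / pts[:, 2:3] - dst, axis=1)

def ransac_inliers(src, dst, n_iter=200, threshold=1.0, seed=0):
    """Return a boolean mask of the largest consensus set found."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(src), dtype=bool)
    for _ in range(n_iter):
        sample = rng.choice(len(src), size=4, replace=False)
        try:
            H = fit_homography(src[sample], dst[sample])
        except np.linalg.LinAlgError:
            continue                  # singular system: degenerate sample
        inliers = transfer_error(H, src, dst) < threshold
        if inliers.sum() > best.sum():
            best = inliers
    return best

# Synthetic data: 20 grid points mapped by a known H, 3 gross errors added.
true_H = np.array([[1.0, 0.1, 5.0], [0.05, 0.9, -3.0], [0.001, 0.002, 1.0]])
gx, gy = np.meshgrid(np.arange(5.0), np.arange(4.0))
src = np.column_stack([gx.ravel(), gy.ravel()]) * 10.0
proj = np.column_stack([src, np.ones(len(src))]) @ true_H.T
dst = proj[:, :2] / proj[:, 2:3]
dst[[2, 7, 15]] += 40.0               # outliers
inliers = ransac_inliers(src, dst)
```

A fuller implementation would typically re-estimate H from all the inliers once the consensus set has been found.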
Since all estimation algorithms are optimization algorithms, they attempt
to minimize a cost function. These cost functions tend to be geometric,
statistical or algebraic. To illustrate the process, we consider a simple form
of the DLT algorithm to solve the problem of removing perspective distortion.
(Using four corresponding points (in image and projection), we obtained eight
linear inhomogeneous equations (see Equation (8.4)), from which the
coefficients of H can be estimated.)
If we can identify more than four matching point correspondences, the
question is then how to use them. This is where the DLT algorithm comes
in. To see how to apply it, consider the form of the matrix in Equation (8.4)
for a single point:
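\[
\begin{bmatrix}
x_i & y_i & 1 & 0 & 0 & 0 & -x_i x'_i & -y_i x'_i \\
0 & 0 & 0 & x_i & y_i & 1 & -x_i y'_i & -y_i y'_i
\end{bmatrix}
\begin{bmatrix}
h_{00} \\ h_{01} \\ h_{02} \\ h_{10} \\ h_{11} \\ h_{12} \\ h_{20} \\ h_{21}
\end{bmatrix}
=
\begin{bmatrix} x'_i \\ y'_i \end{bmatrix}.
\tag{8.5}
\]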
It should be fairly evident that for every corresponding pair of points,
there will be an additional two linear equations, and for n corresponding
points, the matrix will be of size 2n × 8. Since this is an over-determined
system when n > 4, one cannot simply proceed to solve for the $h_{ij}$ using
Gaussian elimination, for example. It also turns out that using the
inhomogeneous form in Equation (8.5) to compute the coefficients of H in

\[
\mathbf{x}' = H\mathbf{x}
\]

can lead to unstable results, and a better approach is to retain the homogeneous
form (which has nine coefficients). Remember, however, that the coefficients
are only determined up to a scale factor. Thus for a single point correspondence
between $(x, y, w)$ and $(x', y', w')$ (where the w coordinate is retained), we
have
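\[
\begin{bmatrix}
w'_i x_i & w'_i y_i & w'_i w_i & 0 & 0 & 0 & -x'_i x_i & -x'_i y_i & -x'_i w_i \\
0 & 0 & 0 & w'_i x_i & w'_i y_i & w'_i w_i & -y'_i x_i & -y'_i y_i & -y'_i w_i
\end{bmatrix}
\begin{bmatrix}
h_{00} \\ h_{01} \\ h_{02} \\ h_{10} \\ h_{11} \\ h_{12} \\ h_{20} \\ h_{21} \\ h_{22}
\end{bmatrix}
= 0.
\tag{8.6}
\]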
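
Stacking the two rows of Equation (8.6) for each of n correspondences gives a homogeneous system Ah = 0. A common way to solve such a system, assumed here because the text has not yet said how it is done, is to take the right singular vector of A belonging to the smallest singular value, which minimizes |Ah| subject to |h| = 1. The function name and sample points below are hypothetical.

```python
import numpy as np

def dlt_homography(src, dst):
    """Estimate H, up to scale, from n >= 4 homogeneous correspondences.

    src, dst: sequences of (x, y, w) and (x', y', w') triples.
    """
    rows = []
    for (x, y, w), (xp, yp, wp) in zip(src, dst):
        rows.append([wp * x, wp * y, wp * w, 0, 0, 0,
                     -xp * x, -xp * y, -xp * w])
        rows.append([0, 0, 0, wp * x, wp * y, wp * w,
                     -yp * x, -yp * y, -yp * w])
    A = np.asarray(rows, dtype=float)   # 2n x 9
    _, _, Vt = np.linalg.svd(A)
    h = Vt[-1]                          # singular vector of smallest singular value
    return h.reshape(3, 3)

# Made-up test data: six points mapped by a known homography.
true_H = np.array([[1.2, 0.1, 4.0], [-0.2, 0.9, 7.0], [0.001, 0.003, 1.0]])
src = np.array([[0, 0, 1], [10, 0, 1], [10, 10, 1],
                [0, 10, 1], [5, 3, 1], [2, 8, 1]], dtype=float)
dst = src @ true_H.T                    # homogeneous matches (x', y', w')
H_est = dlt_homography(src, dst)
H_est = H_est / H_est[2, 2]             # remove the arbitrary scale factor
```

With noise-free input, `H_est` agrees with `true_H` once the scale is fixed; with noisy input the same call returns the least-squares estimate in the algebraic sense.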
For n points in this homogeneous form, there is now a 2n × 9 matrix,
but solutions of the form Ah = 0 are less subject to instabilities than the