Chapter 8. Computer Vision in VR (4/8)


196 8. Computer Vision in VR

Figure 8.10. Perspective distortion using points in a different plane results in a different correction.

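of them, say $h_{33} = 1$. Thus for points $(X_i, Y_i)$, $0 \le i \le 3$, and their matches $(X'_i, Y'_i)$, $0 \le i \le 3$, we get the linear equations

$$
\begin{bmatrix}
X_0 & Y_0 & 1 & 0 & 0 & 0 & -X_0X'_0 & -Y_0X'_0\\
0 & 0 & 0 & X_0 & Y_0 & 1 & -X_0Y'_0 & -Y_0Y'_0\\
X_1 & Y_1 & 1 & 0 & 0 & 0 & -X_1X'_1 & -Y_1X'_1\\
0 & 0 & 0 & X_1 & Y_1 & 1 & -X_1Y'_1 & -Y_1Y'_1\\
X_2 & Y_2 & 1 & 0 & 0 & 0 & -X_2X'_2 & -Y_2X'_2\\
0 & 0 & 0 & X_2 & Y_2 & 1 & -X_2Y'_2 & -Y_2Y'_2\\
X_3 & Y_3 & 1 & 0 & 0 & 0 & -X_3X'_3 & -Y_3X'_3\\
0 & 0 & 0 & X_3 & Y_3 & 1 & -X_3Y'_3 & -Y_3Y'_3
\end{bmatrix}
\begin{bmatrix} h_{11}\\ h_{12}\\ h_{13}\\ h_{21}\\ h_{22}\\ h_{23}\\ h_{31}\\ h_{32} \end{bmatrix}
=
\begin{bmatrix} X'_0\\ Y'_0\\ X'_1\\ Y'_1\\ X'_2\\ Y'_2\\ X'_3\\ Y'_3 \end{bmatrix}. \qquad (8.4)
$$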

This simple-looking (and first-guess) method to determine H, and with it the ability to correct for perspective distortion, is flawed in practice because we cannot determine the location of the four points with sufficient accuracy. The discrete pixels used to record the image introduce quantization noise, and this noise, together with the numerical problems it causes, confounds the solution of Equation (8.4). To obtain a robust and accurate solution, it is necessary to use many more point correspondences and apply one of the estimation algorithms outlined in Section 8.3.4.
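To make Equation (8.4) concrete, here is a minimal Python/NumPy sketch that builds the 8 × 8 system from exactly four correspondences and solves it with $h_{33}$ fixed to 1. The function names and sample coordinates are hypothetical, and, as the text warns, this exact solve does nothing to soften noise in the four measured points.

```python
import numpy as np


def homography_from_four_points(src, dst):
    """Solve the 8 x 8 system of Equation (8.4) for H, with h33 fixed to 1.

    src holds four points (X_i, Y_i); dst holds their matches (X'_i, Y'_i).
    """
    A = np.zeros((8, 8))
    b = np.zeros(8)
    for i, ((X, Y), (Xp, Yp)) in enumerate(zip(src, dst)):
        A[2 * i] = [X, Y, 1, 0, 0, 0, -X * Xp, -Y * Xp]
        A[2 * i + 1] = [0, 0, 0, X, Y, 1, -X * Yp, -Y * Yp]
        b[2 * i], b[2 * i + 1] = Xp, Yp
    h = np.linalg.solve(A, b)  # square system: exact fit, no averaging of errors
    return np.append(h, 1.0).reshape(3, 3)


def apply_homography(H, X, Y):
    """Map (X, Y) through H and divide by the homogeneous coordinate."""
    u, v, w = H @ np.array([X, Y, 1.0])
    return u / w, v / w


# Hypothetical example: corners of a 100 x 100 square and where they appear in a photo
src = [(0, 0), (100, 0), (100, 100), (0, 100)]
dst = [(10, 5), (90, 20), (85, 95), (15, 80)]
H = homography_from_four_points(src, dst)
print(apply_homography(H, 100, 100))  # lands on the third match (85, 95), up to rounding
```

If three of the four points are collinear the matrix is singular and `np.linalg.solve` raises `LinAlgError`; nudging any one measured point by a fraction of a pixel changes every entry of H, which is the sensitivity the text describes.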

8.3. A Brief Look at Some Advanced Ideas in Computer Vision 197

8.3.3 Recovering the Camera Matrix

In the previous section, we showed that it is possible in theory to correct for the perspective distortion that accrues when taking photographs. In an analogous way, if we can match several known points in the 3D world scene with their corresponding image points on the projective plane (see Figure 8.11), then we should be able to obtain the camera matrix P. Now why would this be so important? Well, in Sections 8.2.1 and 8.2.2, we saw that we can estimate the position of a real-world point in the image scene if we know the camera matrix. If we cannot assume that the camera we are using is ideal (i.e., that it follows the pinhole camera model), then it is very difficult to determine what the parameters of the camera matrix are. This enters into the realm of camera calibration, where we need to develop a mathematical model that allows us to transform real-world points to image coordinates. This mathematical model is, of course, the camera model, and it is based on three sets of parameters:

1. The extrinsic parameters of the camera, which describe the relationship between the camera frame and the world frame; that is, the translation and rotation data needed to align both reference frames.

2. The intrinsic parameters of the camera, which describe its characteristics, such as lens focal length, pixel scale factors and location of the image center.

3. Distortion parameters, which describe the geometric nonlinearities of the camera. As we described in Section 8.3.1, these can be removed separately and do not need to be included in the camera matrix. This is the method we prefer to use.

Figure 8.11. Matching image points on the projection plane to points in the 3D world.
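The first two parameter sets are commonly combined as P = K[R | t], where the 3 × 3 matrix K holds the intrinsic parameters and the rotation R and translation t hold the extrinsic ones. The sketch below assumes that standard pinhole decomposition (it is not spelled out in this section), zero skew, and lens distortion already removed; the function names and numbers are hypothetical.

```python
import numpy as np


def intrinsic_matrix(f, sx, sy, cx, cy):
    """K from focal length f, pixel scale factors sx, sy (pixels per unit length)
    and image centre (cx, cy) in pixels. Skew is assumed to be zero."""
    return np.array([[f * sx, 0.0, cx],
                     [0.0, f * sy, cy],
                     [0.0, 0.0, 1.0]])


def rotation_z(theta):
    """Rotation by theta radians about the z axis (one simple choice of R)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def camera_matrix(K, R, t):
    """Assemble the 3 x 4 camera matrix P = K [R | t]."""
    return K @ np.hstack([R, np.reshape(t, (3, 1))])


def project(P, x, y, z):
    """Project the world point (x, y, z) and return (X/W, Y/W)."""
    X, Y, W = P @ np.array([x, y, z, 1.0])
    return X / W, Y / W


# Hypothetical camera: 8 mm lens, 100 pixels/mm, centre (320, 240), 5 units from the origin
K = intrinsic_matrix(f=8.0, sx=100.0, sy=100.0, cx=320.0, cy=240.0)
P = camera_matrix(K, rotation_z(0.0), t=[0.0, 0.0, 5.0])
print(P.shape)                       # (3, 4)
print(project(P, 0.0, 0.0, 0.0))     # the world origin lands on the image centre
```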

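All these parameters are encompassed in the 3 × 4 matrix we identified before. That is,

$$
\begin{bmatrix} X\\ Y\\ W \end{bmatrix} = P \begin{bmatrix} x\\ y\\ z\\ 1 \end{bmatrix},
$$

or in full we can write

$$
\begin{bmatrix} X\\ Y\\ W \end{bmatrix} =
\begin{bmatrix}
p_{11} & p_{12} & p_{13} & p_{14}\\
p_{21} & p_{22} & p_{23} & p_{24}\\
p_{31} & p_{32} & p_{33} & p_{34}
\end{bmatrix}
\begin{bmatrix} x\\ y\\ z\\ 1 \end{bmatrix}.
$$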

If we multiply out the terms of this matrix and decide, as is normal, that W = 1, then we have

$$
\begin{aligned}
X &= p_{11}x + p_{12}y + p_{13}z + p_{14},\\
Y &= p_{21}x + p_{22}y + p_{23}z + p_{24},\\
1 &= p_{31}x + p_{32}y + p_{33}z + p_{34}.
\end{aligned}
$$

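The third equation in this sequence is interesting because, if its value needs to be equal to 1, then the term $p_{34}$ must be dependent on the values of $p_{31}$, $p_{32}$ and $p_{33}$. Put another way, P acts on homogeneous coordinates and is only defined up to an overall scale factor, so one of its 12 entries can be fixed. This means that the complete camera matrix contains 11 independent values.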
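A quick numerical check of the scale argument, sketched with an arbitrary random matrix standing in for a real camera (hypothetical values): multiplying P by any non-zero constant leaves the projected image point unchanged, so one entry can be fixed and only 11 remain free.

```python
import numpy as np

rng = np.random.default_rng(0)
P = rng.normal(size=(3, 4))                # stand-in camera matrix (hypothetical)
point = np.array([0.3, -1.2, 4.0, 1.0])    # world point (x, y, z, 1)


def to_image(P, Xw):
    """Project a homogeneous world point and divide by W."""
    X, Y, W = P @ Xw
    return np.array([X / W, Y / W])


# Scaling every entry of P by the same non-zero factor gives the same image point
same = np.allclose(to_image(P, point), to_image(2.5 * P, point))

# Fixing the scale, here by dividing through so that p34 = 1, leaves 12 - 1 free entries
P_fixed = P / P[2, 3]
print(same, P.size - 1)                    # True 11
```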

As we know already, every point correspondence affords two equations (one for the X image coordinate and one for Y). So 5½ point correspondences are needed between the real-world coordinates and the image coordinates in order to derive the camera matrix from the correspondences alone, rather than directly from camera calibration methods; that is, five full correspondences plus one coordinate of a sixth, giving the 11 equations needed. If 5½ correspondences are used, then the solution is exact.

The problem with this method is that if six or more correspondences are used, we do not find that some correspondences simply drop out of the calculation because they are linear combinations of others. What we find instead is that the numerical coefficients of P vary quite widely depending on which 5½ are chosen for the calculation. This is due to pixel quantization noise in the image and to errors in matching the correspondences. Resolving this difficulty is a major problem in computer vision, and one that is the subject of ongoing research.

Intuitively, it would seem evident that if the effect of noise and errors is to be minimized, then the more points taken into account, the more accurately the coefficients of the camera matrix can be determined. This is equally true for the coefficients of a homography between planes in $\mathbb{P}^2$ and for the techniques that use two and three views to recover 3D structure. A number of very useful algorithms have been developed that make the best use of the point correspondences in determining P and similar quantities. These are generally referred to as optimization algorithms, but many experts who specialize in computer vision prefer the term estimation algorithms.
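Rather than picking one subset of 5½ correspondences, a common way to use all of them when estimating P is the homogeneous DLT formulation: each match contributes two rows to a 2n × 12 matrix A, and the entries of P are taken as the unit vector minimizing ‖Ap‖, found with the singular value decomposition. The sketch below assumes that formulation, noise-free and non-coplanar test points, and a hypothetical true camera; the helper names are our own. Practical implementations usually normalize the coordinates first, which this sketch omits.

```python
import numpy as np


def estimate_camera_matrix(world_pts, image_pts):
    """Estimate the 3 x 4 camera matrix P from n >= 6 matches (x, y, z) <-> (X, Y)."""
    rows = []
    for (x, y, z), (X, Y) in zip(world_pts, image_pts):
        Xw = [x, y, z, 1.0]
        rows.append(Xw + [0.0] * 4 + [-X * c for c in Xw])  # equation for X
        rows.append([0.0] * 4 + Xw + [-Y * c for c in Xw])  # equation for Y
    A = np.array(rows)                                      # shape (2n, 12)
    # The unit vector p minimising ||A p|| is the right singular vector
    # belonging to the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    P = Vt[-1].reshape(3, 4)
    return P / P[2, 3]  # fix the arbitrary scale (assumes p34 != 0)


def project(P, world_pt):
    """Image coordinates (X/W, Y/W) of a world point."""
    X, Y, W = P @ np.append(world_pt, 1.0)
    return X / W, Y / W


# Hypothetical ground truth: 800-pixel focal length, centre (320, 240), camera 5 units back
P_true = np.array([[800.0, 0.0, 320.0, 1600.0],
                   [0.0, 800.0, 240.0, 1200.0],
                   [0.0, 0.0, 1.0, 5.0]])
world = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0.2, 0.3, 1),
         (1, 1, 1), (-1, 2, 0.5), (0.5, -1, 2)]
image = [project(P_true, p) for p in world]
P_est = estimate_camera_matrix(world, image)
```

With noisy image points the same code returns an algebraic least-squares compromise over all the matches instead of an exact fit to any particular 5½ of them.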

8.3.4 Estimation

So, the practicalities of determining the parameters of the camera matrix are confounded by the presence of noise and error. This also holds true for perspective correction and all the other procedures that are used in computer vision. It is a major topic in itself to devise, characterize and study algorithms that work robustly. By robustly, we mean algorithms that are not too sensitive to changes in noise patterns or the number of points used, and that can make an intelligent guess at which points are obviously in gross error and should be ignored (outliers).

We will only consider here the briefest of outlines of a couple of algorithms that might be employed in these circumstances. The direct linear transform (DLT) [7, pp. 88–93] algorithm is the simplest; it makes no attempt to account for outliers. The random sample consensus (RANSAC) [5] and the least median of squares (LMS) algorithms use statistical and iterative procedures to identify and then ignore outliers.
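The RANSAC idea can be sketched in a few lines for the homography of Equation (8.4): repeatedly fit H to a random minimal sample of four matches, count how many matches it explains to within a pixel threshold, and keep the largest consensus set. This is a bare-bones illustration rather than the algorithm of [5] in full; the threshold, iteration count, helper names and test data are hypothetical choices, and practical implementations also adapt the number of iterations to the estimated outlier ratio.

```python
import numpy as np


def fit_homography(src, dst):
    """Least-squares H (h33 = 1) from n >= 4 matches, using the 2n x 8 form of Equation (8.4)."""
    A, b = [], []
    for (X, Y), (Xp, Yp) in zip(src, dst):
        A.append([X, Y, 1, 0, 0, 0, -X * Xp, -Y * Xp])
        A.append([0, 0, 0, X, Y, 1, -X * Yp, -Y * Yp])
        b += [Xp, Yp]
    h, *_ = np.linalg.lstsq(np.array(A, float), np.array(b, float), rcond=None)
    return np.append(h, 1.0).reshape(3, 3)


def transfer_error(H, src, dst):
    """Image distance between H applied to each source point and its match."""
    mapped = np.column_stack([src, np.ones(len(src))]) @ H.T
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.linalg.norm(mapped[:, :2] / mapped[:, 2:3] - dst, axis=1)


def ransac_homography(src, dst, threshold=2.0, iterations=500, seed=0):
    """Keep the H (from random 4-point samples) with the most inliers, then refit on them."""
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    rng = np.random.default_rng(seed)
    best = np.zeros(len(src), dtype=bool)
    for _ in range(iterations):
        sample = rng.choice(len(src), size=4, replace=False)
        H = fit_homography(src[sample], dst[sample])
        inliers = transfer_error(H, src, dst) < threshold  # NaN/inf count as outliers
        if inliers.sum() > best.sum():
            best = inliers
    return fit_homography(src[best], dst[best]), best  # assumes >= 4 inliers were found


# Hypothetical data: a 7 x 7 grid mapped through a known H, with five gross outliers
H_true = np.array([[1.1, 0.05, 3.0], [0.02, 0.95, -2.0], [1e-4, 2e-4, 1.0]])
src = np.array([(x, y) for x in range(0, 100, 15) for y in range(0, 100, 15)], float)
mapped = np.column_stack([src, np.ones(len(src))]) @ H_true.T
dst = mapped[:, :2] / mapped[:, 2:3]
dst[:5] += 40.0                       # five deliberately gross outliers
H_plain = fit_homography(src, dst)    # ordinary least squares is pulled off by them
H_ransac, inliers = ransac_homography(src, dst)
print(inliers.sum())                  # 44
```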

Since all estimation algorithms are optimization algorithms, they attempt to minimize a cost function. These cost functions tend to be geometric, statistical or algebraic. To illustrate the process, we consider a simple form of the DLT algorithm to solve the problem of removing perspective distortion. Recall that four corresponding points (in the image and in the projection) gave us eight linear inhomogeneous equations, Equation (8.4), from which the coefficients of H can be estimated.
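To see how two of these cost functions differ, the sketch below evaluates an algebraic cost (the squared residuals of the homogeneous equations for H, with w = w′ = 1) and a geometric cost (squared image distances between mapped and measured points) for the same candidate H. The function names and data are hypothetical. Scaling H by a constant leaves the geometric cost unchanged but rescales the algebraic one, a reminder that the algebraic cost is only meaningful once the scale of H has been fixed.

```python
import numpy as np


def algebraic_cost(H, src, dst):
    """Sum of squared residuals of the two homogeneous equations per match (w = w' = 1)."""
    h = H.ravel()
    total = 0.0
    for (x, y), (xp, yp) in zip(src, dst):
        r1 = np.dot([x, y, 1, 0, 0, 0, -xp * x, -xp * y, -xp], h)
        r2 = np.dot([0, 0, 0, x, y, 1, -yp * x, -yp * y, -yp], h)
        total += r1 ** 2 + r2 ** 2
    return total


def geometric_cost(H, src, dst):
    """Sum of squared image distances between H applied to x and the measured x'."""
    total = 0.0
    for (x, y), (xp, yp) in zip(src, dst):
        u, v, w = H @ np.array([x, y, 1.0])
        total += (u / w - xp) ** 2 + (v / w - yp) ** 2
    return total


src = [(0, 0), (1, 0), (0, 1), (1, 1)]
dst = [(0, 0), (1, 0.1), (0, 1), (1, 1)]   # one match is off by 0.1 in y
H = np.eye(3)
print(algebraic_cost(H, src, dst), geometric_cost(H, src, dst))
# Same homography, different scale: algebraic cost grows 100-fold, geometric cost does not change
print(algebraic_cost(10 * H, src, dst), geometric_cost(10 * H, src, dst))
```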

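If we can identify more than four matching point correspondences, the question is then how to use them. This is where the DLT algorithm comes in. To see how to apply it, consider the form of the matrix in Equation (8.4) for a single point:

$$
\begin{bmatrix}
x & y & 1 & 0 & 0 & 0 & -xx' & -yx'\\
0 & 0 & 0 & x & y & 1 & -xy' & -yy'
\end{bmatrix}
\begin{bmatrix} h_{11}\\ h_{12}\\ h_{13}\\ h_{21}\\ h_{22}\\ h_{23}\\ h_{31}\\ h_{32} \end{bmatrix}
=
\begin{bmatrix} x'\\ y' \end{bmatrix}. \qquad (8.5)
$$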
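Stacking Equation (8.5) for n correspondences gives a 2n × 8 matrix A and a right-hand side b. When n > 4 there is generally no exact solution, and one common choice (assumed here, not prescribed by the text) is the linear least-squares solution minimizing ‖Ah − b‖², which `np.linalg.lstsq` computes. The helper names and synthetic data are hypothetical; as the following paragraph notes, this inhomogeneous form can be numerically unstable.

```python
import numpy as np


def stack_equation_8_5(src, dst):
    """Two rows of Equation (8.5) per match: returns A (2n x 8) and b (2n)."""
    A, b = [], []
    for (x, y), (xp, yp) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp])
        A.append([0, 0, 0, x, y, 1, -x * yp, -y * yp])
        b.extend([xp, yp])
    return np.array(A, float), np.array(b, float)


def least_squares_homography(src, dst):
    """H with h33 = 1 that minimises ||A h - b||^2 over all matches."""
    A, b = stack_equation_8_5(src, dst)
    h, *_ = np.linalg.lstsq(A, b, rcond=None)
    return np.append(h, 1.0).reshape(3, 3)


# Hypothetical test data: a 7 x 7 grid mapped through a known homography
H_true = np.array([[1.1, 0.05, 3.0], [0.02, 0.95, -2.0], [1e-4, 2e-4, 1.0]])
src = np.array([(x, y) for x in range(0, 100, 15) for y in range(0, 100, 15)], float)
mapped = np.column_stack([src, np.ones(len(src))]) @ H_true.T
dst = mapped[:, :2] / mapped[:, 2:3]

A, b = stack_equation_8_5(src, dst)
print(A.shape)                                        # (98, 8): 49 matches, 2 rows each
rng = np.random.default_rng(1)
noisy = dst + rng.normal(scale=0.5, size=dst.shape)   # simulated 0.5-pixel noise
H_est = least_squares_homography(src, noisy)
```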

It should be fairly evident that every corresponding pair of points adds another two linear equations, so for n corresponding points the matrix will be of size 2n × 8. Since this is an over-determined system when n > 4, one cannot simply proceed to solve for the $h_{ij}$ using Gauss elimination, for example. It also turns out that using the inhomogeneous form in Equation (8.5) to compute the coefficients of H in $\mathbf{x}' = H\mathbf{x}$ can lead to unstable results, and a better approach is to retain the homogeneous form (which has nine coefficients). Remember, however, that the coefficients are only determined up to a scale factor. Thus for a single point correspondence between (x, y, w) and (x′, y′, w′) (where the w coordinate is retained), we have

$$
\begin{bmatrix}
w'x & w'y & w'w & 0 & 0 & 0 & -x'x & -x'y & -x'w\\
0 & 0 & 0 & w'x & w'y & w'w & -y'x & -y'y & -y'w
\end{bmatrix}
\begin{bmatrix} h_{11}\\ h_{12}\\ h_{13}\\ h_{21}\\ h_{22}\\ h_{23}\\ h_{31}\\ h_{32}\\ h_{33} \end{bmatrix}
= \mathbf{0}. \qquad (8.6)
$$
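Equation (8.6) leads to the homogeneous DLT: stack two rows per match into a 2n × 9 matrix A and choose h as the unit vector minimizing ‖Ah‖, which is the right singular vector for the smallest singular value of A. The sketch below also applies the coordinate normalization recommended by Hartley and Zisserman (translate each point set so its centroid is at the origin and scale it to a mean distance of √2) before building A; that step, the helper names and the test data are our assumptions rather than something stated in this section.

```python
import numpy as np


def normalising_transform(pts):
    """Similarity T moving the centroid to the origin with mean distance sqrt(2)."""
    pts = np.asarray(pts, float)
    centroid = pts.mean(axis=0)
    mean_dist = np.linalg.norm(pts - centroid, axis=1).mean()
    s = np.sqrt(2.0) / mean_dist
    return np.array([[s, 0.0, -s * centroid[0]],
                     [0.0, s, -s * centroid[1]],
                     [0.0, 0.0, 1.0]])


def dlt_homography(src, dst):
    """Homogeneous DLT: two rows of Equation (8.6) per match, solved with the SVD."""
    T, Tp = normalising_transform(src), normalising_transform(dst)
    rows = []
    for (x0, y0), (x1, y1) in zip(src, dst):
        x, y, w = T @ [x0, y0, 1.0]        # normalised source point
        xp, yp, wp = Tp @ [x1, y1, 1.0]    # normalised destination point
        rows.append([wp * x, wp * y, wp * w, 0, 0, 0, -xp * x, -xp * y, -xp * w])
        rows.append([0, 0, 0, wp * x, wp * y, wp * w, -yp * x, -yp * y, -yp * w])
    A = np.array(rows, float)              # shape (2n, 9)
    _, _, Vt = np.linalg.svd(A)
    H_norm = Vt[-1].reshape(3, 3)          # unit h with the smallest ||A h||
    H = np.linalg.inv(Tp) @ H_norm @ T     # undo the normalisation
    return H / H[2, 2]                     # fix the scale (assumes h33 != 0)


# Hypothetical test data: a 7 x 7 grid mapped through a known homography
H_true = np.array([[1.1, 0.05, 3.0], [0.02, 0.95, -2.0], [1e-4, 2e-4, 1.0]])
src = [(x, y) for x in range(0, 100, 15) for y in range(0, 100, 15)]
dst = []
for x, y in src:
    u, v, w = H_true @ [x, y, 1.0]
    dst.append((u / w, v / w))
H_est = dlt_homography(src, dst)
```

Because the normalization is undone at the end, the returned H maps the original pixel coordinates, and dividing by $h_{33}$ simply picks one member of the family of equivalent, scaled solutions.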

For n points in this homogeneous form, there is now a 2n × 9 matrix, but solutions of the form Ah = 0 are less subject to instabilities than the
