8.3. A Brief Look at Some Advanced Ideas in Computer Vision 211
where the matrix M is derived from the translation matrix. Using the camera
calibration matrices P and P' to project the world point X_3 onto the two
viewing planes, X = P X_3 and X' = P' X_3, Equation (8.8) becomes
\[
(P'^{-1} X')^T M (P^{-1} X) = 0,
\]
or
\[
X'^T \left( (P'^{-1})^T M P^{-1} \right) X = 0.
\]
And finally, by letting F = (P'^{-1})^T M P^{-1}, the final result is obtained
with an equation that defines the fundamental matrix:
\[
X'^T F X = 0. \qquad (8.9)
\]
Equation (8.9) affords a way to calculate the nine coefficients of F using
the same techniques of estimation that were covered in Section 8.3.4. When
written in full for some point x_i projecting to (X_i, Y_i, W_i) and
(X'_i, Y'_i, W'_i), Equation (8.9) resembles some of the results obtained
earlier, e.g., the camera matrix P. The full structure of Equation (8.9) is
\[
\begin{pmatrix} X'_i & Y'_i & W'_i \end{pmatrix}
\begin{pmatrix}
f_{00} & f_{01} & f_{02} \\
f_{10} & f_{11} & f_{12} \\
f_{20} & f_{21} & f_{22}
\end{pmatrix}
\begin{pmatrix} X_i \\ Y_i \\ W_i \end{pmatrix} = 0.
\]
Like the P^3 to P^2 homology H, the fundamental matrix has only eight
independent coefficients.
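The estimation the text refers to can be sketched with the linear ("eight-point") method: each correspondence contributes one linear constraint on the nine coefficients, and the SVD yields the least-squares solution. The sketch below, in plain numpy, is illustrative only (the function name is ours, and a production implementation would also normalize the point coordinates for numerical conditioning):

```python
import numpy as np

def estimate_fundamental(x, xp):
    """Linear ("eight-point") estimate of F from n >= 8 correspondences.

    x, xp: (n, 3) arrays of homogeneous image points satisfying
    x'^T F x = 0 row by row. Returns a rank-2 3x3 matrix.
    """
    # Each correspondence gives one row of A in A f = 0, where f holds
    # the coefficients f00..f22 in row-major order.
    A = np.array([[Xp * X, Xp * Y, Xp * W,
                   Yp * X, Yp * Y, Yp * W,
                   Wp * X, Wp * Y, Wp * W]
                  for (X, Y, W), (Xp, Yp, Wp) in zip(x, xp)])
    # Least-squares solution: the right singular vector belonging to
    # the smallest singular value of A.
    F = np.linalg.svd(A)[2][-1].reshape(3, 3)
    # A true fundamental matrix has rank 2; project onto that constraint
    # by zeroing the smallest singular value.
    U, s, Vt = np.linalg.svd(F)
    s[2] = 0.0
    return U @ np.diag(s) @ Vt
```

Note that only eight of the nine coefficients are independent (F is defined up to scale), which is why the SVD's unit-norm solution resolves the ambiguity.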
There are a large number of other interesting properties that emerge from
a deeper analysis of Equation (8.9), but for the purposes of this introductory
chapter and in the context of VR, we simply refer you again to [3] and [7] for
full details.
8.3.7 Extracting 3D Information from Multiple Views
Section 8.3.6 hinted that by using the concept of epipolar geometry, two pic-
tures taken from two different viewpoints of a 3D world can reveal a lot more
about that world than we would obtain from a single picture. After all, we
have already discussed that for VR work, stereopsis is an important concept,
and stereoscopic projection and head-mounted displays are an essential com-
ponent of a high quality VR environment. Nevertheless, one is still entitled
to ask if there are any advantages in having an even greater number of views
of the world. Do three, four or indeed n views justify the undoubted mathe-
matical complexity that will result as one tries to make sense of the data and
analyze it?
For some problems, the answer is of course yes, but the advantage is less
pronounced from our viewpoint, i.e., in VR. We have two eyes, so the step in
going from a single view to two views is simply bringing the computer-based
virtual world closer to our own sense of the real world. To some extent, we
are not particularly interested in the aspect of computer vision that is more
closely allied to the subject of image recognition and artificial intelligence,
where these multi-views do help the stupid computer recognize things better
and reach more human-like conclusions when it looks at something.
For this reason, we are not going to delve into the analysis of three-view
or n-view configurations. It is just as well, because when three views are
considered, matrices (F for example) are no longer sufficient to describe the
geometry, and one has to enter the realm of the tensor and its algebra and
analysis. Whilst tensor theory is not an impenetrable branch of mathematics,
it is somewhat outside the scope of our book on VR. In terms of computer
vision, when thinking about three views, the important tensor is called the
trifocal tensor. One can read about it from those who proposed and developed
it [6, 13].
8.3.8 Extracting 3D Information from Image Sequences
If you think about it for a moment, making a video in which the camera
moves from one spot to another effectively gives us at least two views of a
scene. Of course, we get a lot more than two views, because even if the
journey takes as short a time as one second, we may have 25 to 30 pictures,
one from each frame of the movie. So on the face of it, taking video or
moving footage of a scene should be much more helpful in extracting scene
detail than just a couple of camera snapshots.
Unfortunately, there is one small complication about using a single cam-
era to capture scene detail (and we are not talking about the fact that we don't
move the camera, or that we simply pan it across the scene, which is no bet-
ter). We are talking about the fact that elements in the scene get the chance
to move. Moving elements will make the recognition of points much more
difficult. For example, in using the fundamental matrix to calculate range, we
cannot be assured that if we take two pictures of a scene and select a point
of interest in one picture (using F to find its epipolar line in the second), the
image of the point in the second picture will still be lying on the epipolar line.
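That check is easy to make quantitative: the epipolar line in the second image is l' = F x, and the distance from the observed point to that line measures how far the scene element has strayed. A minimal numpy sketch (the function name is ours):

```python
import numpy as np

def epipolar_distance(F, x, xp):
    """Distance from the observed point xp to the epipolar line F @ x
    it should lie on in the second image. A large distance suggests the
    scene element moved between the two exposures.

    x, xp: homogeneous 3-vectors (X, Y, W); result is in image units.
    """
    a, b, c = F @ x                    # epipolar line aX + bY + cW = 0
    X, Y, W = xp
    return abs(a * X / W + b * Y / W + c) / np.hypot(a, b)
```

For the rectified-stereo case F = [[0, 0, 0], [0, 0, -1], [0, 1, 0]], the epipolar line of (x, y, 1) is simply the row y' = y, so a feature that drifted three rows between exposures reports a distance of 3.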
Nevertheless, video or sequences of images are often so much easier to
obtain than two contemporaneously obtained stills that getting information
from them has taken on a discipline of its own, the study of structure from
motion (SfM). Details of techniques used to solve the SfM problem lie too
far away from our theme of VR and so we refer you to the overview [9] (also
available online).
One example of SfM which is of interest to us in connection with 3D
structure recovery in a noisy environment is illustrated in Figure 8.23, which
shows how feature tracking and feature matching are used to build up a 3D
model from a sequence of images obtained by moving the viewpoint and/or
objects. The three images in Figure 8.23 were acquired from slightly different
viewpoints of a standard test model as the camera pans around it. Using
a feature-detection algorithm, points of interest are selected in the first and
third images. When equivalent points are matched, their predicted positions
in the second image can be reconstructed. These are shown on the bottom
left. At the bottom right, we see the result of applying the feature detection
directly to the second image. The indicated areas show small differences, but
the overall impression is that the interpolation is quite good. Consequently,
SfM offers high potential as a tool for reconstructing virtual environments
from photographs.
Figure 8.23. Three images taken from a video clip are used by an SfM process to
acquire some 3D structure data.
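The feature-detection front end such a process relies on can be sketched in a few lines. The following is a bare-bones Harris-style corner detector in plain numpy, illustrative only (the window radius, the constant k and the relative threshold are conventional but arbitrary choices, and a real SfM pipeline would add descriptors and matching on top):

```python
import numpy as np

def harris_corners(img, k=0.04, rel_thresh=0.1, r=2):
    """Bare-bones Harris corner detector: returns (x, y) points of
    interest, the raw material for feature matching between frames."""
    img = img.astype(float)
    Iy, Ix = np.gradient(img)               # image gradients
    # Sum gradient products over a (2r+1)^2 window around each pixel.
    def box(a):
        out = np.zeros_like(a)
        for dy in range(-r, r + 1):
            for dx in range(-r, r + 1):
                out += np.roll(np.roll(a, dy, axis=0), dx, axis=1)
        return out
    Sxx, Syy, Sxy = box(Ix * Ix), box(Iy * Iy), box(Ix * Iy)
    # Harris response det(M) - k * trace(M)^2: large only where the
    # gradient varies strongly in both directions, i.e., at corners.
    R = Sxx * Syy - Sxy ** 2 - k * (Sxx + Syy) ** 2
    ys, xs = np.where(R > rel_thresh * R.max())
    return list(zip(xs.tolist(), ys.tolist()))
```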
8.4 Software Libraries for Computer Vision
The algorithms we have described in this chapter involve a significant degree
of numerical complexity when they are implemented in practice. Practical
considerations also affect the format in which images and movies are stored.
The length of a computer program written to do a simple image-processing
task, such as an edge detection, can be increased by an order of magnitude if
the image is stored as a JPEG image as opposed to an uncompressed bitmap.
There are two freely available program function libraries that have proved
popular amongst engineers and computer scientists working in VR, image
processing and computer vision: Microsoft's Vision SDK [10] and Intel's
OpenCV [8].
The Vision SDK reduces much of the drudgery involved in loading images
and performing data manipulations. It is a low-level library, intended
to provide a strong programming foundation for research and application
development; it is not a high-level platform for end-users to experiment with
imaging operations. It has a nice interface to Windows, such as shared im-
age memory across processes. It has user-definable pixel types and a device-
independent interface for image acquisition. It can be used to create binaries
which can be transported and run on a machine with any supported digitizer
or camera. It can be easily extended to support new types of digitizers or
cameras. Vision SDK is distributed in source form as nine projects for Visual
C++ that create libraries and DLLs for use in applications, e.g., the VisMa-
trix project provides classes for vectors and matrices. Classes CVisVector4,
CVisTransform4x4, and CVisTransformChain are specialized to work with
three-dimensional vectors and matrices using homogeneous coordinates.
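As a sketch of what those classes do, the numpy equivalent of applying a 4x4 homogeneous transform (here an arbitrary translation, chosen only for illustration) to a 3D point is:

```python
import numpy as np

# A 4x4 homogeneous transform of the kind CVisTransform4x4 represents:
# rotation in the upper-left 3x3 block, translation in the last column.
T = np.eye(4)
T[:3, 3] = [1.0, 2.0, 3.0]           # translate by (1, 2, 3)

p = np.array([5.0, 0.0, 0.0, 1.0])   # homogeneous 3D point (w = 1)
q = T @ p                            # transformed point: (6, 2, 3, 1)
```

Keeping w = 1 in the fourth component is what lets a single matrix product carry both the rotation and the translation.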
Intel’s OpenCV library provides a cross-platform middle- to high-level
API that consists of more than 300 C language functions. It does not rely on
external libraries, though it can use some when necessary. OpenCV provides a
transparent interface to Intel’s integrated performance primitives (IPP). That
is, it automatically loads IPP libraries optimized for specific processors at run-
time, if they are available.
OpenCV has operating-system–independent functions to load and save
image files in various formats, but it is not optimized for Windows in the way
that Vision SDK is. However, it is richly populated with useful functions to
perform computer vision algorithms:
- Image processing, including:
    - Gradients, edges and corners
    - Sampling, interpolation and geometrical transforms
    - Filters and color conversion
    - Special image transforms
- Structural analysis, including:
    - Contour processing
    - Computational geometry
    - Planar subdivisions
- Motion analysis and object tracking, including:
    - Accumulation of background statistics
    - Motion templates
    - Object tracking
    - Optical flow
    - Estimators
- Pattern recognition and object detection
- Camera calibration and 3D reconstruction
- Epipolar geometry
By using these two powerful libraries, it is possible to create some user-
friendly and sophisticated application programs for fundamental computer
vision development and research. In VR applications, the availability of these
libraries allows us to craft useful code without having to have a deep
understanding of mathematical numerical methods and linear algebra.
One of the most exciting recent applications in VR is augmented reality.
We briefly mentioned this in the context of motion tracking (Section 4.3) and
earlier in this chapter. A key step in the augmentation is being able to place
the virtual object into the real world view with the correct dimensions, position
and orientation. To do this, the image-processing software must be able to
recognize one or more predefined planar marker(s) and calculate the homog-
raphy between their planes and the camera image plane. The ARToolKit [1]