8.3. A Brief Look at Some Advanced Ideas in Computer Vision 211
where the matrix M is derived from the translation matrix. Using the camera
calibration matrices P and P' to project the world point X_3 onto the two
viewing planes, X = P X_3 and X' = P' X_3, Equation (8.8) becomes
\[
(P'^{-1} X')^T M (P^{-1} X) = 0,
\]
or
\[
X'^T \left( (P'^{-1})^T M P^{-1} \right) X = 0.
\]
And finally, by letting F = (P'^{-1})^T M P^{-1}, the final result is obtained
with an equation that defines the fundamental matrix:
\[
X'^T F X = 0. \qquad (8.9)
\]
Equation (8.9) affords a way to calculate the nine coefficients of F using
the same techniques of estimation that were covered in Section 8.3.4. When
written in full for some point x_i projecting to (X_i, Y_i, W_i) and
(X'_i, Y'_i, W'_i), Equation (8.9) resembles some of the results obtained
earlier, e.g., the camera matrix P. The full structure of Equation (8.9) is
\[
\begin{pmatrix} X'_i & Y'_i & W'_i \end{pmatrix}
\begin{pmatrix}
f_{00} & f_{01} & f_{02} \\
f_{10} & f_{11} & f_{12} \\
f_{20} & f_{21} & f_{22}
\end{pmatrix}
\begin{pmatrix} X_i \\ Y_i \\ W_i \end{pmatrix} = 0.
\]
Like the P^3 to P^2 homology H, the fundamental matrix has only eight
independent coefficients.
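The estimation the text refers to can be sketched with the linear ("eight-point") method: each correspondence contributes one linear constraint on the nine coefficients, and the SVD yields the least-squares solution. The sketch below, in plain numpy, is illustrative only (the function name is ours, and a production implementation would also normalize the point coordinates for numerical conditioning):

```python
import numpy as np

def estimate_fundamental(x, xp):
    """Linear ("eight-point") estimate of F from n >= 8 correspondences.

    x, xp: (n, 3) arrays of homogeneous image points satisfying
    x'^T F x = 0 row by row. Returns a rank-2 3x3 matrix.
    """
    # Each correspondence gives one row of A in A f = 0, where f holds
    # the coefficients f00..f22 in row-major order.
    A = np.array([[Xp * X, Xp * Y, Xp * W,
                   Yp * X, Yp * Y, Yp * W,
                   Wp * X, Wp * Y, Wp * W]
                  for (X, Y, W), (Xp, Yp, Wp) in zip(x, xp)])
    # Least-squares solution: the right singular vector belonging to
    # the smallest singular value of A.
    F = np.linalg.svd(A)[2][-1].reshape(3, 3)
    # A true fundamental matrix has rank 2; project onto that constraint
    # by zeroing the smallest singular value.
    U, s, Vt = np.linalg.svd(F)
    s[2] = 0.0
    return U @ np.diag(s) @ Vt
```

Note that only eight of the nine coefficients are independent (F is defined up to scale), which is why the SVD's unit-norm solution resolves the ambiguity.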
There are a large number of other interesting properties that emerge from
a deeper analysis of Equation (8.9), but for the purposes of this introductory
chapter and in the context of VR, we simply refer you again to [3] and [7] for
full details.
8.3.7 Extracting 3D Information from Multiple Views
Section 8.3.6 hinted that by using the concept of epipolar geometry, two pic-
tures taken from two different viewpoints of a 3D world can reveal a lot more
about that world than we would obtain from a single picture. After all, we
have already discussed that for VR work, stereopsis is an important concept,
and stereoscopic projection and head-mounted displays are an essential com-
ponent of a high quality VR environment. Nevertheless, one is still entitled
to ask if there are any advantages in having an even greater number of views
of the world. Do three, four or indeed n views justify the undoubted mathe-
matical complexity that will result as one tries to make sense of the data and
analyze it?
For some problems, the answer is of course yes, but the advantage is less
pronounced from our viewpoint, i.e., in VR. We have two eyes, so the step in
going from a single view to two views is simply bringing the computer-based
virtual world closer to our own sense of the real world. To some extent, we
are not particularly interested in the aspect of computer vision that is more
closely allied to the subject of image recognition and artificial intelligence,
where these multi-views do help the stupid computer recognize things better
and reach more human-like conclusions when it looks at something.
For this reason, we are not going to delve into the analysis of three-view
or n-view configurations. It is just as well, because when three views are
considered, matrices (F for example) are no longer sufficient to describe the
geometry, and one has to enter the realm of the tensor and its algebra and
analysis. Whilst tensor theory is not an impenetrable branch of mathematics,
it is somewhat outside the scope of our book on VR. In terms of computer
vision, when thinking about three views, the important tensor is called the
trifocal tensor. One can read about it from those who proposed and developed
it [6, 13].
8.3.8 Extracting 3D Information from Image Sequences
If you think about it for a moment, making a video in which the camera
moves from one spot to another effectively gives us at least two views of a
scene. Of course, we get a lot more than two views, because even if the
journey takes as short a time as one second, we may have 25 to 30 pictures,
one from each frame of the movie. So on the face of it, taking video or
moving footage of a scene should be much more helpful in extracting scene
detail than just a couple of camera snapshots.
Unfortunately, there is one small complication about using a single cam-
era to capture scene detail (and we are not talking about the fact that we don't
move the camera, or that we simply pan it across the scene, which is no bet-
ter). We are talking about the fact that elements in the scene get the chance
to move. Moving elements will make the recognition of points much more
difficult. For example, in using the fundamental matrix to calculate range, we
cannot be assured that if we take two pictures of a scene and select a point
of interest in one picture (using F to find its epipolar line in the second), the
image of the point in the second picture will still be lying on the epipolar line.
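That check is easy to make quantitative: the epipolar line in the second image is l' = F x, and the distance from the observed point to that line measures how far the scene element has strayed. A minimal numpy sketch (the function name is ours):

```python
import numpy as np

def epipolar_distance(F, x, xp):
    """Distance from the observed point xp to the epipolar line F @ x
    it should lie on in the second image. A large distance suggests the
    scene element moved between the two exposures.

    x, xp: homogeneous 3-vectors (X, Y, W); result is in image units.
    """
    a, b, c = F @ x                    # epipolar line aX + bY + cW = 0
    X, Y, W = xp
    return abs(a * X / W + b * Y / W + c) / np.hypot(a, b)
```

For the rectified-stereo case F = [[0, 0, 0], [0, 0, -1], [0, 1, 0]], the epipolar line of (x, y, 1) is simply the row y' = y, so a feature that drifted three rows between exposures reports a distance of 3.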
Nevertheless, video or sequences of images are often so much easier to
obtain than two contemporaneously obtained stills that getting information
from them has taken on a discipline of its own, the study of structure from
motion (SfM). Details of techniques used to solve the SfM problem lie too
far away from our theme of VR and so we refer you to the overview [9] (also
available online).
One example of SfM which is of interest to us in connection with 3D
structure recovery in a noisy environment is illustrated in Figure 8.23, which
shows how feature tracking and feature matching are used to build up a 3D
model from a sequence of images obtained by moving the viewpoint and/or
objects. The three images in Figure 8.23 were acquired from slightly different
viewpoints of a standard test model as the camera pans around it. Using
a feature-detection algorithm, points of interest are selected in the first and
third images. When equivalent points are matched, their predicted positions
in the second image can be reconstructed. These are shown on the bottom
left. At the bottom right, we see the result of applying the feature detection
directly to the second image. The indicated areas show small differences, but
the overall impression is that the interpolation is quite good. Consequently,
SfM offers high potential as a tool for reconstructing virtual environments
from photographs.
Figure 8.23. Three images taken from a video clip are used by an SfM process to
acquire some 3D structure data.
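The feature-detection front end such a process relies on can be sketched in a few lines. The following is a bare-bones Harris-style corner detector in plain numpy, illustrative only (the window radius, the constant k and the relative threshold are conventional but arbitrary choices, and a real SfM pipeline would add descriptors and matching on top):

```python
import numpy as np

def harris_corners(img, k=0.04, rel_thresh=0.1, r=2):
    """Bare-bones Harris corner detector: returns (x, y) points of
    interest, the raw material for feature matching between frames."""
    img = img.astype(float)
    Iy, Ix = np.gradient(img)               # image gradients
    # Sum gradient products over a (2r+1)^2 window around each pixel.
    def box(a):
        out = np.zeros_like(a)
        for dy in range(-r, r + 1):
            for dx in range(-r, r + 1):
                out += np.roll(np.roll(a, dy, axis=0), dx, axis=1)
        return out
    Sxx, Syy, Sxy = box(Ix * Ix), box(Iy * Iy), box(Ix * Iy)
    # Harris response det(M) - k * trace(M)^2: large only where the
    # gradient varies strongly in both directions, i.e., at corners.
    R = Sxx * Syy - Sxy ** 2 - k * (Sxx + Syy) ** 2
    ys, xs = np.where(R > rel_thresh * R.max())
    return list(zip(xs.tolist(), ys.tolist()))
```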
8.4 Software Libraries for Computer Vision
The algorithms we have described in this chapter involve a significant degree
of numerical complexity when they are implemented in practice. Practical
considerations also affect the format in which images and movies are stored.
The length of a computer program written to do a simple image-processing
task, such as an edge detection, can be increased by an order of magnitude if
the image is stored as a JPEG image as opposed to an uncompressed bitmap.
There are two freely available program function libraries that have proved
popular amongst engineers and computer scientists working in VR, image
processing and computer vision: Microsoft's Vision SDK [10] and Intel's
OpenCV [8].
The Vision SDK reduces much of the drudgery involved in loading images
and performing data manipulations. It is a low-level library, intended
to provide a strong programming foundation for research and application
development; it is not a high-level platform for end-users to experiment with
imaging operations. It has a nice interface to Windows, such as shared im-
age memory across processes. It has user-definable pixel types and a device-
independent interface for image acquisition. It can be used to create binaries
which can be transported and run on a machine with any supported digitizer
or camera. It can be easily extended to support new types of digitizers or
cameras. Vision SDK is distributed in source form as nine projects for Visual
C++ that create libraries and DLLs for use in applications, e.g., the VisMa-
trix project provides classes for vectors and matrices. Classes CVisVector4,
CVisTransform4x4, and CVisTransformChain are specialized to work with
three-dimensional vectors and matrices using homogeneous coordinates.
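As a sketch of what those classes do, the numpy equivalent of applying a 4x4 homogeneous transform (here an arbitrary translation, chosen only for illustration) to a 3D point is:

```python
import numpy as np

# A 4x4 homogeneous transform of the kind CVisTransform4x4 represents:
# rotation in the upper-left 3x3 block, translation in the last column.
T = np.eye(4)
T[:3, 3] = [1.0, 2.0, 3.0]           # translate by (1, 2, 3)

p = np.array([5.0, 0.0, 0.0, 1.0])   # homogeneous 3D point (w = 1)
q = T @ p                            # transformed point: (6, 2, 3, 1)
```

Keeping w = 1 in the fourth component is what lets a single matrix product carry both the rotation and the translation.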
Intel’s OpenCV library provides a cross-platform middle- to high-level
API that consists of more than 300 C language functions. It does not rely on
external libraries, though it can use some when necessary. OpenCV provides a
transparent interface to Intel’s integrated performance primitives (IPP). That
is, it automatically loads IPP libraries optimized for specific processors at run-
time, if they are available.
OpenCV has operating-system–independent functions to load and save
image files in various formats, but it is not optimized for Windows in the way
that Vision SDK is. However, it is richly populated with useful functions to
perform computer vision algorithms:
- Image processing, including:
    - Gradients, edges and corners
    - Sampling, interpolation and geometrical transforms
    - Filters and color conversion
    - Special image transforms
- Structural analysis, including:
    - Contour processing
    - Computational geometry
    - Planar subdivisions
- Motion analysis and object tracking, including:
    - Accumulation of background statistics
    - Motion templates
    - Object tracking
    - Optical flow
    - Estimators
- Pattern recognition and object detection
- Camera calibration and 3D reconstruction
- Epipolar geometry
By using these two powerful libraries, it is possible to create some user-
friendly and sophisticated application programs for fundamental computer
vision development and research. In VR applications, the availability of these
libraries allows us to craft useful code without having to have a deep
understanding of mathematical numerical methods and linear algebra.
One of the most exciting recent applications in VR is augmented reality.
We briefly mentioned this in the context of motion tracking (Section 4.3) and
earlier in this chapter. A key step in the augmentation is being able to place
the virtual object into the real world view with the correct dimensions, position
and orientation. To do this, the image-processing software must be able to
recognize one or more predefined planar marker(s) and calculate the homog-
raphy between their planes and the camera image plane. The ARToolKit [1]