212 8. Computer Vision in VR
of the world. Do three, four or indeed n views justify the undoubted mathe-
matical complexity that will result as one tries to make sense of the data and
analyze it?
For some problems, the answer is of course yes, but the advantage is less
pronounced from our viewpoint, i.e., in VR. We have two eyes, so the step in
going from a single view to two views is simply bringing the computer-based
virtual world closer to our own sense of the real world. To some extent, we
are not particularly interested in the aspect of computer vision that is more
closely allied to the subject of image recognition and artificial intelligence,
where these multi-views do help the stupid computer recognize things better
and reach more human-like conclusions when it looks at something.
For this reason, we are not going to delve into the analysis of three-view
or n-view configurations. It is just as well, because when three views are
considered, matrices (F for example) are no longer sufficient to describe the
geometry, and one has to enter the realm of the tensor and its algebra and
analysis. Whilst tensor theory is not an impenetrable branch of mathematics,
it is somewhat outside the scope of our book on VR. In terms of computer
vision, when thinking about three views, the important tensor is called the
trifocal tensor. One can read about it from those who proposed and developed
it [6, 13].
8.3.8 Extracting 3D Information from Image Sequences
If you think about it for a moment, making a video in which the camera
moves from one spot to another effectively gives us at least two views of a
scene. Of course, we get a lot more than two views, because even if the
journey takes as short a time as one second, we may have 25 to 30 pictures,
one from each frame of the movie. So on the face of it, taking video or
moving footage of a scene should be much more helpful in extracting scene
detail than just a couple of camera snapshots.
Unfortunately, there is one small complication in using a single camera
to capture scene detail (and we are not talking about the case where we don't
move the camera, or where we simply pan it across the scene, which is no
better). We are talking about the fact that elements in the scene get the chance
to move. Moving elements make the recognition of corresponding points much
more difficult. For example, when using the fundamental matrix to calculate
range, we cannot be assured that if we take two pictures of a scene, select a
point of interest in the first, and use F to find its epipolar line in the second,
the image of that point in the second picture will still lie on that epipolar line.
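This failure of the epipolar constraint for moving objects can be illustrated with a short sketch. Assuming a toy stereo geometry of two identical cameras related by a pure horizontal translation (with identity intrinsics, so F reduces to the cross-product matrix of the translation; the function names and point coordinates here are purely illustrative), a static point satisfies x₂ᵀFx₁ = 0, while a point that moved between exposures does not:

```python
import numpy as np

def skew(t):
    """Skew-symmetric matrix so that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

# Toy geometry (an assumption for illustration): identical cameras,
# identity intrinsics, pure translation t along the x-axis, so F = [t]_x
# and epipolar lines are horizontal image rows.
t = np.array([1.0, 0.0, 0.0])
F = skew(t)

def epipolar_residual(F, x1, x2):
    """|x2^T F x1| -- zero exactly when x2 lies on the epipolar line of x1."""
    return abs(x2 @ F @ x1)

x1 = np.array([0.3, 0.5, 1.0])         # homogeneous point in the first image
x2_static = np.array([0.1, 0.5, 1.0])  # same scene point, second image
x2_moving = np.array([0.1, 0.7, 1.0])  # the point after the object moved

print(epipolar_residual(F, x1, x2_static))  # ~0: constraint holds
print(epipolar_residual(F, x1, x2_moving))  # nonzero: off the epipolar line
```

The nonzero residual for the moved point is exactly the complication described above: a matcher searching only along the epipolar line would never find it.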