4.1 3D Scene Modeling and Creation

In general, 3D scene modeling and representation approaches can be classified into three categories: geometry-based modeling, image-based modeling, and hybrid modeling, which combines the two to represent a 3D scene at lower storage cost or with higher precision. Geometry-based modeling represents the scene using pure 3D computer graphics components: 3D meshes or other representations of 3D surfaces, texture maps, object material properties such as opacity and reflectance, and environment properties such as lighting models, all with the purpose of enhancing the realism of the models. Image-based modeling goes to the other extreme, using no 3D geometry at all, but rather a set of images captured by a number of cameras with predesigned positions and settings. This approach tends to generate high-quality virtual view synthesis without the effort of 3D scene reconstruction, and the synthesis quality typically increases as more views become available. Its challenge is that a tremendous amount of image data must be stored, transferred, and processed to achieve a good-quality synthesized view; otherwise interpolation and occlusion artifacts appear in the synthesized image due to the lack of source data. The hybrid approach can leverage both representation methods to find a compromise between the two extremes under given constraints. For example, using multiple images and their corresponding depth maps to represent a 3D scene is a popular method in which the depth maps serve as the geometric modeling component; this hybrid representation avoids storing and processing the many extra images the purely image-based approach would require to achieve the same high-quality synthesized view.

In industry, many such 3D modeling tools have been adopted for 3D content creation. For example, NewSight provides 3D content creation tools for 3D modeling, animation, and rendering, with the results shown on its autostereoscopic 3D displays. InTru3D is a brand from Intel that allows content created with Intel's animation technology to be viewed in stereoscopic 3D. Using InTru3D and ColorCode 3-D, DreamWorks and PepsiCo's SoBe Lifewater turned Super Bowl XLIII into a 3D event. ColorCode 3-D is a stereo system with a novel left/right view encoding algorithm: the composed image appears essentially as an ordinary color image with slightly increased contrast, with distant or sharp-edged objects surrounded by faint halos of golden and bluish tints, and the ColorCodeViewer uses amber and blue filters with complex spectral curves to separate the left and right views at the viewing end. ColorCode 3-D images work on all display types, ranging from computer and TV displays and digital projectors to analogue film.

4.1.1 Geometry-Based Modeling

Geometry-based modeling reconstructs 3D scenes automatically by computer. Although current graphics cards with modern GPUs are very powerful in rendering and computation, 3D scene reconstruction remains complex and time consuming. Typically, camera geometry estimation, depth map generation, and 3D shape generation are the fundamental components of the reconstruction effort. Errors introduced during the estimation process may cause noticeable visual artifacts, so user interaction and assistance are sometimes employed to produce high-quality geometry models.

The Shape-from-Silhouette approach is one of the prominent methods in geometry reconstruction. A silhouette of an object is the contour that separates the object from the background. Given multiple view images of the 3D scene, the silhouettes retrieved from these views are back-projected into a common 3D space, with the projection centers at the camera locations, and a cone-like volume is generated for each projection. By intersecting all the cones, a visual hull of the target 3D object is obtained. A texturing process then assigns colors to the voxels on the surface of the visual hull, and a realistic rendering process gives the reconstructed 3D object good visual quality.
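The cone-intersection step can be sketched as a simple voxel test: a world-space point belongs to the visual hull only if it projects inside every silhouette. The projection-matrix format, mask layout, and point grid below are illustrative assumptions, not a production reconstruction pipeline:

```python
import numpy as np

def carve_visual_hull(silhouettes, projections, grid_points):
    """Keep only points whose projections fall inside every silhouette.

    silhouettes : list of HxW boolean masks (True = object).
    projections : list of 3x4 camera projection matrices, one per view.
    grid_points : Nx3 array of candidate voxel centers in world space.
    Returns a boolean mask over grid_points marking the visual hull.
    """
    n = len(grid_points)
    homog = np.hstack([grid_points, np.ones((n, 1))])   # Nx4 homogeneous
    inside = np.ones(n, dtype=bool)
    for mask, P in zip(silhouettes, projections):
        h, w = mask.shape
        proj = homog @ P.T                    # Nx3 homogeneous image coords
        uv = proj[:, :2] / proj[:, 2:3]       # perspective divide
        u = np.round(uv[:, 0]).astype(int)
        v = np.round(uv[:, 1]).astype(int)
        visible = (u >= 0) & (u < w) & (v >= 0) & (v < h)
        hit = np.zeros(n, dtype=bool)
        hit[visible] = mask[v[visible], u[visible]]
        inside &= hit                         # intersect this silhouette cone
    return inside
```

In practice the surviving points are organized as a voxel grid whose surface voxels are then textured, as described above.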

4.1.2 Image-Based Modeling

As mentioned earlier, the advantage of image-based modeling is that it avoids the complicated 3D scene reconstruction process: a potentially high-quality virtual view can be synthesized from the views captured by cameras at different locations and angles. However, this benefit is paid for by densely sampling the real world with a sufficient number of cameras in various positions, which produces large volumes of data for further processing.

The plenoptic function [1] plays a very important role in image-based modeling, as the modeling process is in fact a sampling of the complete flow of light in a region of the environment; the well-known 7D function, parameterized by viewing position (3D), viewing direction (2D), time, and wavelength, is therefore a popular description of the visual information. To reduce the data size while preserving rendering quality, researchers have tried different approaches. For example, [2] introduces a 5D function that drops the wavelength and time dimensions; light fields [3] and the Lumigraph [4] reduce the data to 4D by additionally assuming that radiance does not change along a line in free space, so that each light ray is recorded by its intersections with two planes as two pairs of 2D coordinates. Furthermore, [5] reduces the plenoptic function to 3D by restricting both the cameras and the viewers to the same plane.
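The two-plane (light field / Lumigraph) parameterization can be illustrated with a small sketch: a ray is converted to its (s, t, u, v) plane intersections and looked up in a 4D radiance array. The plane positions at z = 0 and z = 1 and the nearest-neighbor lookup are simplifying assumptions made here for brevity; real renderers interpolate over the 4D neighborhood:

```python
import numpy as np

def ray_to_two_plane(origin, direction, z_cam=0.0, z_focal=1.0):
    """Parameterize a ray by its crossings of the planes z=z_cam and z=z_focal."""
    o = np.asarray(origin, float)
    d = np.asarray(direction, float)
    s, t = o[:2] + d[:2] * (z_cam - o[2]) / d[2]     # camera-plane crossing
    u, v = o[:2] + d[:2] * (z_focal - o[2]) / d[2]   # focal-plane crossing
    return s, t, u, v

def sample_light_field(L, s, t, u, v):
    """Look up the radiance of a ray in a two-plane light field.

    L is a 4D array indexed as L[s, t, u, v]; coordinates are quantized
    to the nearest grid sample (a real renderer would interpolate
    quadrilinearly over the 16 neighboring samples).
    """
    si = int(round(np.clip(s, 0, L.shape[0] - 1)))
    ti = int(round(np.clip(t, 0, L.shape[1] - 1)))
    ui = int(round(np.clip(u, 0, L.shape[2] - 1)))
    vi = int(round(np.clip(v, 0, L.shape[3] - 1)))
    return L[si, ti, ui, vi]
```

Novel view synthesis then amounts to generating one such lookup per pixel of the virtual camera, which is why rendering cost is independent of scene complexity.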

4.1.3 Hybrid Approaches

It is important to introduce the concept of disparity here. Disparity is typically interpreted as the inverse of the distance to the observed object. To obtain the disparity between two views, a stereo correspondence (or stereo matching) algorithm is applied, which compares the correlation of local windows or matches extracted features one by one. Difficulties such as occlusion can make the matching process very challenging. It is worth mentioning that Scharstein and Szeliski [6] created a website, the Middlebury stereo vision page, to evaluate the performance of around 40 stereo matching algorithms on pairs of rectified images.
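A minimal window-based matcher of the kind benchmarked on the Middlebury page can be sketched as winner-take-all block matching with a sum-of-absolute-differences (SAD) cost; the window size and disparity range below are arbitrary illustrative choices, and no occlusion handling is attempted:

```python
import numpy as np

def block_match_disparity(left, right, max_disp=16, half=2):
    """Winner-take-all block matching on a rectified grayscale pair.

    For each pixel of the left image, slide a (2*half+1)^2 window across
    candidate disparities in the right image and keep the disparity with
    the lowest SAD. Border pixels, where the window would leave the
    image, are left at disparity 0.
    """
    h, w = left.shape
    disp = np.zeros((h, w), dtype=np.int32)
    for y in range(half, h - half):
        for x in range(half, w - half):
            patch = left[y - half:y + half + 1, x - half:x + half + 1]
            best, best_d = np.inf, 0
            for d in range(0, min(max_disp, x - half) + 1):
                cand = right[y - half:y + half + 1,
                             x - d - half:x - d + half + 1]
                sad = np.abs(patch.astype(float) - cand.astype(float)).sum()
                if sad < best:
                    best, best_d = sad, d
            disp[y, x] = best_d
    return disp
```

Textureless regions illustrate the ambiguity mentioned above: every candidate window costs the same, so the matcher cannot recover a meaningful disparity there.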

Depth or disparity maps, which record the depth of each sample of an image, are often used along with the 2D image to form a 2.5D representation of the 3D scene. Layered depth images are a natural extension in which ordered depth layers store pixel intensities. The currently popular multi-view video plus depth representation is another extension that uses multiple depth maps, one per image, to obtain a more precise scene reconstruction.
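To illustrate how an image-plus-depth (2.5D) representation supports view synthesis, the sketch below forward-warps each pixel by its depth-derived disparity into a virtual camera displaced along the baseline. The pinhole parameters, the z-buffer collision rule, and the lack of hole filling are simplifying assumptions; the unfilled pixels are exactly the occlusion holes that a multi-view plus depth representation helps fill:

```python
import numpy as np

def render_virtual_view(image, depth, f, baseline, alpha):
    """Forward-warp a view and its depth map to a virtual camera.

    The virtual camera sits at fraction `alpha` of the stereo baseline;
    each pixel shifts horizontally by disparity = f * baseline / depth,
    scaled by alpha. A z-buffer keeps the nearest surface where warped
    pixels collide; pixels never written to remain 0 (occlusion holes).
    """
    h, w = depth.shape
    out = np.zeros_like(image)
    zbuf = np.full((h, w), np.inf)
    for y in range(h):
        for x in range(w):
            d = alpha * f * baseline / depth[y, x]   # horizontal disparity
            xv = int(round(x - d))
            if 0 <= xv < w and depth[y, x] < zbuf[y, xv]:
                zbuf[y, xv] = depth[y, x]
                out[y, xv] = image[y, x]
    return out
```

With multiple source views, the same warp is run from each view and the results are merged, which is how the multi-view plus depth representation reduces the visible holes.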
