5.9 Free Viewpoint Video

A free viewpoint 3D video system allows end users to change the viewing position and angle arbitrarily, enriching the viewer's immersive experience. The key technology bridging the content generation, transmission, and display stages of a 3D video system is the 3D scene representation. As discussed in Chapter 2, there are two fundamental approaches to defining 3D scenes: image-based and geometry-based representation [59].

Image-based representation uses a set of camera views arranged in a 1D or 2D array to represent the 3D world, which can be regarded as a sparse sampling of the plenoptic function [60]. Similar requirements and operations arise in other video processing applications, such as spatial upsampling or frame-rate upconversion, where new pixels are generated from the information in existing pixels. For the free viewpoint 3D application, a new view can be synthesized from this sparse set of samples of the plenoptic function. The main advantages are the high quality of virtual view synthesis and the avoidance of explicit 3D scene reconstruction. The computational complexity of image-based representation is proportional to the number of pixels in the reference and output images, but in general not to the geometric complexity of the scene, such as its triangle count. However, the synthesis ability of image-based representation is limited to the viewing angles covered by the existing views. In addition, the quality of the synthesized views depends on the scene depth variation, the resolution of each view, and the number of views.
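As a concrete illustration of synthesizing a new view from sparse plenoptic samples, the following minimal sketch blends the two captured views nearest to the desired position along a 1D camera array. It assumes rectified views and negligible depth parallax; the function and variable names are illustrative, not from any standard library.

```python
import numpy as np

def synthesize_view(views, positions, target_pos):
    """Blend the two captured views nearest to `target_pos` along a
    1D camera baseline. Assumes rectified views and small scene depth
    variation, so no per-pixel warping is attempted."""
    positions = np.asarray(positions, dtype=float)
    right = int(np.searchsorted(positions, target_pos))
    right = min(max(right, 1), len(positions) - 1)  # clamp to a valid pair
    left = right - 1
    span = positions[right] - positions[left]
    w = (target_pos - positions[left]) / span if span > 0 else 0.0
    # Weight each reference view by its proximity to the virtual camera.
    return (1.0 - w) * views[left] + w * views[right]

# Three toy 2x2 views captured at positions 0, 1, 2 on the baseline.
views = [np.full((2, 2, 3), v) for v in (0.0, 0.5, 1.0)]
virtual = synthesize_view(views, [0.0, 1.0, 2.0], 0.25)  # 75% view 0, 25% view 1
```

Note that the cost of this operation scales with the pixel counts of the reference and output images, matching the complexity argument above.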

The geometry-based representation uses a set of triangular meshes with associated textures to represent the 3D world. The main advantage is that, once geometry information is available, the 3D scene can be rendered from any viewpoint and view direction without limitation, which meets the requirements of a free viewpoint 3D video system. The main disadvantage is the computational cost of rendering and storage, which depends on the scene complexity, that is, the total number of triangles used to describe the 3D world. In addition, geometry-based representation is generally an approximation to the 3D world. Although offline photorealistic rendering algorithms can generate views matching our perception of the real world, existing algorithms using a graphics pipeline still cannot produce realistic views on the fly.
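A minimal sketch of such a representation is given below: a triangle mesh carrying per-vertex texture coordinates and a texture image, with the triangle count exposed as the driver of rendering and storage cost. The class and field names are assumptions made for illustration only.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class TexturedMesh:
    """A minimal geometry-based scene element: a triangle mesh plus a
    texture image (all names here are illustrative)."""
    vertices: np.ndarray   # (V, 3) xyz positions
    uvs: np.ndarray        # (V, 2) per-vertex texture coordinates
    faces: np.ndarray      # (F, 3) vertex indices, one row per triangle
    texture: np.ndarray    # (H, W, 3) texture image

    def triangle_count(self) -> int:
        # Rendering and storage cost in a geometry pipeline grows with
        # this count, not with the number of captured reference images.
        return len(self.faces)

# A unit quad represented as two textured triangles.
quad = TexturedMesh(
    vertices=np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0]], float),
    uvs=np.array([[0, 0], [1, 0], [1, 1], [0, 1]], float),
    faces=np.array([[0, 1, 2], [0, 2, 3]]),
    texture=np.zeros((256, 256, 3), dtype=np.uint8),
)
assert quad.triangle_count() == 2
```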

Recent developments in 3D scene representation attempt to bring these two technologies together to broaden the technology spectrum. By adding geometric information to image-based representation, the disocclusion and resolution problems can be alleviated. Similarly, adding image information captured from the real world to geometry-based representation can reduce the rendering cost and storage. Thus, a combination of geometry-based and image-based representations has been proposed to render the 3D scene for free viewpoint TV [61].
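A common instance of this hybrid is depth-image-based rendering (DIBR), in which a per-pixel depth map attached to a reference view supplies the geometric information. The sketch below forward-warps a reference view to a horizontally displaced virtual camera, assuming rectified cameras and the disparity relation d = f * B / Z; the function and parameter names are hypothetical.

```python
import numpy as np

def dibr_forward_warp(color, depth, focal, baseline):
    """Warp a reference view to a virtual camera shifted by `baseline`.

    Each pixel moves horizontally by its disparity d = focal*baseline/Z.
    A z-buffer keeps the nearest surface when pixels collide; output
    pixels that nothing maps to stay at -1 (disocclusion holes that a
    real system would fill from other views or by inpainting).
    """
    h, w = depth.shape
    out = np.full_like(color, -1.0)          # color: (h, w, 3) floats
    zbuf = np.full((h, w), np.inf)
    disparity = focal * baseline / depth     # (h, w) horizontal shifts
    for y in range(h):
        for x in range(w):
            xt = int(round(x - disparity[y, x]))
            if 0 <= xt < w and depth[y, x] < zbuf[y, xt]:
                zbuf[y, xt] = depth[y, x]
                out[y, xt] = color[y, x]
    return out
```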

Depending on how the computational load is distributed between the server side and the client side, and on the bandwidth budget, there are three major architectures for an end-to-end free viewpoint video system [62]. The first approach is a light client-side solution in which the server performs most of the computation, including 3D scene representation, depth estimation, and new view synthesis, and encodes a bit stream carrying the two views intended for each eye via the aforementioned two-view stereo video coding method or MVC. The client simply decodes the streams and displays the two views. The required bandwidth for this approach is also the smallest, since only two-view content is transmitted. However, real-time feedback tracking the end user's viewing angle is needed so that the server can synthesize and encode the desired views. The second approach allocates most of the computational load to the client. The server encodes all captured views with the MVC codec and transmits them through the channel to the receiver, which is responsible for decoding the compressed multi-view streams, estimating depth from the decompressed images, and synthesizing the required views. The required bandwidth is much higher than in the first approach. The third approach moves the depth estimation component to the server side to reduce part of the computational load on the client; however, the estimated depth information must also be compressed and transmitted through the channel. The codec required to deliver both the texture and depth images is the MVD codec. A back-of-envelope bandwidth comparison is sketched below.
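The sketch prices the three architectures under assumed figures: an eight-camera setup, a fixed bit rate per coded texture view, and depth maps costing a quarter of a texture view. All numbers and names are illustrative assumptions, not values from [62].

```python
def fvv_bandwidth_estimate(n_views=8, texture_mbps=8.0, depth_ratio=0.25):
    """Rough bandwidth needed by each architecture, assuming every coded
    texture view costs `texture_mbps` and a coded depth map costs
    `depth_ratio` of a texture view (all figures are illustrative)."""
    return {
        "1: server-side synthesis, two views sent": 2 * texture_mbps,
        "2: all texture views sent, client does depth + synthesis":
            n_views * texture_mbps,
        "3: MVD, texture plus depth sent per view":
            n_views * texture_mbps * (1 + depth_ratio),
    }

for arch, mbps in fvv_bandwidth_estimate().items():
    print(f"{arch}: {mbps:.1f} Mbit/s")
```

Under these assumptions the first architecture needs the least bandwidth but the most server work and a feedback channel, while the second and third trade bandwidth against client-side computation, as described above.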

A standardization activity for free viewpoint TV in MPEG, denoted MPEG-FTV, started in April 2007. The effort was based on the aforementioned third architecture, that is, the MVD codec, to decouple the capture and display components of the whole ecosystem. In March 2011, MPEG issued a Call for Proposals on 3D video coding technology (3DVC) [63], seeking solutions for efficient compression and high-quality reconstruction of an arbitrary number of dense views.
