Chapter 4. 3D Scene Reconstruction Using Structure from Motion

The goal of this chapter is to study how to reconstruct a scene in 3D by inferring the geometrical features of the scene from camera motion. This technique is sometimes referred to as structure from motion. By looking at the same scene from different angles, we will be able to infer the real-world 3D coordinates of different features in the scene. This process is known as triangulation, which allows us to reconstruct the scene as a 3D point cloud.

In the previous chapter, you learned how to detect and track an object of interest in the video stream of a webcam, even if the object is viewed from different angles or distances, or under partial occlusion. Here, we will take the tracking of interesting features a step further and consider what we can learn about the entire visual scene by studying similarities between image frames. If we take two pictures of the same scene from different angles, we can use feature matching or optic flow to estimate any translational and rotational movement that the camera underwent between taking the two pictures. However, in order for this to work, we will first have to calibrate our camera.

The complete procedure involves the following steps:

  1. Camera calibration: We will use a chessboard pattern to extract the intrinsic camera matrix as well as the distortion coefficients, which are important for performing the scene reconstruction (a calibration sketch follows this list).
  2. Feature matching: We will match points in two 2D images of the same visual scene, either via Speeded-Up Robust Features (SURF) or via optic flow (sketched after this list).
  3. Image rectification: By estimating the camera motion from a pair of images, we will extract the essential matrix and rectify the images (see the sketch after this list).
  4. Triangulation: We will reconstruct the 3D real-world coordinates of the image points by making use of constraints from epipolar geometry (sketched below).
  5. 3D point cloud visualization: Finally, we will visualize the recovered 3D structure of the scene using scatterplots in matplotlib (a plotting sketch follows this list). The point cloud is most compelling when studied using pyplot's Pan axes button, which lets you rotate and scale it in all three dimensions. It is a little harder to appreciate in still frames of the fountain scene (left panel: standing slightly in front of and to the left of the fountain; center panel: looking down on the fountain; right panel: standing slightly in front of and to the right of the fountain).
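For step 1, the following is a minimal calibration sketch, assuming a handful of chessboard photographs on disk; the file names and the 9 x 6 pattern size are placeholders to be replaced with your own:

    import cv2
    import numpy as np

    # One 3D reference point per inner chessboard corner, in units of
    # board squares: (0,0,0), (1,0,0), ..., all lying in the z = 0 plane
    pattern_size = (9, 6)  # inner corners per row/column; adjust to your board
    objp = np.zeros((pattern_size[0] * pattern_size[1], 3), np.float32)
    objp[:, :2] = np.mgrid[0:pattern_size[0], 0:pattern_size[1]].T.reshape(-1, 2)

    obj_points, img_points = [], []
    for fname in ("calib01.png", "calib02.png"):  # hypothetical file names
        gray = cv2.imread(fname, cv2.IMREAD_GRAYSCALE)
        found, corners = cv2.findChessboardCorners(gray, pattern_size)
        if found:
            obj_points.append(objp)
            img_points.append(corners.reshape(-1, 2))

    # K is the 3x3 intrinsic camera matrix, dist the distortion coefficients
    rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
        obj_points, img_points, gray.shape[::-1], None, None)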
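For step 2, one way to gather point correspondences without SURF is to detect corner features in the first view and track them into the second with pyramidal Lucas-Kanade optic flow, as in this sketch (the file names are placeholders; a SURF detector plus a descriptor matcher would be a drop-in alternative):

    import cv2

    # Two views of the same scene (hypothetical file names)
    img1 = cv2.imread("fountain_left.png", cv2.IMREAD_GRAYSCALE)
    img2 = cv2.imread("fountain_right.png", cv2.IMREAD_GRAYSCALE)

    # Pick corner-like features in the first view ...
    pts1 = cv2.goodFeaturesToTrack(img1, maxCorners=400, qualityLevel=0.01,
                                   minDistance=7)

    # ... and track them into the second view with pyramidal Lucas-Kanade
    pts2, status, err = cv2.calcOpticalFlowPyrLK(img1, img2, pts1, None)

    # Keep only the correspondences that were tracked successfully
    good1 = pts1[status.ravel() == 1].reshape(-1, 2)
    good2 = pts2[status.ravel() == 1].reshape(-1, 2)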
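For step 3, given the correspondences good1 and good2 from the previous sketch and the intrinsic matrix K from calibration, the essential matrix can be obtained through the fundamental matrix via the standard relation E = K^T F K; the RANSAC parameters below are illustrative values:

    import cv2

    # Fundamental matrix from the point correspondences, with RANSAC to
    # reject outliers; mask flags the inlier matches
    F, mask = cv2.findFundamentalMat(good1, good2, cv2.FM_RANSAC, 0.1, 0.99)

    # With known intrinsics K, the essential matrix follows as E = K^T F K
    E = K.T.dot(F).dot(K)

    # Epipolar lines in the second image for the inliers of the first
    inliers1 = good1[mask.ravel() == 1]
    lines2 = cv2.computeCorrespondEpilines(inliers1.reshape(-1, 1, 2), 1, F)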
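For step 4, assume the relative rotation R and translation t between the two views have already been recovered from the essential matrix (for example via its singular value decomposition, or with cv2.recoverPose in OpenCV 3 and later). Triangulation then amounts to building the two projection matrices and calling cv2.triangulatePoints:

    import cv2
    import numpy as np

    # The first camera sits at the origin; the second is displaced by
    # [R | t], recovered from the essential matrix as noted above
    P1 = K.dot(np.hstack((np.eye(3), np.zeros((3, 1)))))
    P2 = K.dot(np.hstack((R, t.reshape(3, 1))))

    # OpenCV expects the image points as 2xN arrays; the result is a
    # 4xN array of homogeneous coordinates
    pts4d = cv2.triangulatePoints(P1, P2, good1.T, good2.T)
    pts3d = (pts4d[:3] / pts4d[3]).T  # divide by w -> Nx3 Euclidean points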
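For step 5, a minimal matplotlib sketch that scatters the triangulated points pts3d in a set of 3D axes might look like this:

    import matplotlib.pyplot as plt
    from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection

    fig = plt.figure()
    ax = fig.add_subplot(111, projection="3d")
    ax.scatter(pts3d[:, 0], pts3d[:, 1], pts3d[:, 2], s=2)
    ax.set_xlabel("X")
    ax.set_ylabel("Y")
    ax.set_zlabel("Z")
    plt.show()  # rotate and scale the cloud interactively in the window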

Note

This chapter has been tested with OpenCV 2.4.9 and wxPython 2.8 (http://www.wxpython.org/download.php). It also requires NumPy (http://www.numpy.org) and matplotlib (http://www.matplotlib.org/downloads.html). Note that if you are using OpenCV 3, you may have to obtain the so-called extra modules from https://github.com/Itseez/opencv_contrib and build OpenCV 3 with the OPENCV_EXTRA_MODULES_PATH variable set in order to get SURF installed. Also note that you may have to obtain a license to use SURF in commercial applications.

Planning the app

The final app will extract and visualize structure from motion on a pair of images. We will assume that these two images have been taken with the same camera, whose internal camera parameters we know. If these parameters are not known, they need to be estimated first in a camera calibration process.

The final app will then consist of the following modules and scripts:

  • chapter4.main: This is the main routine for starting the application.
  • scene3D.SceneReconstruction3D: This is a class that contains a range of functionalities for calculating and visualizing structure from motion (a skeleton of the class follows this list). It includes the following public methods:
    • __init__: This constructor will accept the intrinsic camera matrix and the distortion coefficients
    • load_image_pair: A method used to load from file two images that have been taken with the camera described earlier
    • plot_optic_flow: A method used to visualize the optic flow between the two image frames
    • draw_epipolar_lines: A method used to draw the epipolar lines of the two images
    • plot_rectified_images: A method used to plot a rectified version of the two images
    • plot_point_cloud: A method used to visualize the recovered real-world coordinates of the scene as a 3D point cloud

In order to arrive at a 3D point cloud, we will need to exploit epipolar geometry. However, epipolar geometry assumes the pinhole camera model, which no real camera follows. We need to rectify our images to make them look as if they have come from a pinhole camera. For that, we need to estimate the parameters of the camera, which leads us to the field of camera calibration.
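To make this interface concrete, here is a minimal skeleton of the class, sketched under the assumption that both images are undistorted as they are loaded; the attribute names and stubbed bodies are placeholders rather than the book's actual implementation:

    import cv2

    class SceneReconstruction3D:
        """Skeleton of the class described above; bodies left as stubs."""

        def __init__(self, K, dist):
            # Intrinsic camera matrix and distortion coefficients from
            # the calibration step
            self.K = K
            self.d = dist

        def load_image_pair(self, img_path1, img_path2):
            # Load both views and undo lens distortion with the known
            # intrinsics, so later steps can assume a pinhole camera
            self.img1 = cv2.undistort(cv2.imread(img_path1), self.K, self.d)
            self.img2 = cv2.undistort(cv2.imread(img_path2), self.K, self.d)

        def plot_optic_flow(self):
            pass  # visualize flow vectors between self.img1 and self.img2

        def draw_epipolar_lines(self):
            pass  # draw the epipolar lines of the two images

        def plot_rectified_images(self):
            pass  # show the rectified image pair

        def plot_point_cloud(self):
            pass  # triangulate and scatter-plot the recovered 3D points

An instance would then be created as SceneReconstruction3D(K, dist), with K and dist being the values obtained during camera calibration.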