Planning the app

The final app will extract and visualize structure from motion on a pair of images. We will assume that these two images have been taken with the same camera, whose intrinsic camera parameters we know. If these parameters are not known, they need to be estimated first in a camera calibration process.

The final app will then consist of the following modules and scripts:

  • chapter4.main: This is the main routine for starting the application.
  • scene3D.SceneReconstruction3D: This is a class that contains a range of functionalities for calculating and visualizing structure from motion. It includes the following public methods:
    • __init__: This constructor will accept the intrinsic camera matrix and the distortion coefficients.
    • load_image_pair: This method loads, from file, two images that have been taken with the camera described earlier.
    • plot_optic_flow: This is a method used to visualize the optic flow between the two image frames.
    • draw_epipolar_lines: This method is used to draw the epipolar lines of the two images.
    • plot_rectified_images: This method is used to plot a rectified version of the two images.
    • plot_point_cloud: This is a method used to visualize the recovered real-world coordinates of the scene as a 3D point cloud. In order to arrive at a 3D point cloud, we will need to exploit epipolar geometry. However, epipolar geometry assumes the pinhole camera model, which no real camera follows exactly.
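The module plan above can be sketched as a class skeleton. This is only an illustrative outline of the interface just described, not the chapter's actual implementation; the method bodies are placeholders to be filled in as the app is built:

```python
import numpy as np


class SceneReconstruction3D:
    """Sketch of the scene-reconstruction interface outlined above.

    The method names follow the module plan; the bodies are
    placeholders, not the final implementation.
    """

    def __init__(self, K, dist):
        # K: 3x3 intrinsic camera matrix, dist: distortion coefficients
        self.K = np.asarray(K, dtype=np.float64)
        self.d = np.asarray(dist, dtype=np.float64).ravel()

    def load_image_pair(self, img_path1, img_path2):
        """Load the two input frames taken with the calibrated camera."""
        raise NotImplementedError

    def plot_optic_flow(self):
        """Visualize the optic flow between the two frames."""
        raise NotImplementedError

    def draw_epipolar_lines(self):
        """Draw the epipolar lines on both images."""
        raise NotImplementedError

    def plot_rectified_images(self):
        """Show the rectified version of the two images."""
        raise NotImplementedError

    def plot_point_cloud(self):
        """Scatter-plot the triangulated 3D scene points."""
        raise NotImplementedError
```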

The complete procedure of the app involves the following steps:

  1. Camera calibration: We will use a chessboard pattern to extract the intrinsic camera matrix as well as the distortion coefficients, which are important for performing the scene reconstruction.
  2. Feature matching: We will match points in two 2D images of the same visual scene, either via SIFT or via optic flow, as seen in the following screenshot:

  3. Image rectification: By estimating the camera motion from a pair of images, we will extract the essential matrix and rectify the images, as shown in the following screenshot:

  4. Triangulation: We will reconstruct the 3D real-world coordinates of the image points by making use of constraints from epipolar geometry.
  5. 3D point cloud visualization: Finally, we will visualize the recovered 3D structure of the scene using scatterplots in Matplotlib, which is most compelling when studied using the Pan axes button from pyplot. This button lets you rotate and scale the point cloud in all three dimensions. In the following screenshot, the color corresponds to the depth of a point in the scene:
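To make the triangulation step concrete, here is a minimal sketch of classic linear (DLT) triangulation: given the projection matrices of the two camera views and a matched point pair, it recovers the 3D world coordinates as the least-squares solution of a homogeneous system. This is a textbook formulation for illustration, not the chapter's implementation (which builds on OpenCV's routines):

```python
import numpy as np


def triangulate_point(P1, P2, pt1, pt2):
    """Linear (DLT) triangulation of one matched point pair.

    P1, P2: 3x4 projection matrices of the two views.
    pt1, pt2: (x, y) pixel coordinates of the match in each image.
    Returns the 3D point in world coordinates.
    """
    x1, y1 = pt1
    x2, y2 = pt2
    # Each image observation contributes two linear constraints on X
    A = np.array([
        x1 * P1[2] - P1[0],
        y1 * P1[2] - P1[1],
        x2 * P2[2] - P2[0],
        y2 * P2[2] - P2[1],
    ])
    # Homogeneous least-squares solution: right singular vector of A
    # belonging to the smallest singular value
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]  # dehomogenize
```

With noise-free synthetic data (a known 3D point projected through two known camera matrices), this recovers the point exactly; with real, noisy matches, the SVD yields the algebraic least-squares estimate.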

First, we need to rectify our images to make them look as if they have come from a pinhole camera. For that, we need to estimate the parameters of the camera, which leads us to the field of camera calibration.
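To see what the calibration parameters correct for, here is a small sketch of the standard two-term radial distortion model: a real lens maps an ideal (pinhole) normalized image point outward or inward depending on its distance from the image center, and undistortion inverts this mapping. The coefficient values in the test are illustrative, not from a real calibration:

```python
import numpy as np


def distort_points(pts_norm, k1, k2):
    """Apply two-term radial distortion: x_d = x * (1 + k1*r^2 + k2*r^4).

    pts_norm: Nx2 array of normalized image coordinates (x/z, y/z).
    k1, k2: radial distortion coefficients (as estimated by calibration).
    Returns the distorted normalized coordinates.
    """
    pts = np.asarray(pts_norm, dtype=float)
    r2 = np.sum(pts ** 2, axis=1, keepdims=True)  # squared radius per point
    factor = 1.0 + k1 * r2 + k2 * r2 ** 2
    return pts * factor
```

Note that a point at the image center is unaffected, while points farther out are displaced more strongly; this is why straight lines near the image border appear curved in uncalibrated images.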
