Reconstructing the scene

Finally, we can reconstruct the 3D scene by making use of a process called triangulation. Epipolar geometry lets us infer the 3D coordinates of a point: by calculating the essential matrix, we learn more about the geometry of the visual scene than we might think. Because the two cameras depict the same real-world scene, we know that most of the 3D real-world points will be found in both images. Moreover, we know that the mapping from the 2D image points to the corresponding 3D real-world points follows the rules of projective geometry. If we collect a sufficiently large number of matched image points, we can construct, and solve, a (large) system of linear equations to recover the real-world coordinates of those points.
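
To see what such a system looks like, consider a single pair of matching points. Each observation (u, v) of a 3D point X under a 3x4 projection matrix P contributes two linear equations in X, so two views give four equations that can be solved in the least-squares sense. The following is a minimal sketch of this direct linear transform (DLT) triangulation; it illustrates the idea and is not necessarily the exact routine OpenCV uses internally:

import numpy as np

def triangulate_point(P1, P2, x1, x2):
    """Triangulate one 3D point from two 2D observations.

    P1, P2 are 3x4 projection matrices; x1, x2 are the (u, v)
    coordinates of the same scene point in the two images.
    """
    # Each view contributes two equations, derived from x cross (P X) = 0
    A = np.array([x1[0] * P1[2] - P1[0],
                  x1[1] * P1[2] - P1[1],
                  x2[0] * P2[2] - P2[0],
                  x2[1] * P2[2] - P2[1]])
    # The least-squares solution is the right singular vector of A
    # belonging to the smallest singular value
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]  # from homogeneous to 3D coordinates

Stacking equations like these for every matched pair is exactly the (large) system mentioned above; later in this section, cv2.triangulatePoints will do this work for us.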

Let's return to the Swiss fountain dataset. If we ask two photographers to take a picture of the fountain from different viewpoints at the same time, it is not hard to realize that the first photographer might show up in the picture of the second photographer, and vice versa. The point on the image plane where the other photographer is visible is called the epipole, or epipolar point. In more technical terms, the epipole is the point on one camera's image plane onto which the center of projection of the other camera projects. It is interesting to note that the two epipoles in their respective image planes and the two centers of projection all lie on a single 3D line. By looking at the lines between the epipoles and the image points, we can limit the number of possible 3D coordinates of an image point. In fact, if a point in the first image is known, then the corresponding epipolar line in the second image (the line passing through the second image's epipole) is known as well, and the same scene point projected onto the second image must lie on that particular epipolar line. Confusing? I thought so. Let's just look at these images:

[Image: the two fountain photographs with epipolar lines drawn in both images]

Each line here is the epipolar line of a particular point in the other image. Ideally, all the epipolar lines drawn in the left-hand-side image intersect in a single point, the epipole, which typically lies outside the image. If the calculation is accurate, that point coincides with the location of the second camera as seen from the first camera. In other words, the epipolar lines in the left-hand-side image tell us that the camera that took the right-hand-side image is located to our (that is, the first camera's) right-hand side. Analogously, the epipolar lines in the right-hand-side image tell us that the camera that took the image on the left is located to our (that is, the second camera's) left-hand side.
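
In case you want to reproduce such a plot yourself, OpenCV can compute epipolar lines directly from the fundamental matrix via cv2.computeCorrespondEpilines. Here is a minimal sketch, assuming you already have a fundamental matrix F, an (N, 2) array pts1 of points in the first image, and the second image img2 (all three names are placeholders for your own data):

import cv2
import numpy as np

# For each point in image 1, compute the corresponding epipolar line
# a*x + b*y + c = 0 in image 2 (whichImage=1 means pts1 lives in image 1)
lines2 = cv2.computeCorrespondEpilines(
    pts1.reshape(-1, 1, 2).astype(np.float32), 1, F)
lines2 = lines2.reshape(-1, 3)

h, w = img2.shape[:2]
for a, b, c in lines2:
    # Intersect each line with the left and right image borders
    # (this assumes b != 0, that is, the line is not vertical)
    pt_left = (0, int(round(-c / b)))
    pt_right = (w, int(round(-(c + a * w) / b)))
    cv2.line(img2, pt_left, pt_right, (0, 255, 0), 1)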

Moreover, for each point observed in one image, the same point must be observed in the other image on a known epipolar line. This is known as the epipolar constraint. We can use this fact to show that if two image points correspond to the same 3D point, then the projection lines of those two image points must intersect precisely at that 3D point. This means that the 3D point can be calculated from the two image points, which is exactly what we are going to do next.
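
In algebraic form, the epipolar constraint reads x2^T F x1 = 0, where x1 and x2 are the matching points in homogeneous coordinates and F is the fundamental matrix. As a quick sanity check, you can measure how well your inlier matches satisfy this constraint; the names F, pts1, and pts2 below are placeholders for the outputs of your own matching step:

import numpy as np

# Assumption: pts1 and pts2 are (N, 2) arrays of matched inlier points,
# and F is the 3x3 fundamental matrix estimated from them
ones = np.ones((len(pts1), 1))
x1 = np.hstack([pts1, ones])  # homogeneous coordinates, shape (N, 3)
x2 = np.hstack([pts2, ones])

# For perfect data, x2[i] . (F x1[i]) is exactly zero for every match i
residuals = np.abs(np.einsum('ij,jk,ik->i', x2, F, x1))
print('mean epipolar residual:', residuals.mean())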

Luckily, OpenCV again provides a wrapper, cv2.triangulatePoints, that constructs and solves this extensive set of linear equations for us. First, we have to convert our lists of matching feature points into NumPy arrays:

# Keep only the first two coordinates (x, y) of every matched point
first_inliers = np.array(self.match_inliers1).reshape(-1, 3)[:, :2]
second_inliers = np.array(self.match_inliers2).reshape(-1, 3)[:, :2]

Triangulation is performed next, using the preceding two [R | t] matrices (self.Rt1 for the first camera and self.Rt2 for the second camera):

pts4D = cv2.triangulatePoints(self.Rt1, self.Rt2, first_inliers.T, second_inliers.T).T

This will return the triangulated real-world points in 4D homogeneous coordinates. To convert them to 3D coordinates, we need to divide the (X, Y, Z) coordinates by the fourth coordinate, usually referred to as W:

# Divide each point's (X, Y, Z) coordinates by its W coordinate
pts3D = pts4D[:, :3] / np.repeat(pts4D[:, 3], 3).reshape(-1, 3)
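
The same conversion can be written more compactly with NumPy broadcasting; the following one-liner is equivalent to the line above:

pts3D = pts4D[:, :3] / pts4D[:, 3:4]  # broadcast W over (X, Y, Z)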