14. Multi-Frame Stereo
Using the City Category of KITTI's Raw Data
So far we have explained a new approach to stereo vision, the gaze_line-depth model, which obtains 3D information from the left and right images of a stereo camera.
Here we consider the information obtained from temporally consecutive left and right images captured by a stereo video camera.
However, rather than arbitrary video, we focus only on footage taken from a car driving slowly through a city.
As such footage, we use the sequence 2011_09_26_drive_0091 from the City category of KITTI's Raw Data.
KITTI's data is provided under the "Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License".
Here, 340 color stereo images were taken from that vast dataset.
Please follow this license when reusing these images.
If these 340 color stereo images are converted into a movie, it looks like the following.
Detection of rotational movement information between frames
Since the footage comes from a car driving slowly through the city, most of the change between frames can be attributed to the movement of the car.
Based on this premise, we rotate and translate the 3D shape obtained from the stereo pair at one time so that it superimposes on the 3D shape obtained from the stereo pair at the next time.
This can also be used to evaluate the performance of the stereo vision processing, because the two shapes cannot be made to overlap unless the 3D information produced by stereo processing is reasonably accurate.
For the 3D information to be correct, the focal length in pixels and the distance between the cameras must be correct; if they are not, no rotation and translation will bring the shapes into alignment.
Since the focal length and the distance between the cameras are obtained by calibration, this also evaluates the performance of the calibration process.
Let's illustrate this with an example.
The following movie displays frame 82 and frame 83 alternately.
The following movie is a three-dimensional display of frames 0 to 30 as seen from the car.
Method for detecting rotational movement information
A technique for estimating the camera poses and the shape of a stationary scene from multiple still images taken by the same camera is called SLAM (Simultaneous Localization and Mapping), and various methods have been studied.
Many of these methods extract feature points that are robust and easy to match, and recover three-dimensional information from the image positions of the corresponding feature points.
Here, no such image processing is performed; instead, the three-dimensional shape itself is rotated and translated.
RRR is the rotation matrix and TTT is the translation vector, and we search for the optimal values of this rotation matrix and translation vector.
The rotation and translation of the three-dimensional shape is performed as follows:
// Map a gaze_line-depth point (w, h, d) to the gaze_line-depth
// coordinates (rw, rh, rd) it occupies after applying the
// rotation RRR and translation TTT in 3D space.
void trans(int w, int h, int d, int &rw, int &rh, int &rd) {
    // gaze_line-depth -> 3D coordinates (x, y, z)
    double x = -focal*half_base/d;
    double y = w*half_base/d;
    double z = -h*half_base/d;
    // rotate and translate; overwrites (x, y, z)
    rot_and_mv(RRR, TTT, x, y, z);
    // 3D coordinates -> gaze_line-depth
    double D = -focal*half_base/x;
    double W = y*D/half_base;
    double H = -z*D/half_base;
    // round to the nearest integer
    rw = (int)(W+0.5);
    rh = (int)(H+0.5);
    rd = (int)(D+0.5);
}
Here, rot_and_mv(RRR, TTT, x, y, z) is a function that overwrites (x, y, z) with the result of rotating the vector (x, y, z) by RRR and then translating it by TTT.
(w, h) represents the gaze_line position, and d is the depth at that position.
focal is the focal length in pixels, and half_base is half the distance between the cameras.
The last three lines round the results to the nearest integer.
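For reference, the following is a minimal sketch of what rot_and_mv might look like; the source only describes its behavior, so this implementation is an assumption for illustration, not the author's actual code. It applies the 3x3 rotation matrix, then adds the translation vector, overwriting the input coordinates.
// Sketch of rot_and_mv (assumed implementation, for illustration only):
// apply the 3x3 rotation matrix R, then add the translation vector T,
// overwriting (x, y, z) with the result.
void rot_and_mv(const double R[3][3], const double T[3],
                double &x, double &y, double &z) {
    double nx = R[0][0]*x + R[0][1]*y + R[0][2]*z + T[0];
    double ny = R[1][0]*x + R[1][1]*y + R[1][2]*z + T[1];
    double nz = R[2][0]*x + R[2][1]*y + R[2][2]*z + T[2];
    x = nx;
    y = ny;
    z = nz;
}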
As this code shows, the 3D space itself is also represented with the gaze_line-depth model.
At first I considered using a point cloud, but gave up on it because judging whether points were close to each other was too computationally heavy.
Whether two shapes overlap is evaluated by the degree of agreement of the RGB values (brightness and color) at positions where the three-dimensional coordinates coincide in gaze_line-depth space.
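As a concrete illustration, the sketch below computes such a coincidence score. The data structures (depth0/depth1, rgb0/rgb1), the grid size, and the requirement of an exact depth match are all assumptions for this sketch, not the original variable names or necessarily the exact matching rule; the actual implementation may allow some tolerance. Every gaze_line-depth cell of the current frame is transformed with trans(), and the squared RGB difference is accumulated wherever the transformed position coincides with the next frame's shape.
enum { W_SIZE = 1242, H_SIZE = 375 };          // assumed grid size (KITTI-like)

extern int depth0[H_SIZE][W_SIZE];             // depths of frame t
extern int depth1[H_SIZE][W_SIZE];             // depths of frame t+1
extern unsigned char rgb0[H_SIZE][W_SIZE][3];  // colors of frame t
extern unsigned char rgb1[H_SIZE][W_SIZE][3];  // colors of frame t+1
void trans(int w, int h, int d, int &rw, int &rh, int &rd);  // shown above

// Average squared RGB difference over all cells whose 3D positions
// coincide after applying the current RRR and TTT via trans().
double overlap_error() {
    double err = 0;
    int count = 0;
    for (int h = 0; h < H_SIZE; h++) {
        for (int w = 0; w < W_SIZE; w++) {
            int d = depth0[h][w];
            if (d <= 0) continue;                // no depth estimate here
            int rw, rh, rd;
            trans(w, h, d, rw, rh, rd);          // move into frame t+1
            if (rw < 0 || rw >= W_SIZE || rh < 0 || rh >= H_SIZE) continue;
            if (rd != depth1[rh][rw]) continue;  // 3D positions must coincide
            for (int c = 0; c < 3; c++) {
                double diff = rgb0[h][w][c] - rgb1[rh][rw][c];
                err += diff*diff;
            }
            count++;
        }
    }
    return count > 0 ? err/count : 1e30;         // huge error if no overlap
}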
The optimum is searched for in steps of 0.001 degrees of rotation angle and 1 mm of translation.
The search looks for the minimum value while changing the parameters one at a time in turn, the same method that was also used when making the cameras parallel.
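The sketch below shows this one-parameter-at-a-time search in a minimal form. The six-parameter layout, the function names search_rotation_and_translation() and set_params() (which would rebuild RRR and TTT from the angles and offsets), are hypothetical names for illustration, not the original code.
extern double RRR[3][3], TTT[3];
void set_params(const double p[6]);  // hypothetical: builds RRR and TTT
                                     // from 3 angles (deg) and 3 offsets (mm)
double overlap_error();              // overlap score, as sketched above

// Minimize overlap_error() by changing one parameter at a time,
// in steps of 0.001 degrees (rotation) and 1 mm (translation),
// repeating until no single step improves the score.
void search_rotation_and_translation() {
    double p[6] = {0, 0, 0, 0, 0, 0};            // rx, ry, rz, tx, ty, tz
    double step[6] = {0.001, 0.001, 0.001, 1, 1, 1};
    set_params(p);
    double best = overlap_error();
    bool improved = true;
    while (improved) {
        improved = false;
        for (int i = 0; i < 6; i++) {
            for (int sign = -1; sign <= 1; sign += 2) {
                p[i] += sign*step[i];
                set_params(p);
                double e = overlap_error();
                if (e < best) {
                    best = e;                    // keep the improvement
                    improved = true;
                } else {
                    p[i] -= sign*step[i];        // revert this step
                    set_params(p);
                }
            }
        }
    }
}
Such a coordinate-wise search can stall in local minima, but it is presumably adequate here because consecutive frames differ only by a small motion, so starting from zero and taking small steps stays near the optimum.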
Correction using 3D information of previous frame
There are two problems with the stereo vision of the gaze_line-depth model.
One is that the results are unreliable at depth discontinuities in the foreground, and the other is that 3D information from the background is given priority when the distance between the cameras is large.
First, look at the following.