9. Entered KITTI's Stereo2012 benchmark under the method name GLDS: 96th place
KITTI's true parallax image format is different
The source code for the method GLDS (Gaze_Line-Depth model Stereo) is submit_kitti.cpp and graph_cut.cu. The makefile for GLDS is makefile; type "make do7". KITTI's true parallax image differs from the TSUKUBA one: the parallax of each left-image pixel is stored with a scale factor of 256 as a 16-bit value. I was confused at first because I had assumed that the parallax was defined on the right image. In fact, both the TSUKUBA image and the KITTI image use the left image as the reference image.
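The scale factor of 256 and the 16-bit pixels can be illustrated with a small conversion sketch. The function names are hypothetical and are not taken from submit_kitti.cpp. Treating a raw value of 0 as "no parallax defined" and returning a negative value for it are assumptions, based on my understanding of the KITTI development kit's conventions.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Assumed convention: raw 16-bit sample = disparity in pixels * 256,
// and a raw value of 0 marks a pixel with no disparity defined.
constexpr float kScale = 256.0f;

// Convert a raw 16-bit sample to a disparity in pixels.
// Returns -1 for pixels without a defined disparity.
inline float decode_disparity(std::uint16_t raw) {
    return raw == 0 ? -1.0f : static_cast<float>(raw) / kScale;
}

// Convert a disparity in pixels to a raw 16-bit sample.
// Negative (undefined) disparities become 0; valid ones are clamped
// to 1..65535 so that they never collide with the "undefined" marker.
inline std::uint16_t encode_disparity(float disparity_px) {
    assert(!std::isnan(disparity_px));
    if (disparity_px < 0.0f) return 0;
    float scaled = std::round(disparity_px * kScale);
    if (scaled < 1.0f) scaled = 1.0f;
    if (scaled > 65535.0f) scaled = 65535.0f;
    return static_cast<std::uint16_t>(scaled);
}
```

With this convention the resolution of a stored parallax is 1/256 px, and the largest value that fits in 16 bits is just under 256 px.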
The following KITTI files are used for the check:
- training/colored_0/000000_10.png, used again as L000000_10.png
- training/colored_1/000000_10.png, used again as R000000_10.png
- training/disp_occ/000000_10.png, newly used as O000000_10.png
- training/disp_noc/000000_10.png, newly used as N000000_10.png
Put these files in place, then compile and run. This check detected 962 pixels that carry a parallax value even though no parallax can be defined for them. The red part in the figure below marks the pixels for which parallax cannot be defined. If you look closely at the left and right images, you can see that the red part does not appear in the right image. As you can see from the makefile, the check program is started with
./true disparity_to_correct result_disparity image_in_which_contradiction_is_red
where result_disparity is the corrected parallax image.
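To make the idea of "a parallax that cannot be defined" concrete, here is a minimal sketch of one simple criterion. It is an assumption for illustration only and is not necessarily the test used by the check program: a left-image pixel at column x with parallax d should appear at column x - d in the right image, so if x - d is negative the point lies outside the right image. The type and function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major disparity map in pixels; negative values mean "not defined".
struct DisparityMap {
    int width;
    int height;
    std::vector<float> d;
};

// Hypothetical check: invalidate every pixel whose match (x - d) would fall
// to the left of the right image, and return how many pixels were flagged.
inline int invalidate_out_of_view(DisparityMap& map) {
    assert(map.d.size() ==
           static_cast<std::size_t>(map.width) * static_cast<std::size_t>(map.height));
    int flagged = 0;
    for (int y = 0; y < map.height; ++y) {
        for (int x = 0; x < map.width; ++x) {
            float& d = map.d[static_cast<std::size_t>(y) * map.width + x];
            if (d >= 0.0f && static_cast<float>(x) - d < 0.0f) {
                d = -1.0f;  // corrected map: no disparity here
                ++flagged;  // these are the pixels one would paint red
            }
        }
    }
    return flagged;
}
```

The returned count plays the role of the 962 reported above, and the modified map plays the role of the corrected parallax image. Occlusion by nearer objects would need a further test that this sketch does not attempt.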
Left parallax image generation with Gaze_line-Depth model
depth_kitti.cpp is a program that generates a left parallax image by stereo vision processing with the gaze_line-depth model. It corresponds to depth_stereo.cpp in article 3, but the parallax is assigned to the left image instead of the right image, the scale factor is fixed at 256, and the pixels are expressed in 16 bits instead of 8 bits. The depth image used by 3d_kitti.cpp in the next section is generated at the same time as the left parallax image. If you also download graph_cut.cu, then compile and run, the left parallax image dis_pen_24_inh_4095.png and the depth image dep_pen_24_inh_4095.png will be generated. graph_cut.cu is the same as in article 3. As you can see from the makefile, the left parallax image generation program is started with
./depth right_image left_image max_disparity min_disparity penalty inhibit
In the generated left parallax image, unlike article 7, the parallax is defined on the left image. The scale factor is 256, but the image looks dark because each pixel is 16 bits. This seems to be a consequence of the PNG image specification.
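The darkness of the 16-bit image follows from the numbers: with a scale factor of 256, a parallax of 120 px is stored as 120 × 256 = 30720, which is less than half of the 16-bit white level 65535, and small parallaxes are close to black. For viewing only, the raw values can be stretched to 8 bits. The sketch below is an illustration with a hypothetical function name; it is not part of depth_kitti.cpp.

```cpp
#include <cassert>
#include <cstdint>

// Map a raw 16-bit sample (disparity * 256) to an 8-bit grey level so that
// max_disparity_px becomes white (255). Larger values are clipped to white.
inline std::uint8_t to_display_grey(std::uint16_t raw, int max_disparity_px) {
    assert(max_disparity_px > 0);
    const std::uint32_t full = static_cast<std::uint32_t>(max_disparity_px) * 256u;
    const std::uint32_t clipped = raw < full ? raw : full;
    return static_cast<std::uint8_t>(clipped * 255u / full);
}
```

The stretched image is meant only for inspection by eye; the 16-bit file remains the one to evaluate.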
3D display program
3d_kitti.cpp is a 3D display program that uses the depth image and the left parallax image. It corresponds to 3d_stereo.cpp in article 3, but does not perform stereo vision processing itself. In article 3, both depth_stereo.cpp and 3d_stereo.cpp included the gaze_line-depth model stereo vision processing. Since KITTI's true left parallax image contained parallax values for pixels where parallax cannot be defined, I wondered how correct the parallax values themselves were and prepared this program for visual confirmation.
Once the program has started, the "m key" displays the left image and the ". key" displays the right image. Unlike 3d_stereo.cpp, the ", key" displays the left and right images superimposed. The "e key" now also shows cursors, and the attention line of sight (not the gaze_line) of 3d_stereo.cpp is linked to the cursor. There is one cursor on each of the left and right images. The left image cursor is moved with the "y key", "u key", "i key", and "o key", and the right image cursor with the "7 key", "8 key", "9 key", and "0 key". Vertical cursor movement is linked between left and right, so the two cursors cannot be set to different vertical positions.
To measure a feature point, visually align the left and right cursors with it, then use the ", key" to overlay the left and right images. If you change the depth (parallax) with the "s key" and "t key" until the left and right cursors overlap, the depth (parallax) displayed at that moment is the parallax of the feature point. On the other hand, when the "y key" or "o key" is pressed, the parallax given by the left parallax image is displayed, and when the "7 key" or "0 key" is pressed, the depth given by the depth image is displayed. By comparing these values, you can judge the correctness of each source of information.
The "< key" is used for 3D display. It corresponds to the ", key" of 3d_stereo.cpp, but does not perform stereo vision processing and obtains the depth (parallax) information from the depth image or the left parallax image. The "w key" switches whether the depth (parallax) information for this 3D display is taken from the depth image (dis_mode = 0) or from the left parallax image (dis_mode = 1). The "? key" displays all key commands. There is also a "6 key" that outputs video. As you can see from the makefile, the 3D display program is started with
./3d right_image left_image max_disparity min_disparity depth_image left_disparity
Using this program, I checked KITTI's true left parallax image at several feature points and found no problem.
Program to compare parallax
cmp_kitti.cpp is the KITTI-format version of compare_disparity.cpp in article 4.
Compile and run it to compare dis_pen_24_inh_4095.png with N000000_10_true.png. The result was average = 1.19797px. Compared with compare_disparity.cpp, the result display additionally shows the disparity difference per pixel, that is, the average disparity difference. As you can see from the makefile, the parallax comparison program is started with
./cmp true_left_disparity left_disparity difference_image
In the difference image, pixels are displayed in red where the parallax is greater than the true value and in green where it is smaller.
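The comparison can be sketched as follows. This is an illustration under assumptions, not the code of cmp_kitti.cpp: both maps are taken as row-major arrays of parallax in pixels, a negative value means "not defined", and only pixels defined in both maps contribute to the average. All names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Per-pixel class for the difference image:
// Greater would be drawn red, Smaller green.
enum class Diff { Invalid, Equal, Greater, Smaller };

struct CompareResult {
    double average_abs_diff;    // mean |estimate - truth| over valid pixels, in px
    std::vector<Diff> classes;  // one entry per pixel
};

inline CompareResult compare_maps(const std::vector<float>& truth,
                                  const std::vector<float>& estimate) {
    assert(truth.size() == estimate.size());
    CompareResult r{0.0, std::vector<Diff>(truth.size(), Diff::Invalid)};
    std::size_t valid = 0;
    double sum = 0.0;
    for (std::size_t i = 0; i < truth.size(); ++i) {
        if (truth[i] < 0.0f || estimate[i] < 0.0f) continue;  // skip undefined pixels
        const float diff = estimate[i] - truth[i];
        sum += std::fabs(diff);
        ++valid;
        r.classes[i] = diff > 0.0f ? Diff::Greater
                     : diff < 0.0f ? Diff::Smaller
                                   : Diff::Equal;
    }
    if (valid > 0) r.average_abs_diff = sum / static_cast<double>(valid);
    return r;
}
```

Keeping the sign of the difference in a separate class makes it easy to see whether the estimate tends to lie in front of or behind the true surface.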
Program to check the range of parallax
chk_kitti.cpp is a program that checks the parallax range of the 194 true left parallax images provided in the training set of KITTI's stereo2012 benchmark.
When it is run, it checks the maximum and minimum parallax of the 194 true left parallax images in /home/oguri/KITTI/data_2012/training/disp_occ and /home/oguri/KITTI/data_2012/training/disp_noc. You can also view the images that have the maximum and the minimum parallax. As a result, the maximum disparity of disp_occ was 231.613, the maximum disparity of disp_noc was 227.988, and the minimum disparity was 4.10547. The image with the largest parallax looked like passing through a cage. It was also found that 80% of the disp_occ images stay within 136.395 and 80% of the disp_noc images stay within 125.434.
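A range check of this kind could look like the sketch below. It is not the code of chk_kitti.cpp. The names are hypothetical, negative values are assumed to mean "not defined", and the 80% figures are read here as the per-image maximum below which 80% of the images fall, which is my interpretation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

struct Range {
    float min_px;
    float max_px;
    bool any_valid;  // false if the map has no defined pixel
};

// Minimum and maximum disparity over the defined pixels of one image.
inline Range disparity_range(const std::vector<float>& disparity) {
    Range r{std::numeric_limits<float>::max(),
            std::numeric_limits<float>::lowest(), false};
    for (float d : disparity) {
        if (d < 0.0f) continue;  // skip undefined pixels
        r.min_px = std::min(r.min_px, d);
        r.max_px = std::max(r.max_px, d);
        r.any_valid = true;
    }
    return r;
}

// Nearest-rank percentile of the per-image maxima: the smallest value such
// that at least `percent` percent of the images have a maximum at or below it.
inline float percentile_of_maxima(std::vector<float> maxima, int percent) {
    assert(!maxima.empty() && percent > 0 && percent <= 100);
    std::sort(maxima.begin(), maxima.end());
    const std::size_t rank =
        (static_cast<std::size_t>(percent) * maxima.size() + 99) / 100;
    return maxima[rank - 1];
}
```

Calling disparity_range once per image and passing the collected maxima to percentile_of_maxima with percent = 80 gives a figure of the same kind as the 136.395 and 125.434 quoted above.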
Confirmation of evaluation method
You can see how submissions are evaluated by reading evaluate_stereo.cpp, which is included in the development kit that you can download from the KITTI stereo2012 page. The parallax appears to be evaluated by the average parallax difference and by the percentage of pixels whose parallax differs by 3 pixels or more. In addition to 3 pixels, the ratios for 1, 2, 4, and 5 pixels or more and the ratio of effective pixels are also output.
The parameters that can be changed in depth_kitti.cpp are the parallax range, the penalty value, and the inhibit value. The inhibit value can be anything as long as it is sufficiently large, so I set it to 4095. With the 1023 used for the TSUKUBA image, the result of red_true_kitti.cpp came out red (with contradictions), so the value was roughly quadrupled. With the gaze_line-depth model, true_kitti.cpp should not detect any inconsistency in the first place, so a contradiction means that a branch that was supposed to be inhibited has been cut. The penalty value and the parallax range, on the other hand, can only be tuned against each other, so I tried several values and looked for a combination that gave better results. I did not check the whole range, but settled on a parallax range of 4 to 120 pixels and a penalty value of 24. With these parameters, the average parallax difference on the training data was 2.909685 px, and the percentage of pixels whose parallax differed by 3 pixels or more was 17.1%.
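The measures described above (average parallax difference, ratios for 1 to 5 pixels, and effective pixel ratio) can be sketched as follows. This is an illustration written from the description in this article, not a copy of evaluate_stereo.cpp. The names are hypothetical, negative values are assumed to mean "not defined", and the comparison uses "or more" as stated in the text, so the handling of the exact boundary should be checked against the development kit.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct StereoErrors {
    double average_error;                // mean |estimate - truth| in px
    std::array<double, 5> outlier_rate;  // share of pixels off by >= 1, 2, 3, 4, 5 px
    double density;                      // share of ground-truth pixels that have an estimate
};

inline StereoErrors evaluate_map(const std::vector<float>& truth,
                                 const std::vector<float>& estimate) {
    assert(truth.size() == estimate.size());
    StereoErrors e{};
    std::array<std::size_t, 5> count{};
    std::size_t with_truth = 0, with_both = 0;
    double sum = 0.0;
    for (std::size_t i = 0; i < truth.size(); ++i) {
        if (truth[i] < 0.0f) continue;  // no ground truth: not evaluated
        ++with_truth;
        if (estimate[i] < 0.0f) continue;  // no estimate: lowers the density only
        ++with_both;
        const double err = std::fabs(estimate[i] - truth[i]);
        sum += err;
        for (int t = 0; t < 5; ++t)
            if (err >= static_cast<double>(t + 1)) ++count[t];
    }
    if (with_both > 0) {
        e.average_error = sum / static_cast<double>(with_both);
        for (int t = 0; t < 5; ++t)
            e.outlier_rate[t] = static_cast<double>(count[t]) / static_cast<double>(with_both);
    }
    if (with_truth > 0)
        e.density = static_cast<double>(with_both) / static_cast<double>(with_truth);
    return e;
}
```

In this sketch outlier_rate[2] corresponds to the "3 pixels or more" percentage and average_error to the average parallax difference.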
Data submission and result: 96th place
submit_kitti.cpp is a 121-line stereo vision processing program that creates the submission data. The stereo vision processing part is the same as in depth_kitti.cpp.
Compile and run it with the makefile. graph_cut.cu is used for the graph cut. The test data is 195 pairs of stereo images. I used the color images instead of the grayscale ones. The total processing time was 85 minutes, about 26 seconds per pair. The parameters were inhibit value = 4095, penalty value = 24, and parallax range = 4px--120px. The result was 96th place as of March 24, 2018. The method name is GLDS (Gaze_Line-Depth model Stereo). The percentage of pixels with a parallax difference of 3 pixels or more was 17.22%, and the average parallax difference was 2.8px. In gaze_line-depth model stereo the parallax resolution is 2px, so an average parallax difference of 2.8px is satisfactory. At present I submit the graph cut result as it is, but I think the ranking could be raised considerably by adding post-processing. However, given that the human eye does not have that much precision, it does not seem to make much sense to lower the average disparity difference further.