Stereo Vision Laboratory

15. Object recognition

Object detection

The second video illustrates the second problem of stereo vision with the gaze_line-depth model.

Objects in the back take precedence

If you look closely at this video, the head of the stake at the right front of the road has been cut off, and the shin of the man in blue clothes behind it has been restored in its place. Likewise, the head of the man in blue clothes has been cut off and the wall behind him has been restored. In addition, the sign poles on the left and right sides of the road are transparent in the middle, and the things behind them are cleanly restored as solids. When three-dimensional structures overlap, the gaze_line-depth model has the problem that the deeper one takes priority. This is not an error as a 3D reconstruction, but human perception would give priority to the object in front. The pixel-disparity model seems to have a more serious version of this problem, but this has not been confirmed.

To solve this problem, the position and shape of each object must be determined more accurately. They can be detected by preparing a rectangular parallelepiped (box-shaped) region and changing its size and position so that the graph cut value of the region, evaluated as the region is moved back and forth, becomes as small as possible. The following video shows how the detection progresses.
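The box search described above can be sketched as a simple greedy local search. This is only an illustration under stated assumptions, not the laboratory's implementation: the names `Box`, `refine_box`, and `toy_cut_value` are hypothetical, and the real graph cut value is replaced here by a toy cost (the distance to a known target box) so that the example can run on its own.

```python
from dataclasses import dataclass, replace
from typing import Callable, Tuple

FIELDS = ("x", "y", "z", "w", "h", "d")


@dataclass(frozen=True)
class Box:
    """Hypothetical box region: corner position (x, y, z) and size (w, h, d)."""
    x: int
    y: int
    z: int  # depth position, i.e. the "back and forth" direction
    w: int
    h: int
    d: int


def refine_box(box: Box, cut_value: Callable[[Box], float],
               max_rounds: int = 100) -> Tuple[Box, float]:
    """Change one parameter at a time by +-1 and keep any change that lowers the cost."""
    best = cut_value(box)
    for _ in range(max_rounds):
        improved = False
        for name in FIELDS:
            for step in (-1, 1):
                value = getattr(box, name) + step
                if name in ("w", "h", "d") and value < 1:
                    continue  # a box must keep a positive size
                candidate = replace(box, **{name: value})
                cost = cut_value(candidate)
                if cost < best:
                    box, best, improved = candidate, cost, True
        if not improved:
            break  # no single step lowers the cost any further
    return box, best


# Assumption: a stand-in for the graph cut value, NOT a real graph cut.
TARGET = Box(4, 2, 10, 3, 6, 2)


def toy_cut_value(box: Box) -> float:
    return sum(abs(getattr(box, n) - getattr(TARGET, n)) for n in FIELDS)


found, cost = refine_box(Box(0, 0, 0, 1, 1, 1), toy_cut_value)
print(found, cost)
```

With the toy cost, the search walks from the initial unit box to `TARGET` and stops when no single step reduces the cost. In a real system, `cut_value` would run the graph cut restricted to the box, and a greedy search of this kind could stop in a local minimum.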

Object detection

The green frame represents the rectangular parallelepiped region that is currently being detected. When the detection of one object is finished, the frame proceeds to the next position. The first object is the stake at the front of the road. The top of the frame appears to be too high, but the head of the stake is at that position. Next is a boy on a bicycle, then a girl on a bicycle, a road sign pole, the man in blue clothes, a man in green clothes, and a road sign. The last is the road sign pole on the left. Once the objects have been detected, the likelihood of the surface existence hypothesis points is changed so that the object in front is prioritized, and the whole graph is cut again to obtain an easy-to-understand 3D recognition result like the following.

3D recognition results giving priority to the front
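As a rough illustration of what changing the likelihood so that the front is prioritized could look like, the sketch below handles a single gaze line. The text does not give the actual rule, so everything here is an assumption: the function names, the `penalty` factor, and the use of a simple maximum in place of the full graph cut are all hypothetical.

```python
def prioritize_front(likelihoods, box_far_index, penalty=0.5):
    """Scale down hypothesis points that lie deeper than the detected box.

    likelihoods   : likelihood of each surface existence hypothesis point
                    along one gaze line, ordered from near to far
    box_far_index : depth index of the far face of the detected box
    penalty       : assumed factor (< 1) applied to points behind the box
    """
    return [
        value * penalty if depth_index > box_far_index else value
        for depth_index, value in enumerate(likelihoods)
    ]


def best_depth(likelihoods):
    """Stand-in for the graph cut: pick the single most likely depth index."""
    return max(range(len(likelihoods)), key=likelihoods.__getitem__)


# One gaze line passing through a detected box that spans depth indices 1..2.
gaze_line = [0.1, 0.4, 0.1, 0.5, 0.1]

before = best_depth(gaze_line)
after = best_depth(prioritize_front(gaze_line, box_far_index=2))
print(before, after)  # prints "3 1"
```

Before the adjustment the deeper point (index 3) wins. After the points behind the box are penalized, the point inside the box (index 1) wins, which mirrors the front-priority behaviour described above.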