Compared with monocular object detection, binocular object detection can exploit much richer cues, such as depth information. However, existing binocular methods typically compute explicit disparity maps of the scene as a depth cue, which adds computational cost and may introduce errors in the intermediate depth inference. To overcome these shortcomings, we propose an end-to-end neural network for binocular object detection. The network has an asymmetric two-stream architecture. One stream, called the Implicit Depth Mining Network (IDMN), extracts the depth cue from stereo images. The other stream, called the Multi-Modal Detection Network (MMDN), extracts the appearance cue from a monocular image and then fuses the appearance and depth cues for object detection. The model exploits depth information without explicitly calculating a disparity map or depth map, so it can work efficiently in practice. Experimental results indicate that our method achieves a good trade-off between effectiveness and efficiency.
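The asymmetric two-stream data flow described above can be illustrated with a toy sketch. This is not the authors' implementation: the class name, layer sizes, the use of single dense layers on flat feature vectors, and fusion by concatenation are all assumptions chosen only to show which stream sees which input. The depth stream (standing in for IDMN) receives both views, the appearance stream (standing in for the MMDN backbone) receives only one view, and no disparity map is computed at any point.

```python
import random


def init_weights(rows, cols, rng):
    """Small random weight matrix (hypothetical initialisation)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def dense_relu(x, weights):
    """Dense layer without bias, followed by ReLU."""
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in weights]


def dense(x, weights):
    """Dense layer without bias or activation."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


class TwoStreamDetectorSketch:
    """Toy stand-in for an asymmetric two-stream detector (all sizes assumed)."""

    def __init__(self, in_dim=8, depth_dim=4, app_dim=6, num_outputs=5, seed=0):
        rng = random.Random(seed)
        # Depth stream (stand-in for IDMN): consumes both stereo views.
        self.depth_w = init_weights(depth_dim, 2 * in_dim, rng)
        # Appearance stream (stand-in for the MMDN backbone): one view only.
        self.app_w = init_weights(app_dim, in_dim, rng)
        # Detection head: operates on the fused appearance + depth features.
        self.head_w = init_weights(num_outputs, app_dim + depth_dim, rng)

    def forward(self, left, right):
        # Implicit depth cue learned from the stereo pair; no disparity map.
        depth_cue = dense_relu(left + right, self.depth_w)
        # Appearance cue from the monocular (left) input alone.
        appearance = dense_relu(left, self.app_w)
        # Fusion by concatenation (an assumption; the abstract does not say how).
        fused = appearance + depth_cue
        return dense(fused, self.head_w)


model = TwoStreamDetectorSketch()
left_view = [0.1 * i for i in range(8)]
right_view = [0.1 * (i + 1) for i in range(8)]
scores = model.forward(left_view, right_view)
print(len(scores))
```

The asymmetry is visible in the weight shapes: the depth stream's input is twice as wide as the appearance stream's because it alone receives the second view. In a real detector each stream would be a convolutional network over images and the head would predict boxes and classes.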