Abstract
Visual saliency is an important cue for object detection in complex scenes. However, image-based saliency is degraded by cluttered backgrounds and visually similar objects in indoor scenes, and pixel-based saliency cannot assign consistent saliency across a whole object. Therefore, in this paper, we propose a novel method that computes visual saliency maps from multimodal data obtained from indoor scenes, whilst preserving region consistency. Multimodal data from a scene are first obtained by an RGB-D camera. The scene is then over-segmented by a self-adapting approach that combines its colour image and depth map. Based on these over-segments, we develop two domain-knowledge cues to improve the final saliency map: focus regions obtained from the colour image, and planar background structures obtained from point cloud data. The final saliency map is thus generated by combining information from the colour data, the depth data and the point cloud data of a scene. In the experiments, we extensively compare the proposed method with state-of-the-art methods, and we also apply it on a real robot system to detect objects of interest. The experimental results show that the proposed method outperforms the other methods in terms of precision and recall.
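The abstract describes the fusion step only at a high level; the following Python sketch illustrates one way such segment-level cue fusion could be structured. It is a minimal illustration under assumed inputs, not the authors' implementation: the function and parameter names (segment_saliency, w_color, bg_penalty, focus_boost) are hypothetical, and the simple contrast measures stand in for whatever the paper actually computes.

    import numpy as np

    def segment_saliency(seg_color, seg_depth, seg_is_planar_bg, seg_in_focus,
                         w_color=0.5, w_depth=0.5, bg_penalty=0.2, focus_boost=1.5):
        """Fuse per-segment cues into one saliency score per over-segment.

        seg_color:        (N, 3) mean Lab colour of each over-segment
        seg_depth:        (N,)   mean depth of each over-segment
        seg_is_planar_bg: (N,)   True if a segment lies on a fitted background plane
        seg_in_focus:     (N,)   True if a segment falls inside a detected focus region
        """
        # Colour contrast: mean Lab distance of each segment to all others.
        color_contrast = np.linalg.norm(
            seg_color[:, None, :] - seg_color[None, :, :], axis=2).mean(axis=1)
        # Depth contrast: mean absolute depth difference to all other segments.
        depth_contrast = np.abs(seg_depth[:, None] - seg_depth[None, :]).mean(axis=1)

        # Normalise each cue to [0, 1] before fusing.
        def norm(x):
            rng = x.max() - x.min()
            return (x - x.min()) / rng if rng > 0 else np.zeros_like(x)

        saliency = w_color * norm(color_contrast) + w_depth * norm(depth_contrast)

        # Domain-knowledge cues: suppress planar background, boost in-focus regions.
        saliency = np.where(seg_is_planar_bg, saliency * bg_penalty, saliency)
        saliency = np.where(seg_in_focus, saliency * focus_boost, saliency)
        return norm(saliency)

Assigning one score per over-segment, rather than per pixel, is what gives the method its region consistency: every pixel within a segment inherits the same saliency value.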
Notes
No potential conflict of interest was reported by the authors.