ABSTRACT
To address complex multi-scale search and the weak correlation between classification confidence and localization accuracy in visual tracking, we propose a novel visual tracking framework named ISiamCRAN. ISiamCRAN uses a modified ResNet-50 as the backbone network to extract deep features. These features are then fed into an improved classification-regression adaptive head, where cross-correlation is performed on the deep features. Unlike existing trackers, we integrate a quality assessment branch into the classification-regression adaptive head to remove low-quality predicted bounding boxes. Moreover, we replace the traditional sample label assignment strategy with an elliptical one, which allows our tracker to distinguish foreground from background more accurately. Finally, from the feature response maps produced by the cross-correlation operation, the position and scale of the target are predicted directly by a unified fully convolutional network in an anchor-free manner. Extensive experiments on the OTB100 and VOT2018 benchmarks indicate that ISiamCRAN outperforms traditional trackers and is robust to motion blur and scale variation. Our tracker runs at approximately 35 fps on a GPU.
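The elliptical sample label assignment mentioned above can be illustrated with a minimal sketch. The abstract does not give the exact rule, so the version below assumes one common two-ellipse scheme: grid points inside a small inner ellipse are positive, points outside a larger outer ellipse are negative, and points in between are ignored. The function name `elliptical_labels` and the shrink factors `inner` and `outer` are hypothetical and chosen only for illustration.

```python
def elliptical_labels(grid_w, grid_h, cx, cy, box_w, box_h,
                      inner=0.25, outer=0.5):
    """Label each grid point: 1 = positive, 0 = negative, -1 = ignored.

    (cx, cy) is the box center and (box_w, box_h) its size, all in grid
    units. The semi-axes of the two ellipses are inner * size and
    outer * size; these factors are assumptions, not values from the paper.
    """
    labels = []
    for y in range(grid_h):
        row = []
        for x in range(grid_w):
            dx, dy = x - cx, y - cy
            # Normalized squared distance; a value <= 1 means "inside".
            d_inner = (dx / (inner * box_w)) ** 2 + (dy / (inner * box_h)) ** 2
            d_outer = (dx / (outer * box_w)) ** 2 + (dy / (outer * box_h)) ** 2
            if d_inner <= 1.0:
                row.append(1)    # inside the inner ellipse: foreground
            elif d_outer > 1.0:
                row.append(0)    # outside the outer ellipse: background
            else:
                row.append(-1)   # ambiguous ring between the ellipses
        labels.append(row)
    return labels


# Example: a 9x9 response map with an 8x4 box centered at (4, 4).
labels = elliptical_labels(9, 9, cx=4, cy=4, box_w=8, box_h=4)
```

Compared with a rectangular assignment, the corners of the bounding box, which more often contain background, are not marked as positive samples under this scheme.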
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China (61601174), in part by the Postdoctoral Research Foundation of Heilongjiang Province (LBHQ17150), in part by the Science and Technology Innovative Research Team in Higher Educational Institutions of Heilongjiang Province (No. 2012TD007), in part by the Fundamental Research Funds for the Heilongjiang Provincial Universities (KJCXZD201703), and in part by the Science Foundation of Heilongjiang Province of China (F2018026).
Disclosure statement
No potential conflict of interest was reported by the author(s).