Research Article

C3TB-YOLOv5: integrated YOLOv5 with transformer for object detection in high-resolution remote sensing images

Pages 2622-2650 | Received 25 Nov 2023, Accepted 29 Feb 2024, Published online: 03 Apr 2024
 

ABSTRACT

In object detection for high-resolution remote sensing images (HRRSIs), existing YOLOv5 methods face several challenges, including densely arranged objects, small object sizes, and complex backgrounds. To tackle these challenges, we propose a novel approach called C3TB-YOLOv5, which combines traditional YOLOv5 with the Transformer model to detect objects in HRRSIs. Unlike conventional YOLOv5 methods, which primarily capture local information from remote sensing scenes, C3TB-YOLOv5 incorporates global information through a new C3TB module. This module, based on the Transformer multi-head attention mechanism (AM), consists of two branches that extract local and global information from feature maps. By integrating these branches and establishing long-range relationships, our method successfully detects densely arranged small objects in HRRSIs. Furthermore, to improve the accuracy of tiny object detection, a novel detection head is developed that makes use of the otherwise unused C3 module, thereby preventing the loss of fine-grained textures and positional features. In addition, we integrate an enhanced SimAM, namely Sim-GMP, into the model to adjust the focus across varying regions, effectively distinguishing the features of objects of interest from complex backgrounds. Finally, to address the problem of sample imbalance in remote sensing object detection, the recent Wise-IoU v3 loss function is employed to improve the accuracy of anchor box predictions for objects. To maintain a high object detection speed, the most critical C3 modules are replaced with the proposed C3TB module, striking a good balance between detection accuracy and a lightweight model. Extensive experiments conducted on two remote sensing datasets, NWPU VHR-10 and VisDrone 2019, demonstrate that our method achieves better object detection performance than state-of-the-art methods.
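To make the attention step more concrete, the following is a minimal sketch of the baseline, parameter-free SimAM weighting on which Sim-GMP is said to build. It is not the Sim-GMP variant itself, whose details the abstract does not give. The function names `simam_weights` and `simam`, the list-of-lists channel layout, and the regularizer value `e_lambda=1e-4` are assumptions made for illustration only.

```python
import math


def simam_weights(channel, e_lambda=1e-4):
    """Return baseline SimAM attention weights for one H x W channel.

    `channel` is a list of rows of floats (an assumed layout for this sketch).
    Positions that deviate strongly from the channel mean get larger weights.
    """
    values = [v for row in channel for v in row]
    n = len(values) - 1  # number of "other" positions in the channel
    if n < 1:
        raise ValueError("channel needs at least two positions")
    mean = sum(values) / len(values)
    sq_dev = [[(v - mean) ** 2 for v in row] for row in channel]
    variance = sum(d for row in sq_dev for d in row) / n
    weights = []
    for dev_row in sq_dev:
        weight_row = []
        for d in dev_row:
            # Inverse energy: large when the position stands out from the rest.
            inv_energy = d / (4.0 * (variance + e_lambda)) + 0.5
            weight_row.append(1.0 / (1.0 + math.exp(-inv_energy)))  # sigmoid
        weights.append(weight_row)
    return weights


def simam(channel, e_lambda=1e-4):
    """Rescale each position of the channel by its SimAM weight."""
    weights = simam_weights(channel, e_lambda)
    return [[v * w for v, w in zip(row, w_row)]
            for row, w_row in zip(channel, weights)]


if __name__ == "__main__":
    # Toy channel: one strong response on a flat background.
    toy = [[0.0, 0.0, 0.0],
           [0.0, 5.0, 0.0],
           [0.0, 0.0, 0.0]]
    for row in simam_weights(toy):
        print([round(w, 3) for w in row])
```

In this toy channel the single strong response receives a larger weight than the flat background, which is the sense in which such a module can shift focus toward objects of interest. A real detector would apply the same computation per channel on tensors rather than on Python lists.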

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

This work was supported in part by the Young Backbone Teacher Training Program of Henan Province (No. 2023GGJS090), in part by the Scientific and Technological Research Project of Henan Provincial Department of Science and Technology under Grant [242102210013, 232102211048], and in part by the National Natural Science Foundation of P. R. China under Grant [61502435].
