MSG-DETR: A Small Object Detection Algorithm for UAV Aerial Images

Authors

  • Xiang Li
  • Ruxin Gao

DOI:

https://doi.org/10.54691/g0xe4t97

Keywords:

UAV object detection; small object detection; feature fusion; attention mechanism.

Abstract

Imagery captured by unmanned aerial vehicles (UAVs) typically contains densely distributed small objects with large scale variations, so traditional object detection algorithms often suffer from missed detections and false positives. To address these challenges, we propose MSG-DETR, a detection framework designed specifically for small object detection in UAV aerial images. First, we design a lightweight multi-scale feature fusion backbone, MSFFNet, which enhances the extraction of small object features while significantly reducing computational overhead. Second, we introduce a Small-object Feature Fusion module (SF-Fusion), which feeds rich P3-level features from MSFFNet into the neck to deepen feature fusion and mitigate information loss. Finally, we integrate a Gated Convolutional Attention Mechanism (GCAM) to improve the model’s ability to perceive and localize tiny objects in cluttered backgrounds. Experiments on the VisDrone2019 dataset show that, compared to the baseline model, MSG-DETR improves mAP@0.5 by 3.8% and mAP@0.5:0.95 by 3.5% while reducing the number of parameters by 28.1%.
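The two reported metrics differ only in the IoU threshold at which a prediction counts as a match. The sketch below is not the paper's evaluation code. It assumes the usual COCO-style convention, in which mAP@0.5:0.95 averages over ten IoU thresholds from 0.50 to 0.95 in steps of 0.05, and it uses hypothetical box coordinates to show why small objects are sensitive to localization error: a 2-pixel shift on a 10x10 box already fails the stricter thresholds.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap width and height, clamped to zero when the boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


# Assumed COCO-style thresholds: 0.50, 0.55, ..., 0.95.
IOU_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def thresholds_passed(pred, gt):
    """Return the IoU thresholds at which `pred` would count as matching `gt`."""
    value = iou(pred, gt)
    return [t for t in IOU_THRESHOLDS if value >= t]


# Hypothetical boxes: a 10x10 px ground truth and a prediction shifted by 2 px.
gt_box = (100, 100, 110, 110)
pred_box = (102, 100, 112, 110)

overlap = iou(pred_box, gt_box)                 # 80 / 120, about 0.667
passed = thresholds_passed(pred_box, gt_box)    # only the four loosest thresholds
```

In this example the prediction counts as correct for mAP@0.5 but is rejected at six of the ten thresholds that contribute to mAP@0.5:0.95, which is why the second metric is the harder one to move for tiny objects.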

Published

19-05-2026
