Research on Feature Fusion Image Defogging Algorithms Based on Transformers
DOI: https://doi.org/10.54691/44ep1a26
Keywords: Computer Vision, Image Defogging, Transformer, Feature Fusion
Abstract
Under hazy conditions, airborne particulates and chemical aerosols absorb and scatter light, reducing image contrast and blurring details, which severely compromises the accuracy of computer vision tasks such as autonomous driving and remote sensing monitoring. To address the estimation errors of traditional physics-based defogging models, the limited global feature modeling of mainstream convolutional neural networks (CNNs), and the high computational complexity and weak local detail capture of existing Transformer-based methods, this paper proposes a Transformer-based feature fusion image defogging algorithm. The algorithm adopts a U-shaped encoder-decoder as its backbone network. It introduces a Hub-and-Spoke Multi-Head Attention (HSMHA) mechanism to replace conventional self-attention, significantly reducing computational overhead while preserving global context modeling. A Feature Refinement Block (FRB) embedded in the feed-forward network enhances the recovery of image texture and detail. A Multiscale Residual Enhancer (MRE) suppresses redundant high-frequency features and deepens the learning of subtle feature variations. A contrastive regularization (CR) learning strategy uses hazy images as negative samples and clear images as positive samples to guide the model toward more discriminative feature representations, improving the consistency between defogged images and their clear counterparts. Experiments on the SOTS-indoor and SOTS-outdoor synthetic datasets show that the proposed algorithm achieves the best or second-best Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity (SSIM): on SOTS-indoor, PSNR reaches 35.79 dB and SSIM 0.984; on SOTS-outdoor, PSNR reaches 34.31 dB and SSIM 0.987.
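The abstract does not spell out the internals of HSMHA, but the "hub-and-spoke" name suggests the common pattern for cheapening self-attention: a small set of k learnable hub tokens first gather context from all N spatial tokens, and each token then reads global context back from the hubs, replacing the O(N²) token-to-token interaction with two O(N·k) stages. The sketch below illustrates that general pattern only; it is an assumption, not the paper's definition, and `hub_attention` and `w_hub` are hypothetical names.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def hub_attention(x, w_hub):
    """Hub-and-spoke attention sketch (single head, assumption).

    x:     (N, d) spatial token features, N = H * W.
    w_hub: (k, d) learnable hub queries, with k << N.

    Stage 1 (gather):  k hubs attend over all N tokens  -> O(N * k).
    Stage 2 (scatter): each token attends over k hubs   -> O(N * k).
    Total cost is O(N * k) instead of O(N^2) for full self-attention.
    """
    _, d = x.shape
    scale = 1.0 / np.sqrt(d)
    # Gather: each hub builds a global summary of the whole feature map.
    hubs = softmax(w_hub @ x.T * scale) @ x      # (k, d)
    # Scatter: each spatial token reads global context back from the hubs.
    return softmax(x @ hubs.T * scale) @ hubs    # (N, d)
```

With k fixed (say 4 to 64 hubs) the cost grows linearly in the number of pixels, which is what makes such schemes attractive for high-resolution restoration tasks.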
Qualitative results further demonstrate the algorithm's superior performance in restoring color fidelity, contrast, and detail integrity, effectively mitigating the incomplete defogging, color bias, and edge blurring present in existing methods.
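The CR strategy described above, with clear images as positives and hazy images as negatives, is often realized as a ratio of feature-space distances: pull the restored image toward the clear target while pushing it away from the hazy input. The abstract gives neither the exact formulation nor the feature extractor (pre-trained VGG features are a common choice in dehazing work), so the function below is a minimal illustrative sketch and `contrastive_regularization` is a hypothetical name.

```python
import numpy as np

def contrastive_regularization(feat_restored, feat_clear, feat_hazy, eps=1e-8):
    """Contrastive regularization (CR) loss sketch (assumption).

    Inputs are feature maps already produced by some extractor, all of
    the same shape. The loss is small when the restored image lies close
    to the clear positive and far from the hazy negative.
    """
    d_pos = np.abs(feat_restored - feat_clear).mean()  # distance to positive
    d_neg = np.abs(feat_restored - feat_hazy).mean()   # distance to negative
    return d_pos / (d_neg + eps)                       # smaller is better
```

Minimizing this ratio penalizes restorations that merely reproduce the hazy input, since such outputs shrink the denominator as well as failing to shrink the numerator.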
License
Copyright (c) 2025 Frontiers in Science and Engineering

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.