An Animal Pose Estimation Method for Handling Keypoint Occlusion
DOI: https://doi.org/10.54691/hp63xf14
Keywords: Animal Pose Estimation, Keypoint Occlusion, Data Augmentation, Dynamic Upsampling.
Abstract
Animal pose estimation plays a vital role in fields such as animal behavior analysis and wildlife conservation. In real-world scenes, however, keypoints are frequently occluded, which limits a model’s ability to localize them accurately. To address this issue, this paper proposes an occlusion-aware animal pose estimation algorithm based on an improved HRNet. The method uses a data augmentation strategy driven by strongly connected keypoints to enrich the diversity of occluded keypoints in the training set and to strengthen the model’s inference of invisible keypoints. In addition, a dynamic upsampling module that integrates channel and spatial attention mechanisms is designed to improve the restoration of fine-grained features during upsampling. Furthermore, a progressive feature fusion strategy is introduced to reduce the information loss caused by large-scale upsampling when multi-scale features are integrated, which further improves fusion performance. Experimental results on the public AP-10K dataset demonstrate that the proposed method significantly outperforms the original HRNet and the other compared algorithms in accuracy.
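The idea behind progressive fusion can be illustrated with a minimal sketch. The abstract does not give the actual operators, so everything here is an assumption made for illustration: nearest-neighbour 2x upsampling stands in for the paper's dynamic upsampling module, fusion is plain addition, all branches are assumed to share the same channel count, and the names `upsample2x` and `progressive_fuse` are hypothetical. The point shown is only that a low-resolution map reaches the highest resolution through repeated 2x steps, merging with each intermediate branch on the way, rather than through one large upsampling jump.

```python
import numpy as np


def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map.

    Assumption: a simple stand-in for the paper's dynamic upsampling module.
    """
    return x.repeat(2, axis=1).repeat(2, axis=2)


def progressive_fuse(features):
    """Fuse multi-scale maps ordered from highest to lowest resolution.

    Assumptions: every map is (C, H, W) with the same C, and the spatial
    size halves from one branch to the next. Fusion is element-wise addition.
    """
    fused = features[-1]  # start from the coarsest branch
    for finer in reversed(features[:-1]):
        # One 2x step at a time, merging with the next finer branch.
        fused = finer + upsample2x(fused)
    return fused


# Four branches at 1/1, 1/2, 1/4 and 1/8 of a 64x48 map (illustrative sizes).
rng = np.random.default_rng(0)
branches = [rng.standard_normal((8, 64 // 2**i, 48 // 2**i)) for i in range(4)]
fused = progressive_fuse(branches)
print(fused.shape)  # (8, 64, 48)
```

In a real network the addition would typically be replaced by learned layers, and the branch channel counts would first need to be aligned; neither detail is specified in the abstract.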
Copyright (c) 2025 Frontiers in Science and Engineering

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.






