An Animal Pose Estimation Method for Handling Keypoint Occlusion

Authors

  • Xiaoying Zhu
  • Yaning Jiang

DOI:

https://doi.org/10.54691/hp63xf14

Keywords:

Animal Pose Estimation, Keypoint Occlusion, Data Augmentation, Dynamic Upsampling.

Abstract

Animal pose estimation plays a vital role in fields such as animal behavior analysis and wildlife conservation. In real-world scenes, however, keypoints are frequently occluded, which limits a model's ability to localize them accurately. To address this issue, this paper proposes an occlusion-aware animal pose estimation algorithm based on an improved HRNet. First, a data augmentation strategy driven by strongly connected keypoints enriches the diversity of occluded keypoints in the training set and improves the model's ability to infer invisible keypoints. Second, a dynamic upsampling module that integrates channel and spatial attention is designed to better restore fine-grained features during upsampling. Third, a progressive feature fusion strategy reduces the information loss caused by large-scale upsampling when multi-scale features are fused, further improving fusion performance. Experimental results on the public AP-10K dataset show that the proposed method significantly outperforms the original HRNet and the other compared algorithms in accuracy.
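The abstract does not spell out how the keypoint-driven augmentation works. The following is a minimal sketch of one plausible reading, under stated assumptions: "strongly connected" keypoints are taken to be keypoints joined by skeleton edges within a fixed number of hops, occlusion is simulated by pasting a constant-valued rectangle over them (in the spirit of Cutout-style erasing), and visibility flags follow the COCO convention (2 = visible, 1 = labeled but occluded, 0 = unlabeled). The skeleton `TOY_SKELETON`, the function names, and the `hops`, `margin`, and `fill` parameters are hypothetical, and images are plain nested lists so that only the standard library is needed. It is not the authors' implementation.

```python
import random
from typing import Dict, List, Set, Tuple

# (x, y, visibility), COCO convention: 2 = visible, 1 = labeled but occluded, 0 = unlabeled.
Keypoint = Tuple[float, float, int]

# Hypothetical toy skeleton (edges between keypoint indices); NOT the AP-10K definition.
TOY_SKELETON: List[Tuple[int, int]] = [(0, 1), (1, 2), (2, 3), (1, 4), (4, 5)]


def build_adjacency(skeleton: List[Tuple[int, int]], num_kpts: int) -> Dict[int, Set[int]]:
    """Undirected adjacency sets built from a skeleton edge list."""
    adj: Dict[int, Set[int]] = {i: set() for i in range(num_kpts)}
    for a, b in skeleton:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def connected_group(seed: int, adj: Dict[int, Set[int]], hops: int) -> Set[int]:
    """Keypoints reachable from `seed` within `hops` skeleton edges (breadth-first)."""
    group, frontier = {seed}, {seed}
    for _ in range(hops):
        frontier = {nb for k in frontier for nb in adj[k]} - group
        group |= frontier
    return group


def occlude_connected_keypoints(
    image: List[List[int]],
    keypoints: List[Keypoint],
    skeleton: List[Tuple[int, int]],
    rng: random.Random,
    hops: int = 1,
    margin: int = 3,
    fill: int = 0,
) -> Tuple[List[List[int]], List[Keypoint]]:
    """Paste a constant patch over a random visible keypoint and its skeleton neighbours.

    Returns a new image and keypoint list. Ground-truth coordinates are kept, so the
    model is still supervised on keypoints that the patch has made invisible.
    """
    h, w = len(image), len(image[0])
    visible = [i for i, (_, _, v) in enumerate(keypoints) if v == 2]
    if not visible:
        return [row[:] for row in image], list(keypoints)

    seed = rng.choice(visible)
    adj = build_adjacency(skeleton, len(keypoints))
    group = [i for i in connected_group(seed, adj, hops) if keypoints[i][2] == 2]

    # Bounding box around the connected group, enlarged by `margin` and clipped to the image.
    xs = [keypoints[i][0] for i in group]
    ys = [keypoints[i][1] for i in group]
    x0, x1 = max(0, int(min(xs)) - margin), min(w - 1, int(max(xs)) + margin)
    y0, y1 = max(0, int(min(ys)) - margin), min(h - 1, int(max(ys)) + margin)

    out = [row[:] for row in image]  # copy so the input image is left untouched
    for y in range(y0, y1 + 1):
        for x in range(x0, x1 + 1):
            out[y][x] = fill

    # Any visible keypoint that now lies inside the patch becomes "labeled but occluded".
    new_kpts: List[Keypoint] = []
    for x, y, v in keypoints:
        inside = x0 <= int(x) <= x1 and y0 <= int(y) <= y1
        new_kpts.append((x, y, 1 if (v == 2 and inside) else v))
    return out, new_kpts


if __name__ == "__main__":
    rng = random.Random(0)
    img = [[255] * 32 for _ in range(32)]
    kpts = [(5.0, 5.0, 2), (10.0, 8.0, 2), (15.0, 12.0, 2),
            (20.0, 20.0, 2), (10.0, 20.0, 2), (8.0, 28.0, 0)]
    _, aug = occlude_connected_keypoints(img, kpts, TOY_SKELETON, rng)
    print("occluded:", [i for i, (_, _, v) in enumerate(aug) if v == 1])
```

Keeping the coordinates of the newly occluded keypoints is the design choice that matters here: each of them still receives a training target, so the model has to infer its location from the surrounding visible body parts rather than from local appearance.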

Published

24-11-2025

Section

Articles