Keyframe Extraction Approaches for Videos under Multimodal Scenarios: A Survey

Authors

  • Chunlei Zhao
  • Ziqin Ye
  • Yuanyuan Cao
  • Zhan Zhang
  • Liwei Wu

DOI:

https://doi.org/10.54691/4mwh0j18

Keywords:

Keyframes, Multimodal, Attention Mechanism.

Abstract

Keyframe extraction is a core component of video summarization, retrieval, and content analysis, and it remains an active topic in computer vision. As applications such as smart cities, autonomous driving, and human-computer interaction spread, video sources and scenarios have grown more complex and place higher demands on keyframe extraction. In dynamic scenes, strong environmental interference and highly variable subject motion leave traditional extraction methods with insufficient robustness and limited accuracy. To address these issues, numerous deep-learning-based extraction algorithms have emerged in recent years and have significantly improved extraction performance in dynamic environments. This paper systematically reviews keyframe extraction methods for dynamic scenes including surveillance, sports, and gesture analysis, summarizes mainstream models based on attention mechanisms, temporal modeling, and reinforcement learning, and analyzes their core concepts, advantages, and limitations. It concludes with experimental findings and the authors' own observations, aiming to provide a clear reference framework and development direction for future research on efficient and robust video parsing in dynamic scenes.

Published

24-12-2025
