"long video understanding" Papers

23 papers found

Adaptive Keyframe Sampling for Long Video Understanding

Xi Tang, Jihao Qiu, Lingxi Xie et al.

CVPR 2025arXiv:2502.21271
73
citations

AdaVideoRAG: Omni-Contextual Adaptive Retrieval-Augmented Efficient Long Video Understanding

Xue zhucun, Jiangning Zhang, Xie Xurong et al.

NEURIPS 2025arXiv:2506.13589
7
citations

AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning

Yiwu Zhong, Zhuoming Liu, Yin Li et al.

ICCV 2025arXiv:2412.03248
24
citations

ALLVB: All-in-One Long Video Understanding Benchmark

Xichen Tan, Yuanjing Luo, Yunfan Ye et al.

AAAI 2025paperarXiv:2503.07298
6
citations

Bringing RNNs Back to Efficient Open-Ended Video Understanding

Weili Xu, Enxin Song, Wenhao Chai et al.

ICCV 2025arXiv:2507.02591
8
citations

CG-Bench: Clue-grounded Question Answering Benchmark for Long Video Understanding

Guo Chen, Yicheng Liu, Yifei Huang et al.

ICLR 2025arXiv:2412.12075
43
citations

DrVideo: Document Retrieval Based Long Video Understanding

Ziyu Ma, Chenhui Gou, Hengcan Shi et al.

CVPR 2025arXiv:2406.12846
39
citations

Logic-in-Frames: Dynamic Keyframe Search via Visual Semantic-Logical Verification for Long Video Understanding

Weiyu Guo, Ziyang Chen, Shaoguang WANG et al.

NEURIPS 2025oralarXiv:2503.13139
18
citations

LVAgent: Long Video Understanding by Multi-Round Dynamical Collaboration of MLLM Agents

Boyu Chen, Zhengrong Yue, Siran Chen et al.

ICCV 2025arXiv:2503.10200
22
citations

MLVU: Benchmarking Multi-task Long Video Understanding

Junjie Zhou, Yan Shu, Bo Zhao et al.

CVPR 2025arXiv:2406.04264
105
citations

MR. Video: MapReduce as an Effective Principle for Long Video Understanding

Ziqi Pang, Yu-Xiong Wang

NEURIPS 2025

One Token per Highly Selective Frame: Towards Extreme Compression for Long Video Understanding

Zheyu Zhang, Ziqi Pang, Shixing Chen et al.

NEURIPS 2025oral

SEAL: Semantic Attention Learning for Long Video Representation

Lan Wang, Yujia Chen, Wen-Sheng Chu et al.

CVPR 2025arXiv:2412.01798
7
citations

Unleashing Hour-Scale Video Training for Long Video-Language Understanding

Jingyang Lin, Jialian Wu, Ximeng Sun et al.

NEURIPS 2025oralarXiv:2506.05332
10
citations

Vgent: Graph-based Retrieval-Reasoning-Augmented Generation For Long Video Understanding

Xiaoqian Shen, Wenxuan Zhang, Jun Chen et al.

NEURIPS 2025oralarXiv:2510.14032
6
citations

VideoLLaMB: Long Streaming Video Understanding with Recurrent Memory Bridges

Yuxuan Wang, Yiqi Song, Cihang Xie et al.

ICCV 2025arXiv:2409.01071
4
citations

VideoLucy: Deep Memory Backtracking for Long Video Understanding

Jialong Zuo, Yongtai Deng, Lingdong Kong et al.

NEURIPS 2025oralarXiv:2510.12422
2
citations

Video-XL: Extra-Long Vision Language Model for Hour-Scale Video Understanding

Yan Shu, Zheng Liu, Peitian Zhang et al.

CVPR 2025arXiv:2409.14485
155
citations

World Model on Million-Length Video And Language With Blockwise RingAttention

Hao Liu, Wilson Yan, Matei Zaharia et al.

ICLR 2025oralarXiv:2402.08268
149
citations

A Simple Recipe for Contrastively Pre-training Video-First Encoders Beyond 16 Frames

Pinelopi Papalampidi, Skanda Koppula, Shreya Pathak et al.

CVPR 2024arXiv:2312.07395
41
citations

Koala: Key Frame-Conditioned Long Video-LLM

Reuben Tan, Ximeng Sun, Ping Hu et al.

CVPR 2024highlightarXiv:2404.04346
64
citations

MovieChat: From Dense Token to Sparse Memory for Long Video Understanding

Enxin Song, Wenhao Chai, Guanhong Wang et al.

CVPR 2024arXiv:2307.16449
471
citations

TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding

Shuhuai Ren, Linli Yao, Shicheng Li et al.

CVPR 2024arXiv:2312.02051
372
citations