Poster "long video understanding" Papers
15 papers found
Adaptive Keyframe Sampling for Long Video Understanding
Xi Tang, Jihao Qiu, Lingxi Xie et al.
CVPR 2025, arXiv:2502.21271
73 citations
AdaVideoRAG: Omni-Contextual Adaptive Retrieval-Augmented Efficient Long Video Understanding
Xue Zhucun, Jiangning Zhang, Xie Xurong et al.
NEURIPS 2025, arXiv:2506.13589
7 citations
AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning
Yiwu Zhong, Zhuoming Liu, Yin Li et al.
ICCV 2025, arXiv:2412.03248
24 citations
Bringing RNNs Back to Efficient Open-Ended Video Understanding
Weili Xu, Enxin Song, Wenhao Chai et al.
ICCV 2025, arXiv:2507.02591
8 citations
CG-Bench: Clue-grounded Question Answering Benchmark for Long Video Understanding
Guo Chen, Yicheng Liu, Yifei Huang et al.
ICLR 2025, arXiv:2412.12075
43 citations
DrVideo: Document Retrieval Based Long Video Understanding
Ziyu Ma, Chenhui Gou, Hengcan Shi et al.
CVPR 2025, arXiv:2406.12846
39 citations
LVAgent: Long Video Understanding by Multi-Round Dynamical Collaboration of MLLM Agents
Boyu Chen, Zhengrong Yue, Siran Chen et al.
ICCV 2025, arXiv:2503.10200
22 citations
MLVU: Benchmarking Multi-task Long Video Understanding
Junjie Zhou, Yan Shu, Bo Zhao et al.
CVPR 2025, arXiv:2406.04264
105 citations
MR. Video: MapReduce as an Effective Principle for Long Video Understanding
Ziqi Pang, Yu-Xiong Wang
NEURIPS 2025
SEAL: Semantic Attention Learning for Long Video Representation
Lan Wang, Yujia Chen, Wen-Sheng Chu et al.
CVPR 2025, arXiv:2412.01798
7 citations
VideoLLaMB: Long Streaming Video Understanding with Recurrent Memory Bridges
Yuxuan Wang, Yiqi Song, Cihang Xie et al.
ICCV 2025, arXiv:2409.01071
4 citations
Video-XL: Extra-Long Vision Language Model for Hour-Scale Video Understanding
Yan Shu, Zheng Liu, Peitian Zhang et al.
CVPR 2025, arXiv:2409.14485
155 citations
A Simple Recipe for Contrastively Pre-training Video-First Encoders Beyond 16 Frames
Pinelopi Papalampidi, Skanda Koppula, Shreya Pathak et al.
CVPR 2024, arXiv:2312.07395
41 citations
MovieChat: From Dense Token to Sparse Memory for Long Video Understanding
Enxin Song, Wenhao Chai, Guanhong Wang et al.
CVPR 2024, arXiv:2307.16449
471 citations
TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding
Shuhuai Ren, Linli Yao, Shicheng Li et al.
CVPR 2024, arXiv:2312.02051
372 citations