Poster "long video understanding" Papers

15 papers found

Adaptive Keyframe Sampling for Long Video Understanding

Xi Tang, Jihao Qiu, Lingxi Xie et al.

CVPR 2025arXiv:2502.21271
73
citations

AdaVideoRAG: Omni-Contextual Adaptive Retrieval-Augmented Efficient Long Video Understanding

Xue zhucun, Jiangning Zhang, Xie Xurong et al.

NEURIPS 2025arXiv:2506.13589
7
citations

AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning

Yiwu Zhong, Zhuoming Liu, Yin Li et al.

ICCV 2025arXiv:2412.03248
24
citations

Bringing RNNs Back to Efficient Open-Ended Video Understanding

Weili Xu, Enxin Song, Wenhao Chai et al.

ICCV 2025arXiv:2507.02591
8
citations

CG-Bench: Clue-grounded Question Answering Benchmark for Long Video Understanding

Guo Chen, Yicheng Liu, Yifei Huang et al.

ICLR 2025arXiv:2412.12075
43
citations

DrVideo: Document Retrieval Based Long Video Understanding

Ziyu Ma, Chenhui Gou, Hengcan Shi et al.

CVPR 2025arXiv:2406.12846
39
citations

LVAgent: Long Video Understanding by Multi-Round Dynamical Collaboration of MLLM Agents

Boyu Chen, Zhengrong Yue, Siran Chen et al.

ICCV 2025arXiv:2503.10200
22
citations

MLVU: Benchmarking Multi-task Long Video Understanding

Junjie Zhou, Yan Shu, Bo Zhao et al.

CVPR 2025arXiv:2406.04264
105
citations

MR. Video: MapReduce as an Effective Principle for Long Video Understanding

Ziqi Pang, Yu-Xiong Wang

NEURIPS 2025

SEAL: Semantic Attention Learning for Long Video Representation

Lan Wang, Yujia Chen, Wen-Sheng Chu et al.

CVPR 2025arXiv:2412.01798
7
citations

VideoLLaMB: Long Streaming Video Understanding with Recurrent Memory Bridges

Yuxuan Wang, Yiqi Song, Cihang Xie et al.

ICCV 2025arXiv:2409.01071
4
citations

Video-XL: Extra-Long Vision Language Model for Hour-Scale Video Understanding

Yan Shu, Zheng Liu, Peitian Zhang et al.

CVPR 2025arXiv:2409.14485
155
citations

A Simple Recipe for Contrastively Pre-training Video-First Encoders Beyond 16 Frames

Pinelopi Papalampidi, Skanda Koppula, Shreya Pathak et al.

CVPR 2024arXiv:2312.07395
41
citations

MovieChat: From Dense Token to Sparse Memory for Long Video Understanding

Enxin Song, Wenhao Chai, Guanhong Wang et al.

CVPR 2024arXiv:2307.16449
471
citations

TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding

Shuhuai Ren, Linli Yao, Shicheng Li et al.

CVPR 2024arXiv:2312.02051
372
citations