"video comprehension" Papers
7 papers found
Divot: Diffusion Powers Video Tokenizer for Comprehension and Generation
Yuying Ge, Yizhuo Li, Yixiao Ge et al.
CVPR 2025, arXiv:2412.04432
9 citations
HoliTom: Holistic Token Merging for Fast Video Large Language Models
Kele Shao, Keda Tao, Can Qin et al.
NeurIPS 2025 (oral), arXiv:2505.21334
20 citations
Q-Frame: Query-aware Frame Selection and Multi-Resolution Adaptation for Video-LLMs
Shaojie Zhang, Jiahui Yang, Jianqin Yin et al.
ICCV 2025, arXiv:2506.22139
23 citations
Seeing the Arrow of Time in Large Multimodal Models
Zihui (Sherry) Xue, Romy Luo, Kristen Grauman
NeurIPS 2025 (oral), arXiv:2506.03340
6 citations
Temporal Reasoning Transfer from Text to Video
Lei Li, Yuanxin Liu, Linli Yao et al.
ICLR 2025 (oral), arXiv:2410.06166
21 citations
Unhackable Temporal Reward for Scalable Video MLLMs
En Yu, Kangheng Lin, Liang Zhao et al.
ICLR 2025 (oral), arXiv:2502.12081
22 citations
Youku Dense Caption: A Large-scale Chinese Video Dense Caption Dataset and Benchmarks
Zixuan Xiong, Guangwei Xu, Wenkai Zhang et al.
ICLR 2025