Oral "video large language models" Papers
9 papers found
Enhancing Temporal Understanding in Video-LLMs through Stacked Temporal Attention in Vision Encoders
Ali Rasekh, Erfan Soula, Omid Daliran et al. (Leibniz University Hannover, L3S Research Center)
NeurIPS 2025, oral, arXiv:2510.26027, 1 citation
FastVID: Dynamic Density Pruning for Fast Video Large Language Models
Leqi Shen, Guoqiang Gong, Tao He et al.
NeurIPS 2025, oral, arXiv:2503.11187, 16 citations
GUI-World: A Video Benchmark and Dataset for Multimodal GUI-oriented Understanding
Dongping Chen, Yue Huang, Siyuan Wu et al.
ICLR 2025, oral, arXiv:2406.10819, 28 citations
HoliTom: Holistic Token Merging for Fast Video Large Language Models
Kele Shao, Keda TAO, Can Qin et al.
NeurIPS 2025, oral, arXiv:2505.21334, 20 citations
Improve Temporal Reasoning in Multimodal Large Language Models via Video Contrastive Decoding
Daiqing Qi, Dongliang Guo, Hanzhang Yuan et al.
NeurIPS 2025, oral
Mitigating Hallucination in VideoLLMs via Temporal-Aware Activation Engineering
JIANFENG CAI, Jiale Hong, Zongmeng Zhang et al.
NeurIPS 2025, oral, arXiv:2505.12826, 3 citations
Temporal Reasoning Transfer from Text to Video
Lei Li, Yuanxin Liu, Linli Yao et al.
ICLR 2025, oral, arXiv:2410.06166, 21 citations
VQToken: Neural Discrete Token Representation Learning for Extreme Token Reduction in Video Large Language Models
Haichao Zhang, Yun Fu
NeurIPS 2025, oral, arXiv:2503.16980, 3 citations
Momentor: Advancing Video Large Language Model with Fine-Grained Temporal Reasoning
Long Qian, Juncheng Li, Yu Wu et al.
ICML 2024, oral, arXiv:2402.11435, 104 citations