Oral "video question answering" Papers

9 papers found

Deep Video Discovery: Agentic Search with Tool Use for Long-form Video Understanding

Xiaoyi Zhang, Zhaoyang Jia, Zongyu Guo et al.

NEURIPS 2025oralarXiv:2505.18079
19
citations

Enhancing Temporal Understanding in Video-LLMs through Stacked Temporal Attention in Vision Encoders

Leibniz University Hannover, L3S Research Center Ali Rasekh, Erfan Soula, Omid Daliran et al.

NEURIPS 2025oralarXiv:2510.26027
1
citations

MAGNET: A Multi-agent Framework for Finding Audio-Visual Needles by Reasoning over Multi-Video Haystacks

Sanjoy Chowdhury, Mohamed Elmoghany, Yohan Abeysinghe et al.

NEURIPS 2025oralarXiv:2506.07016
5
citations

OpenMMEgo: Enhancing Egocentric Understanding for LMMs with Open Weights and Data

Hao Luo, Zihao Yue, Wanpeng Zhang et al.

NEURIPS 2025oral

ROVER: Recursive Reasoning Over Videos with Vision-Language Models for Embodied Tasks

Philip Schroeder, Ondrej Biza, Thomas Weng et al.

NEURIPS 2025oralarXiv:2508.01943

Seeing the Arrow of Time in Large Multimodal Models

Zihui (Sherry) Xue, Romy Luo, Kristen Grauman

NEURIPS 2025oralarXiv:2506.03340
6
citations

Multi-granularity Correspondence Learning from Long-term Noisy Videos

Yijie Lin, Jie Zhang, Zhenyu Huang et al.

ICLR 2024oralarXiv:2401.16702
38
citations

Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition

Hao Fei, Shengqiong Wu, Wei Ji et al.

ICML 2024oralarXiv:2501.03230
146
citations

video-SALMONN: Speech-Enhanced Audio-Visual Large Language Models

Guangzhi Sun, Wenyi Yu, Changli Tang et al.

ICML 2024oralarXiv:2406.15704
76
citations