Oral "video question answering" Papers
9 papers found
Conference
Deep Video Discovery: Agentic Search with Tool Use for Long-form Video Understanding
Xiaoyi Zhang, Zhaoyang Jia, Zongyu Guo et al.
NEURIPS 2025oralarXiv:2505.18079
19
citations
Enhancing Temporal Understanding in Video-LLMs through Stacked Temporal Attention in Vision Encoders
Leibniz University Hannover, L3S Research Center Ali Rasekh, Erfan Soula, Omid Daliran et al.
NEURIPS 2025oralarXiv:2510.26027
1
citations
MAGNET: A Multi-agent Framework for Finding Audio-Visual Needles by Reasoning over Multi-Video Haystacks
Sanjoy Chowdhury, Mohamed Elmoghany, Yohan Abeysinghe et al.
NEURIPS 2025oralarXiv:2506.07016
5
citations
OpenMMEgo: Enhancing Egocentric Understanding for LMMs with Open Weights and Data
Hao Luo, Zihao Yue, Wanpeng Zhang et al.
NEURIPS 2025oral
ROVER: Recursive Reasoning Over Videos with Vision-Language Models for Embodied Tasks
Philip Schroeder, Ondrej Biza, Thomas Weng et al.
NEURIPS 2025oralarXiv:2508.01943
Seeing the Arrow of Time in Large Multimodal Models
Zihui (Sherry) Xue, Romy Luo, Kristen Grauman
NEURIPS 2025oralarXiv:2506.03340
6
citations
Multi-granularity Correspondence Learning from Long-term Noisy Videos
Yijie Lin, Jie Zhang, Zhenyu Huang et al.
ICLR 2024oralarXiv:2401.16702
38
citations
Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition
Hao Fei, Shengqiong Wu, Wei Ji et al.
ICML 2024oralarXiv:2501.03230
146
citations
video-SALMONN: Speech-Enhanced Audio-Visual Large Language Models
Guangzhi Sun, Wenyi Yu, Changli Tang et al.
ICML 2024oralarXiv:2406.15704
76
citations