Poster "video large language models" Papers

16 papers found

ARGUS: Hallucination and Omission Evaluation in Video-LLMs

Ruchit Rawal, Reza Shirkavand, Heng Huang et al.

ICCV 2025, arXiv:2506.07371
4 citations

Can Video LLMs Refuse to Answer? Alignment for Answerability in Video Large Language Models

Eunseop Yoon, Hee Suk Yoon, Mark Hasegawa-Johnson et al.

ICLR 2025, arXiv:2507.04976
4 citations

DeepVideo-R1: Video Reinforcement Fine-Tuning via Difficulty-aware Regressive GRPO

Jinyoung Park, Jeehye Na, Jinyoung Kim et al.

NeurIPS 2025, arXiv:2506.07464
28 citations

Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction

Rui Qian, Shuangrui Ding, Xiaoyi Dong et al.

CVPR 2025, arXiv:2501.03218
36 citations

LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale

Joya Chen, Yiqi Lin, Ziyun Zeng et al.

CVPR 2025, arXiv:2504.16030
4 citations

LongVALE: Vision-Audio-Language-Event Benchmark Towards Time-Aware Omni-Modal Perception of Long Videos

Tiantian Geng, Jinrui Zhang, Qingni Wang et al.

CVPR 2025, arXiv:2411.19772
34 citations

MimeQA: Towards Socially-Intelligent Nonverbal Foundation Models

Hengzhi Li, Megan Tjandrasuwita, Yi R. (May) Fung et al.

NeurIPS 2025, arXiv:2502.16671
8 citations

Multi-Granular Spatio-Temporal Token Merging for Training-Free Acceleration of Video LLMs

Jeongseok Hyun, Sukjun Hwang, Su Ho Han et al.

ICCV 2025, arXiv:2507.07990
14 citations

PAVE: Patching and Adapting Video Large Language Models

Zhuoming Liu, Yiquan Li, Khoi D Nguyen et al.

CVPR 2025, arXiv:2503.19794
1 citation

PVChat: Personalized Video Chat with One-Shot Learning

Yufei Shi, Weilong Yan, Gang Xu et al.

ICCV 2025, arXiv:2503.17069
3 citations

Seq2Time: Sequential Knowledge Transfer for Video LLM Temporal Grounding

Andong Deng, Zhongpai Gao, Anwesa Choudhuri et al.

CVPR 2025, arXiv:2411.16932
7 citations

VidEmo: Affective-Tree Reasoning for Emotion-Centric Video Foundation Models

Zhicheng Zhang, Weicheng Wang, Yongjie Zhu et al.

NeurIPS 2025, arXiv:2511.02712
1 citation

VideoOrion: Tokenizing Object Dynamics in Videos

Yicheng Feng, Yijiang Li, Wanpeng Zhang et al.

ICCV 2025, arXiv:2411.16156
9 citations

VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM

Yuqian Yuan, Hang Zhang, Wentong Li et al.

CVPR 2025, arXiv:2501.00599
43 citations

ST-LLM: Large Language Models Are Effective Temporal Learners

Ruyang Liu, Chen Li, Haoran Tang et al.

ECCV 2024, arXiv:2404.00308
129 citations

VideoLLM-online: Online Video Large Language Model for Streaming Video

Joya Chen, Zhaoyang Lv, Shiwei Wu et al.

CVPR 2024, arXiv:2406.11816
116 citations