Papers matching "video-language models"

16 papers found

Can Text-to-Video Generation help Video-Language Alignment?

Luca Zanella, Massimiliano Mancini, Willi Menapace et al.

CVPR 2025 · arXiv:2503.18507 · 1 citation

Dynamic-VLM: Simple Dynamic Visual Token Compression for VideoLLM

Han Wang, Yuxiang Nie, Yongjie Ye et al.

ICCV 2025 · arXiv:2412.09530 · 15 citations

ExpertAF: Expert Actionable Feedback from Video

Kumar Ashutosh, Tushar Nagarajan, Georgios Pavlakos et al.

CVPR 2025 · arXiv:2408.00672 · 11 citations

Factorized Learning for Temporally Grounded Video-Language Models

Wenzheng Zeng, Difei Gao, Mike Zheng Shou et al.

ICCV 2025 · arXiv:2512.24097

Flexible Frame Selection for Efficient Video Reasoning

Shyamal Buch, Arsha Nagrani, Anurag Arnab et al.

CVPR 2025 · 10 citations

SF2T: Self-supervised Fragment Finetuning of Video-LLMs for Fine-Grained Understanding

Yangliu Hu, Zikai Song, Na Feng et al.

CVPR 2025 · arXiv:2504.07745 · 11 citations

Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge

Haomiao Xiong, Zongxin Yang, Jiazuo Yu et al.

ICLR 2025 · arXiv:2501.13468 · 30 citations

Towards Understanding Camera Motions in Any Video

Zhiqiu Lin, Siyuan Cen, Daniel Jiang et al.

NeurIPS 2025 (spotlight) · arXiv:2504.15376 · 28 citations

Two Causally Related Needles in a Video Haystack

Miaoyu Li, Qin Chao, Boyang Li

NeurIPS 2025 · arXiv:2505.19853

VideoLLaMB: Long Streaming Video Understanding with Recurrent Memory Bridges

Yuxuan Wang, Yiqi Song, Cihang Xie et al.

ICCV 2025 · arXiv:2409.01071 · 4 citations

World Model on Million-Length Video And Language With Blockwise RingAttention

Hao Liu, Wilson Yan, Matei Zaharia et al.

ICLR 2025 (oral) · arXiv:2402.08268 · 149 citations

Adapt2Reward: Adapting Video-Language Models to Generalizable Robotic Rewards via Failure Prompts

Yanting Yang, Minghao Chen, Qibo Qiu et al.

ECCV 2024 · arXiv:2407.14872 · 5 citations

Grounded Question-Answering in Long Egocentric Videos

Shangzhe Di, Weidi Xie

CVPR 2024 · arXiv:2312.06505 · 48 citations

PiTe: Pixel-Temporal Alignment for Large Video-Language Model

Yang Liu, Pengxiang Ding, Siteng Huang et al.

ECCV 2024 · arXiv:2409.07239 · 9 citations

Video ReCap: Recursive Captioning of Hour-Long Videos

Md Mohaiminul Islam, Vu Bao Ngan Ho, Xitong Yang et al.

CVPR 2024 · arXiv:2402.13250 · 85 citations

VITATECS: A Diagnostic Dataset for Temporal Concept Understanding of Video-Language Models

Shicheng Li, Lei Li, Yi Liu et al.

ECCV 2024 · arXiv:2311.17404 · 49 citations