"vision language models" Papers

69 papers found • Page 2 of 2

VLIPP: Towards Physically Plausible Video Generation with Vision and Language Informed Physical Prior

Xindi Yang, Baolu Li, Yiming Zhang et al.

ICCV 2025arXiv:2503.23368
17
citations

VLM4D: Towards Spatiotemporal Awareness in Vision Language Models

Shijie Zhou, Alexander Vilesov, Xuehai He et al.

ICCV 2025arXiv:2508.02095
16
citations

Weighted Multi-Prompt Learning with Description-free Large Language Model Distillation

Sua Lee, Kyubum Shin, Jung Ho Park

ICLR 2025arXiv:2507.07147
1
citations

Zero-shot Benchmarking: A Framework for Flexible and Scalable Automatic Evaluation of Language Models

José Pombal, Nuno M Guerreiro, Ricardo Rei et al.

COLM 2025paperarXiv:2504.01001
8
citations

Zero-Shot Vision Encoder Grafting via LLM Surrogates

Kaiyu Yue, Vasu Singla, Menglin Jia et al.

ICCV 2025arXiv:2505.22664
1
citations

BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions

Wenbo Hu, Yifan Xu, Yi Li et al.

AAAI 2024paperarXiv:2308.09936
192
citations

Cross-Domain Semantic Segmentation on Inconsistent Taxonomy using VLMs

Jeongkee Lim, Yusung Kim

ECCV 2024arXiv:2408.02261
5
citations

Detecting and Preventing Hallucinations in Large Vision Language Models

Anisha Gunjal, Jihan Yin, Erhan Bas

AAAI 2024paperarXiv:2308.06394
264
citations

Diagnosing the Compositional Knowledge of Vision Language Models from a Game-Theoretic View

Jin Wang, Shichao Dong, Yapeng Zhu et al.

ICML 2024arXiv:2405.17201
5
citations

LCA-on-the-Line: Benchmarking Out of Distribution Generalization with Class Taxonomies

Jia Shi, Gautam Rajendrakumar Gare, Jinjin Tian et al.

ICML 2024

Leveraging VLM-Based Pipelines to Annotate 3D Objects

Rishabh Kabra, Loic Matthey, Alexander Lerchner et al.

ICML 2024arXiv:2311.17851
9
citations

Mismatch Quest: Visual and Textual Feedback for Image-Text Misalignment

Brian Gordon, Yonatan Bitton, Yonatan Shafir et al.

ECCV 2024arXiv:2312.03766
17
citations

PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs

Soroush Nasiriany, Fei Xia, Wenhao Yu et al.

ICML 2024arXiv:2402.07872
188
citations

ProxyDet: Synthesizing Proxy Novel Classes via Classwise Mixup for Open Vocabulary Object Detection

Joonhyun Jeong, Geondo Park, Jayeon Yoo et al.

AAAI 2024paperarXiv:2312.07266
16
citations

RL-VLM-F: Reinforcement Learning from Vision Language Foundation Model Feedback

Yufei Wang, Zhanyi Sun, Jesse Zhang et al.

ICML 2024arXiv:2402.03681
119
citations

SC-Tune: Unleashing Self-Consistent Referential Comprehension in Large Vision Language Models

Tongtian Yue, Jie Cheng, Longteng Guo et al.

CVPR 2024arXiv:2403.13263
13
citations

Soft Prompt Generation for Domain Generalization

Shuanghao Bai, Yuedi Zhang, Wanqi Zhou et al.

ECCV 2024arXiv:2404.19286
30
citations

TrojVLM: Backdoor Attack Against Vision Language Models

Weimin Lyu, Lu Pang, Tengfei Ma et al.

ECCV 2024arXiv:2409.19232
25
citations

ViG-Bias: Visually Grounded Bias Discovery and Mitigation

Badr-Eddine Marani, Mohamed HANINI, Nihitha Malayarukil et al.

ECCV 2024arXiv:2407.01996
2
citations