"vision language models" Papers
69 papers found
VLIPP: Towards Physically Plausible Video Generation with Vision and Language Informed Physical Prior
Xindi Yang, Baolu Li, Yiming Zhang et al.
VLM4D: Towards Spatiotemporal Awareness in Vision Language Models
Shijie Zhou, Alexander Vilesov, Xuehai He et al.
Weighted Multi-Prompt Learning with Description-free Large Language Model Distillation
Sua Lee, Kyubum Shin, Jung Ho Park
Zero-shot Benchmarking: A Framework for Flexible and Scalable Automatic Evaluation of Language Models
José Pombal, Nuno M Guerreiro, Ricardo Rei et al.
Zero-Shot Vision Encoder Grafting via LLM Surrogates
Kaiyu Yue, Vasu Singla, Menglin Jia et al.
BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions
Wenbo Hu, Yifan Xu, Yi Li et al.
Cross-Domain Semantic Segmentation on Inconsistent Taxonomy using VLMs
Jeongkee Lim, Yusung Kim
Detecting and Preventing Hallucinations in Large Vision Language Models
Anisha Gunjal, Jihan Yin, Erhan Bas
Diagnosing the Compositional Knowledge of Vision Language Models from a Game-Theoretic View
Jin Wang, Shichao Dong, Yapeng Zhu et al.
LCA-on-the-Line: Benchmarking Out of Distribution Generalization with Class Taxonomies
Jia Shi, Gautam Rajendrakumar Gare, Jinjin Tian et al.
Leveraging VLM-Based Pipelines to Annotate 3D Objects
Rishabh Kabra, Loic Matthey, Alexander Lerchner et al.
Mismatch Quest: Visual and Textual Feedback for Image-Text Misalignment
Brian Gordon, Yonatan Bitton, Yonatan Shafir et al.
PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs
Soroush Nasiriany, Fei Xia, Wenhao Yu et al.
ProxyDet: Synthesizing Proxy Novel Classes via Classwise Mixup for Open Vocabulary Object Detection
Joonhyun Jeong, Geondo Park, Jayeon Yoo et al.
RL-VLM-F: Reinforcement Learning from Vision Language Foundation Model Feedback
Yufei Wang, Zhanyi Sun, Jesse Zhang et al.
SC-Tune: Unleashing Self-Consistent Referential Comprehension in Large Vision Language Models
Tongtian Yue, Jie Cheng, Longteng Guo et al.
Soft Prompt Generation for Domain Generalization
Shuanghao Bai, Yuedi Zhang, Wanqi Zhou et al.
TrojVLM: Backdoor Attack Against Vision Language Models
Weimin Lyu, Lu Pang, Tengfei Ma et al.
ViG-Bias: Visually Grounded Bias Discovery and Mitigation
Badr-Eddine Marani, Mohamed HANINI, Nihitha Malayarukil et al.