"visual language models" Papers

14 papers found

Beyond Blanket Masking: Examining Granularity for Privacy Protection in Images Captured by Blind and Low Vision Users

Jeffri Murrugarra-Llerena, Haoran Niu, K. Suzanne Barber et al.

COLM 2025paperarXiv:2508.09245

Chain-of-region: Visual Language Models Need Details for Diagram Analysis

Xue Li, Yiyou Sun, Wei Cheng et al.

ICLR 2025
8
citations

DiffCLIP: Few-shot Language-driven Multimodal Classifier

Jiaqing Zhang, Mingxiang Cao, Xue Yang et al.

AAAI 2025paperarXiv:2412.07119
2
citations

FreeScene: Mixed Graph Diffusion for 3D Scene Synthesis from Free Prompts

Tongyuan Bai, Wangyuanfan Bai, Dong Chen et al.

CVPR 2025arXiv:2506.02781
4
citations

Have the VLMs Lost Confidence? A Study of Sycophancy in VLMs

Shuo Li, Tao Ji, Xiaoran Fan et al.

ICLR 2025arXiv:2410.11302
11
citations

KORGym: A Dynamic Game Platform for LLM Reasoning Evaluation

Jiajun Shi, Jian Yang, Jiaheng Liu et al.

NEURIPS 2025spotlightarXiv:2505.14552
4
citations

MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?

Zhaorun Chen, Zichen Wen, Yichao Du et al.

NEURIPS 2025arXiv:2407.04842
60
citations

NL-Eye: Abductive NLI For Images

Mor Ventura, Michael Toker, Nitay Calderon et al.

ICLR 2025arXiv:2410.02613
3
citations

PACT: Pruning and Clustering-Based Token Reduction for Faster Visual Language Models

Dhouib Mohamed, Davide Buscaldi, Vanier Sonia et al.

CVPR 2025arXiv:2504.08966
21
citations

Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers

Wei Pang, Kevin Qinghong Lin, Xiangru Jian et al.

NEURIPS 2025arXiv:2505.21497
25
citations

Unveiling the Invisible: Reasoning Complex Occlusions Amodally with AURA

Zhixuan Li, Hyunse Yoon, Sanghoon Lee et al.

ICCV 2025arXiv:2503.10225
3
citations

VADTree: Explainable Training-Free Video Anomaly Detection via Hierarchical Granularity-Aware Tree

Wenlong Li, Yifei Xu, Yuan Rao et al.

NEURIPS 2025oralarXiv:2510.22693
1
citations

YOLO-RD: Introducing Relevant and Compact Explicit Knowledge to YOLO by Retriever-Dictionary

Hao-Tang Tsui, Chien-Yao Wang, Hong-Yuan Liao

ICLR 2025arXiv:2410.15346

Prompting Language-Informed Distribution for Compositional Zero-Shot Learning

Wentao Bao, Lichang Chen, Heng Huang et al.

ECCV 2024arXiv:2305.14428
35
citations