"large vision-language models" Papers

17 papers found

Compress & Cache: Vision token compression for efficient generation and retrieval

Adrian Bulat, Yassine Ouali, Georgios Tzimiropoulos

NEURIPS 2025

Cross-Modal Safety Mechanism Transfer in Large Vision-Language Models

Shicheng Xu, Liang Pang, Yunchang Zhu et al.

ICLR 2025arXiv:2410.12662
14
citations

DSBench: How Far Are Data Science Agents from Becoming Data Science Experts?

Liqiang Jing, Zhehui Huang, Xiaoyang Wang et al.

ICLR 2025arXiv:2409.07703
65
citations

Dysca: A Dynamic and Scalable Benchmark for Evaluating Perception Ability of LVLMs

Jie Zhang, Zhongqi Wang, Mengqi Lei et al.

ICLR 2025arXiv:2406.18849
3
citations

How Does Vision-Language Adaptation Impact the Safety of Vision Language Models?

Seongyun Lee, Geewook Kim, Jiyeon Kim et al.

ICLR 2025arXiv:2410.07571
4
citations

ICT: Image-Object Cross-Level Trusted Intervention for Mitigating Object Hallucination in Large Vision-Language Models

Junzhe Chen, Tianshu Zhang, Shiyu Huang et al.

CVPR 2025arXiv:2411.15268
14
citations

Latent Chain-of-Thought for Visual Reasoning

Guohao Sun, Hang Hua, Jian Wang et al.

NEURIPS 2025arXiv:2510.23925
13
citations

LVLM-Driven Attribute-Aware Modeling for Visible-Infrared Person Re-Identification

Zhiqi Pang, Lingling Zhao, Junjie Wang et al.

NEURIPS 2025

MMFakeBench: A Mixed-Source Multimodal Misinformation Detection Benchmark for LVLMs

Xuannan Liu, Zekun Li, Pei Li et al.

ICLR 2025arXiv:2406.08772
49
citations

Provoking Multi-modal Few-Shot LVLM via Exploration-Exploitation In-Context Learning

Cheng Chen, Yunpeng Zhai, Yifan Zhao et al.

CVPR 2025arXiv:2506.09473
2
citations

SoMi-ToM: Evaluating Multi-Perspective Theory of Mind in Embodied Social Interactions

Xianzhe Fan, Xuhui Zhou, Chuanyang Jin et al.

NEURIPS 2025arXiv:2506.23046
5
citations

Towards Understanding and Quantifying Uncertainty for Text-to-Image Generation

Gianni Franchi, Nacim Belkhir, Dat NGUYEN et al.

CVPR 2025arXiv:2412.03178
5
citations

Visual Structures Help Visual Reasoning: Addressing the Binding Problem in LVLMs

Amirmohammad Izadi, Mohammadali Banayeeanzade, Fatemeh Askari et al.

NEURIPS 2025
1
citations

VladVA: Discriminative Fine-tuning of LVLMs

Yassine Ouali, Adrian Bulat, ALEXANDROS XENOS et al.

CVPR 2025arXiv:2412.04378
11
citations

VRBench: A Benchmark for Multi-Step Reasoning in Long Narrative Videos

Jiashuo Yu, Yue Wu, Meng Chu et al.

ICCV 2025arXiv:2506.10857
9
citations

HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning

Fucai Ke, Zhixi Cai, Simindokht Jahangard et al.

ECCV 2024arXiv:2403.12884
26
citations

Unveiling Typographic Deceptions: Insights of the Typographic Vulnerability in Large Vision-Language Models

Hao Cheng, Erjia Xiao, Jindong Gu et al.

ECCV 2024arXiv:2402.19150
15
citations