"large vision-language models" Papers
17 papers found
Conference
Compress & Cache: Vision token compression for efficient generation and retrieval
Adrian Bulat, Yassine Ouali, Georgios Tzimiropoulos
Cross-Modal Safety Mechanism Transfer in Large Vision-Language Models
Shicheng Xu, Liang Pang, Yunchang Zhu et al.
DSBench: How Far Are Data Science Agents from Becoming Data Science Experts?
Liqiang Jing, Zhehui Huang, Xiaoyang Wang et al.
Dysca: A Dynamic and Scalable Benchmark for Evaluating Perception Ability of LVLMs
Jie Zhang, Zhongqi Wang, Mengqi Lei et al.
How Does Vision-Language Adaptation Impact the Safety of Vision Language Models?
Seongyun Lee, Geewook Kim, Jiyeon Kim et al.
ICT: Image-Object Cross-Level Trusted Intervention for Mitigating Object Hallucination in Large Vision-Language Models
Junzhe Chen, Tianshu Zhang, Shiyu Huang et al.
Latent Chain-of-Thought for Visual Reasoning
Guohao Sun, Hang Hua, Jian Wang et al.
LVLM-Driven Attribute-Aware Modeling for Visible-Infrared Person Re-Identification
Zhiqi Pang, Lingling Zhao, Junjie Wang et al.
MMFakeBench: A Mixed-Source Multimodal Misinformation Detection Benchmark for LVLMs
Xuannan Liu, Zekun Li, Pei Li et al.
Provoking Multi-modal Few-Shot LVLM via Exploration-Exploitation In-Context Learning
Cheng Chen, Yunpeng Zhai, Yifan Zhao et al.
SoMi-ToM: Evaluating Multi-Perspective Theory of Mind in Embodied Social Interactions
Xianzhe Fan, Xuhui Zhou, Chuanyang Jin et al.
Towards Understanding and Quantifying Uncertainty for Text-to-Image Generation
Gianni Franchi, Nacim Belkhir, Dat NGUYEN et al.
Visual Structures Help Visual Reasoning: Addressing the Binding Problem in LVLMs
Amirmohammad Izadi, Mohammadali Banayeeanzade, Fatemeh Askari et al.
VladVA: Discriminative Fine-tuning of LVLMs
Yassine Ouali, Adrian Bulat, ALEXANDROS XENOS et al.
VRBench: A Benchmark for Multi-Step Reasoning in Long Narrative Videos
Jiashuo Yu, Yue Wu, Meng Chu et al.
HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning
Fucai Ke, Zhixi Cai, Simindokht Jahangard et al.
Unveiling Typographic Deceptions: Insights of the Typographic Vulnerability in Large Vision-Language Models
Hao Cheng, Erjia Xiao, Jindong Gu et al.