"vision-language models" Papers
570 papers found • Page 9 of 12
VladVA: Discriminative Fine-tuning of LVLMs
Yassine Ouali, Adrian Bulat, Alexandros Xenos et al.
VLDrive: Vision-Augmented Lightweight MLLMs for Efficient Language-grounded Autonomous Driving
Ruifei Zhang, Wei Zhang, Xiao Tan et al.
VLMaterial: Procedural Material Generation with Large Vision-Language Models
Beichen Li, Rundi Wu, Armando Solar-Lezama et al.
VLMs can Aggregate Scattered Training Patches
Zhanhui Zhou, Lingjie Chen, Chao Yang et al.
VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning
Haozhe Wang, Chao Qu, Zuming Huang et al.
Vocabulary-Guided Gait Recognition
Panjian Huang, Saihui Hou, Chunshui Cao et al.
VT-FSL: Bridging Vision and Text with LLMs for Few-Shot Learning
Wenhao Li, Qiangchang Wang, Xianjing Meng et al.
Weakly-Supervised Learning of Dense Functional Correspondences
Stefan Stojanov, Linan Zhao, Yunzhi Zhang et al.
Web Artifact Attacks Disrupt Vision Language Models
Maan Qraitem, Piotr Teterwak, Kate Saenko et al.
What Makes a Maze Look Like a Maze?
Joy Hsu, Jiayuan Mao, Joshua B Tenenbaum et al.
When Large Vision-Language Model Meets Large Remote Sensing Imagery: Coarse-to-Fine Text-Guided Token Pruning
Junwei Luo, Yingying Zhang, Xue Yang et al.
Why 1 + 1 < 1 in Visual Token Pruning: Beyond Naive Integration via Multi-Objective Balanced Covering
Yangfu Li, Hongjian Zhan, Tianyi Chen et al.
Why LVLMs Are More Prone to Hallucinations in Longer Responses: The Role of Context
Ge Zheng, Jiaye Qian, Jiajin Tang et al.
Words or Vision: Do Vision-Language Models Have Blind Faith in Text?
Ailin Deng, Tri Cao, Zhirui Chen et al.
Your Large Vision-Language Model Only Needs A Few Attention Heads For Visual Grounding
Seil Kang, Jinyeong Kim, Junhyeok Kim et al.
A Closer Look at the Few-Shot Adaptation of Large Vision-Language Models
Julio Silva-Rodríguez, Sina Hajimiri, Ismail Ben Ayed et al.
Adapt2Reward: Adapting Video-Language Models to Generalizable Robotic Rewards via Failure Prompts
Yanting Yang, Minghao Chen, Qibo Qiu et al.
Adaptive Multi-task Learning for Few-shot Object Detection
Yan Ren, Yanling Li, Wai-Kin Adams Kong
Adapt without Forgetting: Distill Proximity from Dual Teachers in Vision-Language Models
Mengyu Zheng, Yehui Tang, Zhiwei Hao et al.
Adversarial Prompt Tuning for Vision-Language Models
Jiaming Zhang, Xingjun Ma, Xin Wang et al.
Amend to Alignment: Decoupled Prompt Tuning for Mitigating Spurious Correlation in Vision-Language Models
Jie Zhang, Xiaosong Ma, Song Guo et al.
A Multimodal Automated Interpretability Agent
Tamar Rott Shaham, Sarah Schwettmann, Franklin Wang et al.
Anchor-based Robust Finetuning of Vision-Language Models
Jinwei Han, Zhiwen Lin, Zhongyisun Sun et al.
An Empirical Study Into What Matters for Calibrating Vision-Language Models
Weijie Tu, Weijian Deng, Dylan Campbell et al.
An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models
Liang Chen, Haozhe Zhao, Tianyu Liu et al.
AnomalyGPT: Detecting Industrial Anomalies Using Large Vision-Language Models
Zhaopeng Gu, Bingke Zhu, Guibo Zhu et al.
ArtVLM: Attribute Recognition Through Vision-Based Prefix Language Modeling
William Zhu, Keren Ye, Junjie Ke et al.
ArtWhisperer: A Dataset for Characterizing Human-AI Interactions in Artistic Creations
Kailas Vodrahalli, James Zou
A Touch, Vision, and Language Dataset for Multimodal Alignment
Letian Fu, Gaurav Datta, Huang Huang et al.
Attention Prompting on Image for Large Vision-Language Models
Runpeng Yu, Weihao Yu, Xinchao Wang
BEAF: Observing BEfore-AFter Changes to Evaluate Hallucination in Vision-language Models
Ye-Bin Moon, Nam Hyeon-Woo, Wonseok Choi et al.
Beyond Sole Strength: Customized Ensembles for Generalized Vision-Language Models
Zhihe Lu, Jiawang Bai, Xin Li et al.
BlenderAlchemy: Editing 3D Graphics with Vision-Language Models
Ian Huang, Guandao Yang, Leonidas Guibas
Bootstrapping Variational Information Pursuit with Large Language and Vision Models for Interpretable Image Classification
Aditya Chattopadhyay, Kwan Ho Ryan Chan, Rene Vidal
Bridging Environments and Language with Rendering Functions and Vision-Language Models
Théo Cachet, Christopher Dance, Olivier Sigaud
Candidate Pseudolabel Learning: Enhancing Vision-Language Models by Prompt Tuning with Unlabeled Data
Jiahan Zhang, Qi Wei, Feng Liu et al.
Can I Trust Your Answer? Visually Grounded Video Question Answering
Junbin Xiao, Angela Yao, Yicong Li et al.
Cascade-CLIP: Cascaded Vision-Language Embeddings Alignment for Zero-Shot Semantic Segmentation
Yunheng Li, Zhong-Yu Li, Quan-Sheng Zeng et al.
CFPL-FAS: Class Free Prompt Learning for Generalizable Face Anti-spoofing
Ajian Liu, Shuai Xue, Jianwen Gan et al.
CLAP: Isolating Content from Style through Contrastive Learning with Augmented Prompts
Yichao Cai, Yuhang Liu, Zhen Zhang et al.
CLIM: Contrastive Language-Image Mosaic for Region Representation
Size Wu, Wenwei Zhang, Lumin Xu et al.
Code as Reward: Empowering Reinforcement Learning with VLMs
David Venuto, Mohammad Sami Nur Islam, Martin Klissarov et al.
Collaborative Vision-Text Representation Optimizing for Open-Vocabulary Segmentation
Siyu Jiao, Hongguang Zhu, Yunchao Wei et al.
COMMA: Co-articulated Multi-Modal Learning
Lianyu Hu, Liqing Gao, Zekang Liu et al.
Compound Text-Guided Prompt Tuning via Image-Adaptive Cues
Hao Tan, Jun Li, Yizhuang Zhou et al.
Conceptual Codebook Learning for Vision-Language Models
Yi Zhang, Ke Yu, Siqi Wu et al.
Connecting the Dots: Collaborative Fine-tuning for Black-Box Vision-Language Models
Zhengbo Wang, Jian Liang, Ran He et al.
Consistency and Uncertainty: Identifying Unreliable Responses From Black-Box Vision-Language Models for Selective Visual Question Answering
Zaid Khan, Yun Fu
Contrasting Deepfakes Diffusion via Contrastive Learning and Global-Local Similarities
Lorenzo Baraldi, Federico Cocchi, Marcella Cornia et al.
DeCoOp: Robust Prompt Tuning with Out-of-Distribution Detection
Zhi Zhou, Ming Yang, Jiang-Xin Shi et al.