"contrastive language-image pretraining" Papers

21 papers found

AmorLIP: Efficient Language-Image Pretraining via Amortization

Haotian Sun, Yitong Li, Yuchen Zhuang et al.

NeurIPS 2025 · arXiv:2505.18983 · 2 citations

Attribute-based Visual Reprogramming for Vision-Language Models

Chengyi Cai, Zesheng Ye, Lei Feng et al.

ICLR 2025 · arXiv:2501.13982 · 5 citations

DiffCLIP: Few-shot Language-driven Multimodal Classifier

Jiaqing Zhang, Mingxiang Cao, Xue Yang et al.

AAAI 2025 · arXiv:2412.07119 · 2 citations

Enhancing SAM with Efficient Prompting and Preference Optimization for Semi-supervised Medical Image Segmentation

Aishik Konwer, Zhijian Yang, Erhan Bas et al.

CVPR 2025 · arXiv:2503.04639 · 8 citations

Guiding Cross-Modal Representations with MLLM Priors via Preference Alignment

Pengfei Zhao, Rongbo Luan, Wei Zhang et al.

NeurIPS 2025 · arXiv:2506.06970 · 1 citation

Kronecker Mask and Interpretive Prompts are Language-Action Video Learners

Jingyi Yang, Zitong Yu, Xiuming Ni et al.

ICLR 2025 (oral) · arXiv:2502.03549 · 3 citations

Narrowing Information Bottleneck Theory for Multimodal Image-Text Representations Interpretability

Zhiyu Zhu, Zhibo Jin, Jiayu Zhang et al.

ICLR 2025 · arXiv:2502.14889 · 3 citations

ProKeR: A Kernel Perspective on Few-Shot Adaptation of Large Vision-Language Models

Yassir Bendou, Amine Ouasfi, Vincent Gripon et al.

CVPR 2025 · arXiv:2501.11175 · 8 citations

Retaining Knowledge and Enhancing Long-Text Representations in CLIP through Dual-Teacher Distillation

Yuheng Feng, Changsong Wen, Zelin Peng et al.

CVPR 2025

Scaling Language-Free Visual Representation Learning

David Fan, Shengbang Tong, Jiachen Zhu et al.

ICCV 2025 (highlight) · arXiv:2504.01017 · 41 citations

Synergy and Diversity in CLIP: Enhancing Performance Through Adaptive Backbone Ensembling

Cristian Rodriguez-Opazo, Ehsan Abbasnejad, Damien Teney et al.

ICLR 2025 · arXiv:2405.17139 · 1 citation

Text and Image Are Mutually Beneficial: Enhancing Training-Free Few-Shot Classification with CLIP

Yayuan Li, Jintao Guo, Lei Qi et al.

AAAI 2025 · arXiv:2412.11375 · 5 citations

Vision-Language Models Do Not Understand Negation

Kumail Alhamoud, Shaden Alshammari, Yonglong Tian et al.

CVPR 2025 · arXiv:2501.09425 · 38 citations

Anchor-based Robust Finetuning of Vision-Language Models

Jinwei Han, Zhiwen Lin, Zhongyi Sun et al.

CVPR 2024 · arXiv:2404.06244 · 10 citations

Concept-Guided Prompt Learning for Generalization in Vision-Language Models

Yi Zhang, Ce Zhang, Ke Yu et al.

AAAI 2024 · arXiv:2401.07457 · 34 citations

Envisioning Outlier Exposure by Large Language Models for Out-of-Distribution Detection

Chentao Cao, Zhun Zhong, Zhanke Zhou et al.

ICML 2024 · arXiv:2406.00806 · 28 citations

Expediting Contrastive Language-Image Pretraining via Self-Distilled Encoders

Bumsoo Kim, Jinhyung Kim, Yeonsik Jo et al.

AAAI 2024 · arXiv:2312.12659 · 5 citations

MLIP: Efficient Multi-Perspective Language-Image Pretraining with Exhaustive Data Utilization

Yu Zhang, Qi Zhang, Zixuan Gong et al.

ICML 2024 · arXiv:2406.01460 · 7 citations

MoDE: CLIP Data Experts via Clustering

Jiawei Ma, Po-Yao Huang, Saining Xie et al.

CVPR 2024 · arXiv:2404.16030 · 25 citations

OT-CLIP: Understanding and Generalizing CLIP via Optimal Transport

Liangliang Shi, Jack Fan, Junchi Yan

ICML 2024

Unveiling Typographic Deceptions: Insights of the Typographic Vulnerability in Large Vision-Language Models

Hao Cheng, Erjia Xiao, Jindong Gu et al.

ECCV 2024 · arXiv:2402.19150 · 15 citations