"clip model" Papers
19 papers found
Bayesian Test-Time Adaptation for Vision-Language Models
Lihua Zhou, Mao Ye, Shuaifeng Li et al.
Enhancing Compositional Reasoning in CLIP via Reconstruction and Alignment of Text Descriptions
Jihoon Kwon, Kyle Min, Jy-yong Sohn
NegRefine: Refining Negative Label-Based Zero-Shot OOD Detection
Amirhossein Ansari, Ke Wang, Pulei Xiong
Position-Aware Guided Point Cloud Completion with CLIP Model
Feng Zhou, Qi Zhang, Ju Dai et al.
R-TPT: Improving Adversarial Robustness of Vision-Language Models through Test-Time Prompt Tuning
Lijun Sheng, Jian Liang, Zilei Wang et al.
SANER: Annotation-free Societal Attribute Neutralizer for Debiasing CLIP
Yusuke Hirota, Min-Hung Chen, Chien-Yi Wang et al.
Adversarial Robustification via Text-to-Image Diffusion Models
Daewon Choi, Jongheon Jeong, Huiwon Jang et al.
Attention Prompting on Image for Large Vision-Language Models
Runpeng Yu, Weihao Yu, Xinchao Wang
CLIPtone: Unsupervised Learning for Text-based Image Tone Adjustment
Hyeongmin Lee, Kyoungkook Kang, Jungseul Ok et al.
Data-Free Generalized Zero-Shot Learning
Bowen Tang, Jing Zhang, Yan Long et al.
Explore the Potential of CLIP for Training-Free Open Vocabulary Semantic Segmentation
Tong Shao, Zhuotao Tian, Hang Zhao et al.
Federated Adaptive Prompt Tuning for Multi-Domain Collaborative Learning
Shangchao Su, Mingzhao Yang, Bin Li et al.
FM-OV3D: Foundation Model-Based Cross-Modal Knowledge Blending for Open-Vocabulary 3D Detection
Dongmei Zhang, Chang Li, Renrui Zhang et al.
Incorporating Geo-Diverse Knowledge into Prompting for Increased Geographical Robustness in Object Recognition
Kyle Buettner, Sina Malakouti, Xiang Li et al.
LAMM: Label Alignment for Multi-Modal Prompt Learning
Jingsheng Gao, Jiacheng Ruan, Suncheng Xiang et al.
Robust CLIP: Unsupervised Adversarial Fine-Tuning of Vision Embeddings for Robust Large Vision-Language Models
Christian Schlarmann, Naman Singh, Francesco Croce et al.
V2A-Mapper: A Lightweight Solution for Vision-to-Audio Generation by Connecting Foundation Models
Heng Wang, Jianbo Ma, Santiago Pascual et al.
VCP-CLIP: A Visual Context Prompting Model for Zero-Shot Anomaly Segmentation
Zhen Qu, Xian Tao, Mukesh Prasad et al.
Visual-Text Cross Alignment: Refining the Similarity Score in Vision-Language Models
Jinhao Li, Haopeng Li, Sarah Erfani et al.