"zero-shot classification" Papers

45 papers found
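
Most of the entries below build on CLIP-style contrastive vision-language models, which perform zero-shot classification by scoring an image against a text prompt for each candidate label and picking the best match. For context, here is a minimal sketch using the Hugging Face transformers CLIP API; the checkpoint name, image path, and label prompts are illustrative, not drawn from any paper in this list.

```python
# Minimal CLIP-style zero-shot classification sketch, assuming the
# Hugging Face `transformers` library. Checkpoint, image path, and
# prompts are illustrative placeholders.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

labels = ["a photo of a cat", "a photo of a dog", "a photo of a bird"]
image = Image.open("example.jpg")  # hypothetical input image

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
logits = model(**inputs).logits_per_image  # image-to-text similarity scores
probs = logits.softmax(dim=-1)             # probabilities over candidate labels
print(dict(zip(labels, probs[0].tolist())))
```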

Active Data Curation Effectively Distills Large-Scale Multimodal Models

Vishaal Udandarao, Nikhil Parthasarathy, Muhammad Ferjad Naeem et al.

CVPR 2025 · arXiv:2411.18674
15 citations

Bayesian Test-Time Adaptation for Vision-Language Models

Lihua Zhou, Mao Ye, Shuaifeng Li et al.

CVPR 2025 · arXiv:2503.09248
11 citations

Bringing CLIP to the Clinic: Dynamic Soft Labels and Negation-Aware Learning for Medical Analysis

Hanbin Ko, Chang Min Park

CVPR 2025 · arXiv:2505.22079
5 citations

Captured by Captions: On Memorization and its Mitigation in CLIP Models

Wenhao Wang, Adam Dziedzic, Grace Kim et al.

ICLR 2025 · arXiv:2502.07830
4 citations

CLIP Under the Microscope: A Fine-Grained Analysis of Multi-Object Representation

Reza Abbasi, Ali Nazari, Aminreza Sefid et al.

CVPR 2025 · arXiv:2502.19842
6 citations

Cross the Gap: Exposing the Intra-modal Misalignment in CLIP via Modality Inversion

Marco Mistretta, Alberto Baldrati, Lorenzo Agnolucci et al.

ICLR 2025 · arXiv:2502.04263
16 citations

CSA: Data-efficient Mapping of Unimodal Features to Multimodal Features

Po-han Li, Sandeep Chinchali, Ufuk Topcu

ICLR 2025 · arXiv:2410.07610
5 citations

Diffusion Classifiers Understand Compositionality, but Conditions Apply

Yujin Jeong, Arnas Uselis, Seong Joon Oh et al.

NeurIPS 2025 · arXiv:2505.17955
3 citations

DINOv2 Meets Text: A Unified Framework for Image- and Pixel-Level Vision-Language Alignment

Dahyun Kang, Piotr Bojanowski, Huy V. Vo et al.

CVPR 2025 · arXiv:2412.16334
46 citations

ExGra-Med: Extended Context Graph Alignment for Medical Vision-Language Models

Duy M. H. Nguyen, Nghiem Diep, Trung Nguyen et al.

NeurIPS 2025 · arXiv:2410.02615
5 citations

Explaining Similarity in Vision-Language Encoders with Weighted Banzhaf Interactions

Hubert Baniecki, Maximilian Muschalik, Fabian Fumagalli et al.

NeurIPS 2025 · arXiv:2508.05430

Harnessing Frozen Unimodal Encoders for Flexible Multimodal Alignment

Mayug Maniparambil, Raiymbek Akshulakov, Yasser Abdelaziz Dahou Djilali et al.

CVPR 2025 · arXiv:2409.19425
2 citations

Learning Shared Representations from Unpaired Data

Amitai Yacobi, Nir Ben-Ari, Ronen Talmon et al.

NeurIPS 2025 · arXiv:2505.21524

Mitigate the Gap: Improving Cross-Modal Alignment in CLIP

Sedigheh Eslami, Gerard de Melo

ICLR 2025
15 citations

MobileViCLIP: An Efficient Video-Text Model for Mobile Devices

Min Yang, Zihan Jia, Zhilin Dai et al.

ICCV 2025 · arXiv:2508.07312

NatureLM-audio: an Audio-Language Foundation Model for Bioacoustics

David Robinson, Marius Miron, Masato Hagiwara et al.

ICLR 2025 · arXiv:2411.07186
23 citations

Noise Matters: Optimizing Matching Noise for Diffusion Classifiers

Yanghao Wang, Long Chen

NeurIPS 2025 · arXiv:2508.11330
2 citations

On Large Multimodal Models as Open-World Image Classifiers

Alessandro Conti, Massimiliano Mancini, Enrico Fini et al.

ICCV 2025 · arXiv:2503.21851
3 citations

Perception Encoder: The best visual embeddings are not at the output of the network

Daniel Bolya, Po-Yao Huang, Peize Sun et al.

NeurIPS 2025 (oral) · arXiv:2504.13181
129 citations

ProbMED: A Probabilistic Framework for Medical Multimodal Binding

Yuan Gao, Sangwook Kim, Jianzhong You et al.

ICCV 2025 · arXiv:2509.25711

Seeing What Matters: Empowering CLIP with Patch Generation-to-Selection

Gensheng Pei, Tao Chen, Yujia Wang et al.

CVPR 2025 · arXiv:2503.17080
5 citations

Self-Evolving Visual Concept Library using Vision-Language Critics

Atharva Sehgal, Patrick Yuan, Ziniu Hu et al.

CVPR 2025 · arXiv:2504.00185
2 citations

Semi-Supervised CLIP Adaptation by Enforcing Semantic and Trapezoidal Consistency

Kai Gan, Bo Ye, Min-Ling Zhang et al.

ICLR 2025
3 citations

Test-Time Multimodal Backdoor Detection by Contrastive Prompting

Yuwei Niu, Shuo He, Qi Wei et al.

ICML 2025 · arXiv:2405.15269
2 citations

Training-Free Test-Time Adaptation via Shape and Style Guidance for Vision-Language Models

Shenglong Zhou, Manjiang Yin, Leiyu Sun et al.

NeurIPS 2025

Unlabeled Data Improves Fine-Grained Image Zero-shot Classification with Multimodal LLMs

Yunqi Hong, Sohyun An, Andrew Bai et al.

NeurIPS 2025 · arXiv:2506.03195
1 citation

VL-SAE: Interpreting and Enhancing Vision-Language Alignment with a Unified Concept Set

Shufan Shen, Junshu Sun, Qingming Huang et al.

NeurIPS 2025 · arXiv:2510.21323
1 citation

Adversarial Robustification via Text-to-Image Diffusion Models

Daewon Choi, Jongheon Jeong, Huiwon Jang et al.

ECCV 2024 · arXiv:2407.18658
2 citations

Better Safe than Sorry: Pre-training CLIP against Targeted Data Poisoning and Backdoor Attacks

Wenhan Yang, Jingdong Gao, Baharan Mirzasoleiman

ICML 2024 · arXiv:2310.05862
18 citations

CLAP: Isolating Content from Style through Contrastive Learning with Augmented Prompts

Yichao Cai, Yuhang Liu, Zhen Zhang et al.

ECCV 2024 · arXiv:2311.16445
11 citations

CLIP-KD: An Empirical Study of CLIP Model Distillation

Chuanguang Yang, Zhulin An, Libo Huang et al.

CVPR 2024 · arXiv:2307.12732
86 citations

IG Captioner: Information Gain Captioners are Strong Zero-shot Classifiers

Chenglin Yang, Siyuan Qiao, Yuan Cao et al.

ECCV 2024 · arXiv:2311.17072
3 citations

Improved Zero-Shot Classification by Adapting VLMs with Text Descriptions

Oindrila Saha, Grant Van Horn, Subhransu Maji

CVPR 2024 · arXiv:2401.02460
65 citations

MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training

Pavan Kumar Anasosalu Vasu, Hadi Pouransari, Fartash Faghri et al.

CVPR 2024 · arXiv:2311.17049
90 citations

Modeling Caption Diversity in Contrastive Vision-Language Pretraining

Samuel Lavoie, Polina Kirichenko, Mark Ibrahim et al.

ICML 2024 · arXiv:2405.00740
39 citations

Modeling Collaborator: Enabling Subjective Vision Classification With Minimal Human Effort via LLM Tool-Use

Imad Eddine Toubal, Aditya Avinash, Neil Alldrin et al.

CVPR 2024 · arXiv:2403.02626
20 citations

Multi-Label Cluster Discrimination for Visual Representation Learning

Xiang An, Kaicheng Yang, Xiangzi Dai et al.

ECCV 2024 · arXiv:2407.17331
14 citations

Multi-modal Relation Distillation for Unified 3D Representation Learning

Huiqun Wang, Yiping Bao, Panwang Pan et al.

ECCV 2024 · arXiv:2407.14007
4 citations

Online Zero-Shot Classification with CLIP

Qi Qian, Juhua Hu

ECCV 2024 · arXiv:2408.13320
22 citations

OT-CLIP: Understanding and Generalizing CLIP via Optimal Transport

Liangliang Shi, Jack Fan, Junchi Yan

ICML 2024

Robust CLIP: Unsupervised Adversarial Fine-Tuning of Vision Embeddings for Robust Large Vision-Language Models

Christian Schlarmann, Naman Singh, Francesco Croce et al.

ICML 2024 · arXiv:2402.12336
88 citations

SILC: Improving Vision Language Pretraining with Self-Distillation

Muhammad Ferjad Naeem, Yongqin Xian, Xiaohua Zhai et al.

ECCV 2024 · arXiv:2310.13355
59 citations

SyCoCa: Symmetrizing Contrastive Captioners with Attentive Masking for Multimodal Alignment

Ziping Ma, Furong Xu, Jian Liu et al.

ICML 2024 · arXiv:2401.02137
7 citations

Transductive Zero-Shot and Few-Shot CLIP

Ségolène Martin, Yunshi Huang, Fereshteh Shakeri et al.

CVPR 2024 (highlight) · arXiv:2405.18437
32 citations

Zero-Shot ECG Classification with Multimodal Learning and Test-time Clinical Knowledge Enhancement

Che Liu, Zhongwei Wan, Cheng Ouyang et al.

ICML 2024 · arXiv:2403.06659
61 citations