"contrastive language-image pre-training" Papers
22 papers found
Advancing Interpretability of CLIP Representations with Concept Surrogate Model
Nhat Hoang-Xuan, Xiyuan Wei, Wanli Xing et al.
Advancing Myopia To Holism: Fully Contrastive Language-Image Pre-training
Haicheng Wang, Chen Ju, Weixiong Lin et al.
CLIP Under the Microscope: A Fine-Grained Analysis of Multi-Object Representation
Reza Abbasi, Ali Nazari, Aminreza Sefid et al.
FineLIP: Extending CLIP’s Reach via Fine-Grained Alignment with Longer Text Inputs
Mothilal Asokan, Kebin Wu, Fatima Albreiki
GenHancer: Imperfect Generative Models are Secretly Strong Vision-Centric Enhancers
Shijie Ma, Yuying Ge, Teng Wang et al.
Harnessing Vision Foundation Models for High-Performance, Training-Free Open Vocabulary Segmentation
Yuheng Shi, Minjing Dong, Chang Xu
Mind the Gap: Preserving and Compensating for the Modality Gap in CLIP-Based Continual Learning
Linlan Huang, Xusheng Cao, Haori Lu et al.
Mitigate the Gap: Improving Cross-Modal Alignment in CLIP
Sedigheh Eslami, Gerard de Melo
Refining CLIP's Spatial Awareness: A Visual-Centric Perspective
Congpei Qiu, Yanhao Wu, Wei Ke et al.
Toward Modality Gap: Vision Prototype Learning for Weakly-supervised Semantic Segmentation with CLIP
Zhongxing Xu, Feilong Tang, Zhe Chen et al.
un²CLIP: Improving CLIP's Visual Detail Capturing Ability via Inverting unCLIP
Yinqi Li, Jiahe Zhao, Hong Chang et al.
Vision-Language Model IP Protection via Prompt-based Learning
Lianyu Wang, Meng Wang, Huazhu Fu et al.
Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
Zeyi Sun, Ye Fang, Tong Wu et al.
Better Safe than Sorry: Pre-training CLIP against Targeted Data Poisoning and Backdoor Attacks
Wenhan Yang, Jingdong Gao, Baharan Mirzasoleiman
Bridging the Pathology Domain Gap: Efficiently Adapting CLIP for Pathology Image Analysis with Limited Labeled Data
Zhengfeng Lai, Joohi Chauhan, Brittany N. Dugger et al.
CLIP-KD: An Empirical Study of CLIP Model Distillation
Chuanguang Yang, Zhulin An, Libo Huang et al.
Delving into Multimodal Prompting for Fine-Grained Visual Classification
Xin Jiang, Hao Tang, Junyao Gao et al.
Gradient-based Visual Explanation for Transformer-based CLIP
Chenyang Zhao, Kun Wang, Xingyu Zeng et al.
Machine Vision Therapy: Multimodal Large Language Models Can Enhance Visual Robustness via Denoising In-Context Learning
Zhuo Huang, Chang Liu, Yinpeng Dong et al.
Synergy of Sight and Semantics: Visual Intention Understanding with CLIP
Qu Yang, Mang Ye, Dacheng Tao
VadCLIP: Adapting Vision-Language Models for Weakly Supervised Video Anomaly Detection
Peng Wu, Xuerong Zhou, Guansong Pang et al.
Weakly Supervised Semantic Segmentation for Driving Scenes
Dongseob Kim, Seungho Lee, Junsuk Choe et al.