Poster "vision-language pre-training" Papers
15 papers found
Language-Image Models with 3D Understanding
Jang Hyun Cho, Boris Ivanovic, Yulong Cao et al.
ICLR 2025, arXiv:2405.03685
27 citations
LLM-enhanced Action-aware Multi-modal Prompt Tuning for Image-Text Matching
Meng Tian, Shuo Yang, Xinxiao Wu
ICCV 2025, arXiv:2506.23502
1 citation
REOBench: Benchmarking Robustness of Earth Observation Foundation Models
Xiang Li, Yong Tao, Siyuan Zhang et al.
NeurIPS 2025, arXiv:2505.16793
3 citations
Semi-Supervised CLIP Adaptation by Enforcing Semantic and Trapezoidal Consistency
Kai Gan, Bo Ye, Min-Ling Zhang et al.
ICLR 2025
3 citations
Unleashing the Potential of Vision-Language Pre-Training for 3D Zero-Shot Lesion Segmentation via Mask-Attribute Alignment
Yankai Jiang, Wenhui Lei, Xiaofan Zhang et al.
ICLR 2025, arXiv:2410.15744
6 citations
BadCLIP: Trigger-Aware Prompt Learning for Backdoor Attacks on CLIP
Jiawang Bai, Kuofeng Gao, Shaobo Min et al.
CVPR 2024, arXiv:2311.16194
68 citations
Boosting Transferability in Vision-Language Attacks via Diversification along the Intersection Region of Adversarial Trajectory
Sensen Gao, Xiaojun Jia, Xuhong Ren et al.
ECCV 2024, arXiv:2403.12445
34 citations
Class-Incremental Learning with CLIP: Adaptive Representation Adjustment and Parameter Fusion
Linlan Huang, Xusheng Cao, Haori Lu et al.
ECCV 2024, arXiv:2407.14143
41 citations
Dense Multimodal Alignment for Open-Vocabulary 3D Scene Understanding
Ruihuang Li, Zhengqiang Zhang, Chenhang He et al.
ECCV 2024, arXiv:2407.09781
11 citations
Efficient Vision-Language Pre-training by Cluster Masking
Zihao Wei, Zixuan Pan, Andrew Owens
CVPR 2024, arXiv:2405.08815
15 citations
Integration of Global and Local Representations for Fine-grained Cross-modal Alignment
Seungwan Jin, Hoyoung Choi, Taehyung Noh et al.
ECCV 2024
1 citation
ODM: A Text-Image Further Alignment Pre-training Approach for Scene Text Detection and Spotting
Chen Duan, Pei Fu, Shan Guo et al.
CVPR 2024, arXiv:2403.00303
16 citations
Omniview-Tuning: Boosting Viewpoint Invariance of Vision-Language Pre-training Models
Shouwei Ruan, Yinpeng Dong, Liu Hanqing et al.
ECCV 2024, arXiv:2404.12139
4 citations
Online Zero-Shot Classification with CLIP
Qi Qian, Juhua Hu
ECCV 2024, arXiv:2408.13320
22 citations
Unified Medical Image Pre-training in Language-Guided Common Semantic Space
Xiaoxuan He, Yifan Yang, Xinyang Jiang et al.
ECCV 2024, arXiv:2311.14851
5 citations