Poster "vision-language pre-training" Papers

15 papers found

Language-Image Models with 3D Understanding

Jang Hyun Cho, Boris Ivanovic, Yulong Cao et al.

ICLR 2025, arXiv:2405.03685
27 citations

LLM-enhanced Action-aware Multi-modal Prompt Tuning for Image-Text Matching

Meng Tian, Shuo Yang, Xinxiao Wu

ICCV 2025, arXiv:2506.23502
1 citation

REOBench: Benchmarking Robustness of Earth Observation Foundation Models

Xiang Li, Yong Tao, Siyuan Zhang et al.

NeurIPS 2025, arXiv:2505.16793
3 citations

Semi-Supervised CLIP Adaptation by Enforcing Semantic and Trapezoidal Consistency

Kai Gan, Bo Ye, Min-Ling Zhang et al.

ICLR 2025
3 citations

Unleashing the Potential of Vision-Language Pre-Training for 3D Zero-Shot Lesion Segmentation via Mask-Attribute Alignment

Yankai Jiang, Wenhui Lei, Xiaofan Zhang et al.

ICLR 2025, arXiv:2410.15744
6 citations

BadCLIP: Trigger-Aware Prompt Learning for Backdoor Attacks on CLIP

Jiawang Bai, Kuofeng Gao, Shaobo Min et al.

CVPR 2024, arXiv:2311.16194
68 citations

Boosting Transferability in Vision-Language Attacks via Diversification along the Intersection Region of Adversarial Trajectory

Sensen Gao, Xiaojun Jia, Xuhong Ren et al.

ECCV 2024, arXiv:2403.12445
34 citations

Class-Incremental Learning with CLIP: Adaptive Representation Adjustment and Parameter Fusion

Linlan Huang, Xusheng Cao, Haori Lu et al.

ECCV 2024, arXiv:2407.14143
41 citations

Dense Multimodal Alignment for Open-Vocabulary 3D Scene Understanding

Ruihuang Li, Zhengqiang Zhang, Chenhang He et al.

ECCV 2024, arXiv:2407.09781
11 citations

Efficient Vision-Language Pre-training by Cluster Masking

Zihao Wei, Zixuan Pan, Andrew Owens

CVPR 2024, arXiv:2405.08815
15 citations

Integration of Global and Local Representations for Fine-grained Cross-modal Alignment

Seungwan Jin, Hoyoung Choi, Taehyung Noh et al.

ECCV 2024
1 citation

ODM: A Text-Image Further Alignment Pre-training Approach for Scene Text Detection and Spotting

Chen Duan, Pei Fu, Shan Guo et al.

CVPR 2024, arXiv:2403.00303
16 citations

Omniview-Tuning: Boosting Viewpoint Invariance of Vision-Language Pre-training Models

Shouwei Ruan, Yinpeng Dong, Liu Hanqing et al.

ECCV 2024, arXiv:2404.12139
4 citations

Online Zero-Shot Classification with CLIP

Qi Qian, Juhua Hu

ECCV 2024, arXiv:2408.13320
22 citations

Unified Medical Image Pre-training in Language-Guided Common Semantic Space

Xiaoxuan He, Yifan Yang, Xinyang Jiang et al.

ECCV 2024, arXiv:2311.14851
5 citations