"modality gap" Papers

21 papers found

Bridging Sign and Spoken Languages: Pseudo Gloss Generation for Sign Language Translation

Jianyuan Guo, Peike Li, Trevor Cohn

NEURIPS 2025 | oral | arXiv:2505.15438
3 citations

Enhanced OoD Detection through Cross-Modal Alignment of Multi-Modal Representations

Jeonghyeon Kim, Sangheum Hwang

CVPR 2025 | arXiv:2503.18817
4 citations

Global Minimizers of Sigmoid Contrastive Loss

Kiril Bangachev, Guy Bresler, Iliyas Noman et al.

NEURIPS 2025 | arXiv:2509.18552

Guiding Cross-Modal Representations with MLLM Priors via Preference Alignment

Pengfei Zhao, Rongbo Luan, Wei Zhang et al.

NEURIPS 2025 | arXiv:2506.06970
1 citation

Learning Visual Proxy for Compositional Zero-Shot Learning

Shiyu Zhang, Cheng Yan, Yang Liu et al.

ICCV 2025 | arXiv:2501.13859

Mind the Gap: Preserving and Compensating for the Modality Gap in CLIP-Based Continual Learning

Linlan Huang, Xusheng Cao, Haori Lu et al.

ICCV 2025 | highlight | arXiv:2507.09118

Mitigate the Gap: Improving Cross-Modal Alignment in CLIP

Sedigheh Eslami, Gerard de Melo

ICLR 2025
15 citations

Post-pre-training for Modality Alignment in Vision-Language Foundation Models

Shin'ya Yamaguchi, Dewei Feng, Sekitoshi Kanai et al.

CVPR 2025 | arXiv:2504.12717
12 citations

Rebalancing Contrastive Alignment with Bottlenecked Semantic Increments in Text-Video Retrieval

Jian Xiao, Zijie Song, Jialong Hu et al.

NEURIPS 2025 | arXiv:2505.12499

Release the Powers of Prompt Tuning: Cross-Modality Prompt Transfer

Ningyuan Zhang, Jie Lu, Keqiuyin Li et al.

ICLR 2025
1 citation

Superpowering Open-Vocabulary Object Detectors for X-ray Vision

Pablo Garcia-Fernandez, Lorenzo Vaquero, Mingxuan Liu et al.

ICCV 2025 | arXiv:2503.17071

Test-time Adaptation for Cross-modal Retrieval with Query Shift

Haobin Li, Peng Hu, Qianjun Zhang et al.

ICLR 2025 | arXiv:2410.15624
9 citations

Toward Modality Gap: Vision Prototype Learning for Weakly-supervised Semantic Segmentation with CLIP

Zhongxing Xu, Feilong Tang, Zhe Chen et al.

AAAI 2025 | paper | arXiv:2412.19650

ViewPoint: Panoramic Video Generation with Pretrained Diffusion Models

Zixun Fang, Kai Zhu, Zhiheng Liu et al.

NEURIPS 2025 | arXiv:2506.23513

DGL: Dynamic Global-Local Prompt Tuning for Text-Video Retrieval

Xiangpeng Yang, Linchao Zhu, Xiaohan Wang et al.

AAAI 2024 | paper | arXiv:2401.10588
45 citations

Improving Cross-Modal Alignment with Synthetic Pairs for Text-Only Image Captioning

Zhiyue Liu, Jinyuan Liu, Fanrong Ma

AAAI 2024 | paper | arXiv:2312.08865
20 citations

Improving Medical Multi-modal Contrastive Learning with Expert Annotations

Yogesh Kumar, Pekka Marttinen

ECCV 2024 | arXiv:2403.10153
23 citations

Language-Driven Cross-Modal Classifier for Zero-Shot Multi-Label Image Recognition

Yicheng Liu, Jie Wen, Chengliang Liu et al.

ICML 2024

Learning Modality Knowledge Alignment for Cross-Modality Transfer

Wenxuan Ma, Shuang Li, Lincan Cai et al.

ICML 2024 | arXiv:2406.18864
8 citations

SimDistill: Simulated Multi-Modal Distillation for BEV 3D Object Detection

Haimei Zhao, Qiming Zhang, Shanshan Zhao et al.

AAAI 2024 | paper | arXiv:2303.16818
25 citations

Split to Merge: Unifying Separated Modalities for Unsupervised Domain Adaptation

Xinyao Li, Yuke Li, Zhekai Du et al.

CVPR 2024 | arXiv:2403.06946
19 citations