"modality gap" Papers

21 papers found

Bridging Sign and Spoken Languages: Pseudo Gloss Generation for Sign Language Translation

Jianyuan Guo, Peike Li, Trevor Cohn

NEURIPS 2025 · oral · arXiv:2505.15438

Enhanced OoD Detection through Cross-Modal Alignment of Multi-Modal Representations

Jeonghyeon Kim, Sangheum Hwang

CVPR 2025 · arXiv:2503.18817

Global Minimizers of Sigmoid Contrastive Loss

Kiril Bangachev, Guy Bresler, Iliyas Noman et al.

NEURIPS 2025 · arXiv:2509.18552

Guiding Cross-Modal Representations with MLLM Priors via Preference Alignment

Pengfei Zhao, Rongbo Luan, Wei Zhang et al.

NEURIPS 2025 · arXiv:2506.06970

Learning Visual Proxy for Compositional Zero-Shot Learning

Shiyu Zhang, Cheng Yan, Yang Liu et al.

ICCV 2025 · arXiv:2501.13859

Mind the Gap: Preserving and Compensating for the Modality Gap in CLIP-Based Continual Learning

Linlan Huang, Xusheng Cao, Haori Lu et al.

ICCV 2025 · highlight · arXiv:2507.09118

Mitigate the Gap: Improving Cross-Modal Alignment in CLIP

Sedigheh Eslami, Gerard de Melo

ICLR 2025

Post-pre-training for Modality Alignment in Vision-Language Foundation Models

Shin'ya Yamaguchi, Dewei Feng, Sekitoshi Kanai et al.

CVPR 2025 · arXiv:2504.12717

Rebalancing Contrastive Alignment with Bottlenecked Semantic Increments in Text-Video Retrieval

Jian Xiao, Zijie Song, Jialong Hu et al.

NEURIPS 2025 · arXiv:2505.12499

Release the Powers of Prompt Tuning: Cross-Modality Prompt Transfer

Ningyuan Zhang, Jie Lu, Keqiuyin Li et al.

ICLR 2025

Superpowering Open-Vocabulary Object Detectors for X-ray Vision

Pablo Garcia-Fernandez, Lorenzo Vaquero, Mingxuan Liu et al.

ICCV 2025 · arXiv:2503.17071

Test-time Adaptation for Cross-modal Retrieval with Query Shift

Haobin Li, Peng Hu, Qianjun Zhang et al.

ICLR 2025 · arXiv:2410.15624

Toward Modality Gap: Vision Prototype Learning for Weakly-supervised Semantic Segmentation with CLIP

Zhongxing Xu, Feilong Tang, Zhe Chen et al.

AAAI 2025 · paper · arXiv:2412.19650

ViewPoint: Panoramic Video Generation with Pretrained Diffusion Models

Zixun Fang, Kai Zhu, Zhiheng Liu et al.

NEURIPS 2025 · arXiv:2506.23513

DGL: Dynamic Global-Local Prompt Tuning for Text-Video Retrieval

Xiangpeng Yang, Linchao Zhu, Xiaohan Wang et al.

AAAI 2024 · paper · arXiv:2401.10588

Improving Cross-Modal Alignment with Synthetic Pairs for Text-Only Image Captioning

Zhiyue Liu, Jinyuan Liu, Fanrong Ma

AAAI 2024 · paper · arXiv:2312.08865

Improving Medical Multi-modal Contrastive Learning with Expert Annotations

Yogesh Kumar, Pekka Marttinen

ECCV 2024 · arXiv:2403.10153

Language-Driven Cross-Modal Classifier for Zero-Shot Multi-Label Image Recognition

Yicheng Liu, Jie Wen, Chengliang Liu et al.

ICML 2024

Learning Modality Knowledge Alignment for Cross-Modality Transfer

Wenxuan Ma, Shuang Li, Lincan Cai et al.

ICML 2024 · arXiv:2406.18864

SimDistill: Simulated Multi-Modal Distillation for BEV 3D Object Detection

Haimei Zhao, Qiming Zhang, Shanshan Zhao et al.

AAAI 2024 · paper · arXiv:2303.16818

Split to Merge: Unifying Separated Modalities for Unsupervised Domain Adaptation

Xinyao Li, Yuke Li, Zhekai Du et al.

CVPR 2024 · arXiv:2403.06946
