"multimodal representation learning" Papers

29 papers found

Aligning Multimodal Representations through an Information Bottleneck

Antonio Almudévar, Jose Miguel Hernandez-Lobato, Sameer Khurana et al.

ICML 2025 · arXiv:2506.04870 · 6 citations

CHiP: Cross-modal Hierarchical Direct Preference Optimization for Multimodal LLMs

Jinlan Fu, Shenzhen Huangfu, Hao Fei et al.

ICLR 2025 · arXiv:2501.16629 · 21 citations

CircuitFusion: Multimodal Circuit Representation Learning for Agile Chip Design

Wenji Fang, Shang Liu, Jing Wang et al.

ICLR 2025 · arXiv:2505.02168 · 14 citations

Context-Aware Multimodal Pretraining

Karsten Roth, Zeynep Akata, Dima Damen et al.

CVPR 2025 (highlight) · arXiv:2411.15099 · 4 citations

DLF: Disentangled-Language-Focused Multimodal Sentiment Analysis

Pan Wang, Qiang Zhou, Yawen Wu et al.

AAAI 2025 · arXiv:2412.12225 · 47 citations

Escaping Plato's Cave: Towards the Alignment of 3D and Text Latent Spaces

Souhail Hadgi, Luca Moschella, Andrea Santilli et al.

CVPR 2025 · arXiv:2503.05283 · 2 citations

Gramian Multimodal Representation Learning and Alignment

Giordano Cicchetti, Eleonora Grassucci, Luigi Sigillo et al.

ICLR 2025 · arXiv:2412.11959 · 33 citations

Incomplete Modality Disentangled Representation for Ophthalmic Disease Grading and Diagnosis

Chengzhi Liu, Zile Huang, Zhe Chen et al.

AAAI 2025 · arXiv:2502.11724 · 8 citations

Integrating Sequence and Image Modeling in Irregular Medical Time Series Through Self-Supervised Learning

Liuqing Chen, Shuhong Xiao, Shixian Ding et al.

AAAI 2025 · arXiv:2502.06134 · 1 citation

Learning Shared Representations from Unpaired Data

Amitai Yacobi, Nir Ben-Ari, Ronen Talmon et al.

NEURIPS 2025 · arXiv:2505.21524

Multimodal Variational Autoencoder: A Barycentric View

Peijie Qiu, Wenhui Zhu, Sayantan Kumar et al.

AAAI 2025 · arXiv:2412.20487 · 5 citations

Occlusion-aware Text-Image-Point Cloud Pretraining for Open-World 3D Object Recognition

Khanh Nguyen, Ghulam Mubashar Hassan, Ajmal Mian

CVPR 2025 · arXiv:2502.10674 · 1 citation

On the Value of Cross-Modal Misalignment in Multimodal Representation Learning

Yichao Cai, Yuhang Liu, Erdun Gao et al.

NEURIPS 2025 (spotlight) · arXiv:2504.10143 · 7 citations

Scaling Language-Free Visual Representation Learning

David Fan, Shengbang Tong, Jiachen Zhu et al.

ICCV 2025 (highlight) · arXiv:2504.01017 · 41 citations

Text and Image Are Mutually Beneficial: Enhancing Training-Free Few-Shot Classification with CLIP

Yayuan Li, Jintao Guo, Lei Qi et al.

AAAI 2025 · arXiv:2412.11375 · 5 citations

The "Law'' of the Unconscious Contrastive Learner: Probabilistic Alignment of Unpaired Modalities

Yongwei Che, Benjamin Eysenbach

ICLR 2025 · arXiv:2501.11326 · 1 citation

TikZero: Zero-Shot Text-Guided Graphics Program Synthesis

Jonas Belouadi, Eddy Ilg, Margret Keuper et al.

ICCV 2025 (highlight) · arXiv:2503.11509 · 5 citations

TRIDENT: Tri-Modal Molecular Representation Learning with Taxonomic Annotations and Local Correspondence

Feng Jiang, Mangal Prakash, Hehuan Ma et al.

NEURIPS 2025 (spotlight) · arXiv:2506.21028 · 2 citations

Understanding Co-speech Gestures in-the-wild

Sindhu Hegde, K R Prajwal, Taein Kwon et al.

ICCV 2025 · arXiv:2503.22668 · 2 citations

Understanding the Gain from Data Filtering in Multimodal Contrastive Learning

Divyansh Pareek, Sewoong Oh, Simon Du

NEURIPS 2025 · arXiv:2512.14230

Better Safe than Sorry: Pre-training CLIP against Targeted Data Poisoning and Backdoor Attacks

Wenhan Yang, Jingdong Gao, Baharan Mirzasoleiman

ICML 2024 · arXiv:2310.05862 · 18 citations

Contrastive ground-level image and remote sensing pre-training improves representation learning for natural world imagery

Andy V Huynh, Lauren Gillespie, Jael Lopez-Saucedo et al.

ECCV 2024 · arXiv:2409.19439 · 12 citations

DreamDiffusion: High-Quality EEG-to-Image Generation with Temporal Masked Signal Modeling and CLIP Alignment

Yunpeng Bai, Xintao Wang, Yanpei Cao et al.

ECCV 2024 · 12 citations

HowToCaption: Prompting LLMs to Transform Video Annotations at Scale

Nina Shvetsova, Anna Kukleva, Xudong Hong et al.

ECCV 2024 · arXiv:2310.04900 · 33 citations

Learning Multimodal Latent Generative Models with Energy-Based Prior

Shiyu Yuan, Jiali Cui, Hanao Li et al.

ECCV 2024 · arXiv:2409.19862 · 4 citations

MLIP: Efficient Multi-Perspective Language-Image Pretraining with Exhaustive Data Utilization

Yu Zhang, Qi Zhang, Zixuan Gong et al.

ICML 2024 · arXiv:2406.01460 · 7 citations

Multimodal Patient Representation Learning with Missing Modalities and Labels

Zhenbang Wu, Anant Dadu, Nicholas Tustison et al.

ICLR 2024 · 29 citations

Towards Multimodal Open-Set Domain Generalization and Adaptation through Self-supervision

Hao Dong, Eleni Chatzi, Olga Fink

ECCV 2024 · arXiv:2407.01518 · 17 citations

Unified Medical Image Pre-training in Language-Guided Common Semantic Space

Xiaoxuan He, Yifan Yang, Xinyang Jiang et al.

ECCV 2024 · arXiv:2311.14851 · 5 citations