"multimodal representation learning" Papers
29 papers found
Aligning Multimodal Representations through an Information Bottleneck
Antonio Almudévar, Jose Miguel Hernandez-Lobato, Sameer Khurana et al.
CHiP: Cross-modal Hierarchical Direct Preference Optimization for Multimodal LLMs
Jinlan Fu, Shenzhen Huangfu, Hao Fei et al.
CircuitFusion: Multimodal Circuit Representation Learning for Agile Chip Design
Wenji Fang, Shang Liu, Jing Wang et al.
Context-Aware Multimodal Pretraining
Karsten Roth, Zeynep Akata, Dima Damen et al.
DLF: Disentangled-Language-Focused Multimodal Sentiment Analysis
Pan Wang, Qiang Zhou, Yawen Wu et al.
Escaping Plato's Cave: Towards the Alignment of 3D and Text Latent Spaces
Souhail Hadgi, Luca Moschella, Andrea Santilli et al.
Gramian Multimodal Representation Learning and Alignment
Giordano Cicchetti, Eleonora Grassucci, Luigi Sigillo et al.
Incomplete Modality Disentangled Representation for Ophthalmic Disease Grading and Diagnosis
Chengzhi Liu, Zile Huang, Zhe Chen et al.
Integrating Sequence and Image Modeling in Irregular Medical Time Series Through Self-Supervised Learning
Liuqing Chen, Shuhong Xiao, Shixian Ding et al.
Learning Shared Representations from Unpaired Data
Amitai Yacobi, Nir Ben-Ari, Ronen Talmon et al.
Multimodal Variational Autoencoder: A Barycentric View
Peijie Qiu, Wenhui Zhu, Sayantan Kumar et al.
Occlusion-aware Text-Image-Point Cloud Pretraining for Open-World 3D Object Recognition
Khanh Nguyen, Ghulam Mubashar Hassan, Ajmal Mian
On the Value of Cross-Modal Misalignment in Multimodal Representation Learning
Yichao Cai, Yuhang Liu, Erdun Gao et al.
Scaling Language-Free Visual Representation Learning
David Fan, Shengbang Tong, Jiachen Zhu et al.
Text and Image Are Mutually Beneficial: Enhancing Training-Free Few-Shot Classification with CLIP
Yayuan Li, Jintao Guo, Lei Qi et al.
The "Law" of the Unconscious Contrastive Learner: Probabilistic Alignment of Unpaired Modalities
Yongwei Che, Benjamin Eysenbach
TikZero: Zero-Shot Text-Guided Graphics Program Synthesis
Jonas Belouadi, Eddy Ilg, Margret Keuper et al.
TRIDENT: Tri-Modal Molecular Representation Learning with Taxonomic Annotations and Local Correspondence
Feng Jiang, Mangal Prakash, Hehuan Ma et al.
Understanding Co-speech Gestures in-the-wild
Sindhu Hegde, K R Prajwal, Taein Kwon et al.
Understanding the Gain from Data Filtering in Multimodal Contrastive Learning
Divyansh Pareek, Sewoong Oh, Simon Du
Better Safe than Sorry: Pre-training CLIP against Targeted Data Poisoning and Backdoor Attacks
Wenhan Yang, Jingdong Gao, Baharan Mirzasoleiman
Contrastive ground-level image and remote sensing pre-training improves representation learning for natural world imagery
Andy V Huynh, Lauren Gillespie, Jael Lopez-Saucedo et al.
DreamDiffusion: High-Quality EEG-to-Image Generation with Temporal Masked Signal Modeling and CLIP Alignment
Yunpeng Bai, Xintao Wang, Yanpei Cao et al.
HowToCaption: Prompting LLMs to Transform Video Annotations at Scale
Nina Shvetsova, Anna Kukleva, Xudong Hong et al.
Learning Multimodal Latent Generative Models with Energy-Based Prior
Shiyu Yuan, Jiali Cui, Hanao Li et al.
MLIP: Efficient Multi-Perspective Language-Image Pretraining with Exhaustive Data Utilization
Yu Zhang, Qi Zhang, Zixuan Gong et al.
Multimodal Patient Representation Learning with Missing Modalities and Labels
Zhenbang Wu, Anant Dadu, Nicholas Tustison et al.
Towards Multimodal Open-Set Domain Generalization and Adaptation through Self-supervision
Hao Dong, Eleni Chatzi, Olga Fink
Unified Medical Image Pre-training in Language-Guided Common Semantic Space
Xiaoxuan He, Yifan Yang, Xinyang Jiang et al.