"multimodal pre-training" Papers
8 papers found
3D-MolT5: Leveraging Discrete Structural Information for Molecule-Text Modeling
Qizhi Pei, Rui Yan, Kaiyuan Gao et al.
ICLR 2025 · arXiv:2406.05797
6 citations
C-CLIP: Multimodal Continual Learning for Vision-Language Model
Wenzhuo Liu, Fei Zhu, Longhui Wei et al.
ICLR 2025
13 citations
CiTrus: Squeezing Extra Performance out of Low-data Bio-signal Transfer Learning
Eloy Geenjaar, Lie Lu
AAAI 2025 · arXiv:2412.11695
1 citation
LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale
Joya Chen, Yiqi Lin, Ziyun Zeng et al.
CVPR 2025 · arXiv:2504.16030
4 citations
Should VLMs be Pre-trained with Image Data?
Sedrick Keh, Jean Mercat, Samir Yitzhak Gadre et al.
ICLR 2025 · arXiv:2503.07603
From Vision to Audio and Beyond: A Unified Model for Audio-Visual Representation and Generation
Kun Su, Xiulong Liu, Eli Shlizerman
ICML 2024 · arXiv:2409.19132
17 citations
Object-Oriented Anchoring and Modal Alignment in Multimodal Learning
Shibin Mei, Bingbing Ni, Hang Wang et al.
ECCV 2024
1 citation
Structural Information Guided Multimodal Pre-training for Vehicle-Centric Perception
Xiao Wang, Wentao Wu, Chenglong Li et al.
AAAI 2024 · arXiv:2312.09812
7 citations