Poster "modality alignment" Papers
19 papers found
Conference
3D-MolT5: Leveraging Discrete Structural Information for Molecule-Text Modeling
Qizhi Pei, Rui Yan, Kaiyuan Gao et al.
An Effective Levelling Paradigm for Unlabeled Scenarios
Fangming Cui, Zhou Yu, Di Yang et al.
BRACE: A Benchmark for Robust Audio Caption Quality Evaluation
Tianyu Guo, Hongyu Chen, Hao Liang et al.
Gramian Multimodal Representation Learning and Alignment
Giordano Cicchetti, Eleonora Grassucci, Luigi Sigillo et al.
HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding
Chenxin Tao, Shiqian Su, Xizhou Zhu et al.
Learning Fine-Grained Representations through Textual Token Disentanglement in Composed Video Retrieval
Yue Wu, Zhaobo Qi, Yiling Wu et al.
Learning Source-Free Domain Adaptation for Visible-Infrared Person Re-Identification
Yongxiang Li, Yanglin Feng, Yuan Sun et al.
Multi-modal Learning: A Look Back and the Road Ahead
Divyam Madaan, Sumit Chopra, Kyunghyun Cho
Multimodal Tabular Reasoning with Privileged Structured Information
Jun-Peng Jiang, Yu Xia, Hai-Long Sun et al.
MUSE: Multi-Subject Unified Synthesis via Explicit Layout Semantic Expansion
Fei Peng, Junqiang Wu, Yan Li et al.
Object-Shot Enhanced Grounding Network for Egocentric Video
Yisen Feng, Haoyu Zhang, Meng Liu et al.
One Filters All: A Generalist Filter For State Estimation
Shiqi Liu, Wenhan Cao, Chang Liu et al.
Post-pre-training for Modality Alignment in Vision-Language Foundation Models
Shin'ya Yamaguchi, Dewei Feng, Sekitoshi Kanai et al.
ReSpec: Relevance and Specificity Grounded Online Filtering for Learning on Video-Text Data Streams
Chris Dongjoo Kim, Jihwan Moon, Sangwoo Moon et al.
Vocabulary-Guided Gait Recognition
Panjian Huang, Saihui Hou, Chunshui Cao et al.
CoLeaF: A Contrastive-Collaborative Learning Framework for Weakly Supervised Audio-Visual Video Parsing
Faegheh Sardari, Armin Mustafa, Philip JB Jackson et al.
Conceptual Codebook Learning for Vision-Language Models
Yi Zhang, Ke Yu, Siqi Wu et al.
Leveraging Vision-Language Models for Improving Domain Generalization in Image Classification
Sravanti Addepalli, Ashish Asokan, Lakshay Sharma et al.
Object-Oriented Anchoring and Modal Alignment in Multimodal Learning
Shibin Mei, Bingbing Ni, Hang Wang et al.