Poster "modality alignment" Papers

19 papers found

3D-MolT5: Leveraging Discrete Structural Information for Molecule-Text Modeling

Qizhi Pei, Rui Yan, Kaiyuan Gao et al.

ICLR 2025arXiv:2406.05797
6
citations

An Effective Levelling Paradigm for Unlabeled Scenarios

Fangming Cui, Zhou Yu, Di Yang et al.

NEURIPS 2025

BRACE: A Benchmark for Robust Audio Caption Quality Evaluation

Tianyu Guo, Hongyu Chen, Hao Liang et al.

NEURIPS 2025arXiv:2512.10403
3
citations

Gramian Multimodal Representation Learning and Alignment

Giordano Cicchetti, Eleonora Grassucci, Luigi Sigillo et al.

ICLR 2025arXiv:2412.11959
33
citations

HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding

Chenxin Tao, Shiqian Su, Xizhou Zhu et al.

CVPR 2025arXiv:2412.16158
5
citations

Learning Fine-Grained Representations through Textual Token Disentanglement in Composed Video Retrieval

Yue Wu, Zhaobo Qi, Yiling Wu et al.

ICLR 2025
7
citations

Learning Source-Free Domain Adaptation for Visible-Infrared Person Re-Identification

Yongxiang Li, Yanglin Feng, Yuan Sun et al.

NEURIPS 2025

Multi-modal Learning: A Look Back and the Road Ahead

Divyam Madaan, Sumit Chopra, Kyunghyun Cho

ICLR 2025

Multimodal Tabular Reasoning with Privileged Structured Information

Jun-Peng Jiang, Yu Xia, Hai-Long Sun et al.

NEURIPS 2025arXiv:2506.04088
9
citations

MUSE: Multi-Subject Unified Synthesis via Explicit Layout Semantic Expansion

Fei Peng, Junqiang Wu, Yan Li et al.

ICCV 2025arXiv:2508.14440
2
citations

Object-Shot Enhanced Grounding Network for Egocentric Video

Yisen Feng, Haoyu Zhang, Meng Liu et al.

CVPR 2025arXiv:2505.04270
7
citations

One Filters All: A Generalist Filter For State Estimation

Shiqi Liu, Wenhan Cao, Chang Liu et al.

NEURIPS 2025arXiv:2509.20051
2
citations

Post-pre-training for Modality Alignment in Vision-Language Foundation Models

Shin'ya Yamaguchi, Dewei Feng, Sekitoshi Kanai et al.

CVPR 2025arXiv:2504.12717
12
citations

ReSpec: Relevance and Specificity Grounded Online Filtering for Learning on Video-Text Data Streams

Chris Dongjoo Kim, Jihwan Moon, Sangwoo Moon et al.

CVPR 2025arXiv:2504.14875
1
citations

Vocabulary-Guided Gait Recognition

Panjian Huang, Saihui Hou, Chunshui Cao et al.

NEURIPS 2025

CoLeaF: A Contrastive-Collaborative Learning Framework for Weakly Supervised Audio-Visual Video Parsing

Faegheh Sardari, Armin Mustafa, Philip JB Jackson et al.

ECCV 2024arXiv:2405.10690
11
citations

Conceptual Codebook Learning for Vision-Language Models

Yi Zhang, Ke Yu, Siqi Wu et al.

ECCV 2024arXiv:2407.02350
7
citations

Leveraging Vision-Language Models for Improving Domain Generalization in Image Classification

Sravanti Addepalli, Ashish Asokan, Lakshay Sharma et al.

CVPR 2024arXiv:2310.08255
48
citations

Object-Oriented Anchoring and Modal Alignment in Multimodal Learning

Shibin Mei, Bingbing Ni, Hang Wang et al.

ECCV 2024
1
citations