"multi-modal learning" Papers
28 papers found
DIFFER: Disentangling Identity Features via Semantic Cues for Clothes-Changing Person Re-ID
Xin Liang, Yogesh S. Rawat
Hierarchical Self-Attention: Generalizing Neural Attention Mechanics to Multi-Scale Problems
Saeed Amizadeh, Sara Abdali, Yinheng Li et al.
Hierarchical Semantic-Augmented Navigation: Optimal Transport and Graph-Driven Reasoning for Vision-Language Navigation
Xiang Fang, Wanlong Fang, Changshuo Wang
Incomplete Multi-view Deep Clustering with Data Imputation and Alignment
Jiyuan Liu, Xinwang Liu, Xinhang Wan et al.
Learning Diagrams: A Graphical Language for Compositional Training Regimes
Mason Lary, Richard Samuelson, Alexander Wilentz et al.
MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models
Ziyu Liu, Yuhang Zang, Xiaoyi Dong et al.
Multi-modal Knowledge Distillation-based Human Trajectory Forecasting
Jaewoo Jeong, Seohee Lee, Daehee Park et al.
Multi-modal Learning: A Look Back and the Road Ahead
Divyam Madaan, Sumit Chopra, Kyunghyun Cho
OmniSep: Unified Omni-Modality Sound Separation with Query-Mixup
Xize Cheng, Siqi Zheng, Zehan Wang et al.
Query-centric Audio-Visual Cognition Network for Moment Retrieval, Segmentation and Step-Captioning
Yunbin Tu, Liang Li, Li Su et al.
Rethinking Vision-Language Model in Face Forensics: Multi-Modal Interpretable Forged Face Detector
Xiao Guo, Xiufeng Song, Yue Zhang et al.
SkySense V2: A Unified Foundation Model for Multi-modal Remote Sensing
Yingying Zhang, Lixiang Ru, Kang Wu et al.
SyncVP: Joint Diffusion for Synchronous Multi-Modal Video Prediction
Enrico Pallotta, Sina Mokhtarzadeh Azar, Shuai Li et al.
Towards Out-of-Modal Generalization without Instance-level Modal Correspondence
Zhuo Huang, Gang Niu, Bo Han et al.
Understanding Contrastive Learning via Gaussian Mixture Models
Parikshit Bansal, Ali Kavis, Sujay Sanghavi
AVSegFormer: Audio-Visual Segmentation with Transformer
Shengyi Gao, Zhe Chen, Guo Chen et al.
COMMA: Co-articulated Multi-Modal Learning
Lianyu Hu, Liqing Gao, Zekang Liu et al.
DVLO: Deep Visual-LiDAR Odometry with Local-to-Global Feature Fusion and Bi-Directional Structure Alignment
Jiuming Liu, Dong Zhuo, Zhiheng Feng et al.
FedDAT: An Approach for Foundation Model Finetuning in Multi-Modal Heterogeneous Federated Learning
Haokun Chen, Yao Zhang, Denis Krompass et al.
LAMM: Label Alignment for Multi-Modal Prompt Learning
Jingsheng Gao, Jiacheng Ruan, Suncheng Xiang et al.
MESED: A Multi-Modal Entity Set Expansion Dataset with Fine-Grained Semantic Classes and Hard Negative Entities
Yangning Li, Tingwei Lu, Hai-Tao Zheng et al.
MM-Point: Multi-View Information-Enhanced Multi-Modal Self-Supervised 3D Point Cloud Understanding
Hai-Tao Yu, Mofei Song
Mono3DVG: 3D Visual Grounding in Monocular Images
Yang Zhan, Yuan Yuan, Zhitong Xiong
Multi-Label Supervised Contrastive Learning
Pingyue Zhang, Mengyue Wu
ReconBoost: Boosting Can Achieve Modality Reconcilement
Cong Hua, Qianqian Xu, Shilong Bao et al.
SkyScenes: A Synthetic Dataset for Aerial Scene Understanding
Sahil Santosh Khose, Anisha Pal, Aayushi Agarwal et al.
Transferring Knowledge From Large Foundation Models to Small Downstream Models
Shikai Qiu, Boran Han, Danielle Robinson et al.
Unraveling Instance Associations: A Closer Look for Audio-Visual Segmentation
Yuanhong Chen, Yuyuan Liu, Hu Wang et al.