"multi-modal learning" Papers
28 papers found
DIFFER: Disentangling Identity Features via Semantic Cues for Clothes-Changing Person Re-ID
Xin Liang, Yogesh S. Rawat
Hierarchical Self-Attention: Generalizing Neural Attention Mechanics to Multi-Scale Problems
Saeed Amizadeh, Sara Abdali, Yinheng Li et al.
Hierarchical Semantic-Augmented Navigation: Optimal Transport and Graph-Driven Reasoning for Vision-Language Navigation
Xiang Fang, Wanlong Fang, Changshuo Wang
Incomplete Multi-view Deep Clustering with Data Imputation and Alignment
Jiyuan Liu, Xinwang Liu, Xinhang Wan et al.
Learning Diagrams: A Graphical Language for Compositional Training Regimes
Mason Lary, Richard Samuelson, Alexander Wilentz et al.
MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models
Ziyu Liu, Yuhang Zang, Xiaoyi Dong et al.
Multi-modal Knowledge Distillation-based Human Trajectory Forecasting
Jaewoo Jeong, Seohee Lee, Daehee Park et al.
Multi-modal Learning: A Look Back and the Road Ahead
Divyam Madaan, Sumit Chopra, Kyunghyun Cho
OmniSep: Unified Omni-Modality Sound Separation with Query-Mixup
Xize Cheng, Siqi Zheng, Zehan Wang et al.
Query-centric Audio-Visual Cognition Network for Moment Retrieval, Segmentation and Step-Captioning
Yunbin Tu, Liang Li, Li Su et al.
Rethinking Vision-Language Model in Face Forensics: Multi-Modal Interpretable Forged Face Detector
Xiao Guo, Xiufeng Song, Yue Zhang et al.
SkySense V2: A Unified Foundation Model for Multi-modal Remote Sensing
Yingying Zhang, Lixiang Ru, Kang Wu et al.
SyncVP: Joint Diffusion for Synchronous Multi-Modal Video Prediction
Enrico Pallotta, Sina Mokhtarzadeh Azar, Shuai Li et al.
Towards Out-of-Modal Generalization without Instance-level Modal Correspondence
Zhuo Huang, Gang Niu, Bo Han et al.
Understanding Contrastive Learning via Gaussian Mixture Models
Parikshit Bansal, Ali Kavis, Sujay Sanghavi
AVSegFormer: Audio-Visual Segmentation with Transformer
Shengyi Gao, Zhe Chen, Guo Chen et al.
COMMA: Co-articulated Multi-Modal Learning
Lianyu Hu, Liqing Gao, Zekang Liu et al.
DVLO: Deep Visual-LiDAR Odometry with Local-to-Global Feature Fusion and Bi-Directional Structure Alignment
Jiuming Liu, Dong Zhuo, Zhiheng Feng et al.
FedDAT: An Approach for Foundation Model Finetuning in Multi-Modal Heterogeneous Federated Learning
Haokun Chen, Yao Zhang, Denis Krompass et al.
LAMM: Label Alignment for Multi-Modal Prompt Learning
Jingsheng Gao, Jiacheng Ruan, Suncheng Xiang et al.
MESED: A Multi-Modal Entity Set Expansion Dataset with Fine-Grained Semantic Classes and Hard Negative Entities
Yangning Li, Tingwei Lu, Hai-Tao Zheng et al.
MM-Point: Multi-View Information-Enhanced Multi-Modal Self-Supervised 3D Point Cloud Understanding
Hai-Tao Yu, Mofei Song
Mono3DVG: 3D Visual Grounding in Monocular Images
Yang Zhan, Yuan Yuan, Zhitong Xiong
Multi-Label Supervised Contrastive Learning
Pingyue Zhang, Mengyue Wu
ReconBoost: Boosting Can Achieve Modality Reconcilement
Cong Hua, Qianqian Xu, Shilong Bao et al.
SkyScenes: A Synthetic Dataset for Aerial Scene Understanding
Sahil Santosh Khose, Anisha Pal, Aayushi Agarwal et al.
Transferring Knowledge From Large Foundation Models to Small Downstream Models
Shikai Qiu, Boran Han, Danielle Robinson et al.
Unraveling Instance Associations: A Closer Look for Audio-Visual Segmentation
Yuanhong Chen, Yuyuan Liu, Hu Wang et al.