"multi-modal fusion" Papers

23 papers found

DAMM-Diffusion: Learning Divergence-Aware Multi-Modal Diffusion Model for Nanoparticles Distribution Prediction

Junjie Zhou, Shouju Wang, Yuxia Tang et al.

CVPR 2025 (highlight) · arXiv:2503.09491 · 1 citation

Grounding 3D Object Affordance with Language Instructions, Visual Observations and Interactions

He Zhu, Quyu Kong, Kechun Xu et al.

CVPR 2025 · arXiv:2504.04744 · 7 citations

JoVALE: Detecting Human Actions in Video Using Audiovisual and Language Contexts

Taein Son, Soo Won Seo, Jisong Kim et al.

AAAI 2025 · arXiv:2412.13708 · 2 citations

Multi-modal Multi-platform Person Re-Identification: Benchmark and Method

Ruiyang Ha, Songyi Jiang, Bin Li et al.

ICCV 2025 · arXiv:2503.17096 · 2 citations

OctoNet: A Large-Scale Multi-Modal Dataset for Human Activity Understanding Grounded in Motion-Captured 3D Pose Labels

Dongsheng Yuan, Xie Zhang, Weiying Hou et al.

NeurIPS 2025

PhysioWave: A Multi-Scale Wavelet-Transformer for Physiological Signal Representation

Yanlong Chen, Mattia Orlandi, Pierangelo Rapa et al.

NeurIPS 2025 · arXiv:2506.10351 · 2 citations

RC-AutoCalib: An End-to-End Radar-Camera Automatic Calibration Network

Van-Tin Luu, Yong-Lin Cai, Vu-Hoang Tran et al.

CVPR 2025 · arXiv:2505.22427 · 3 citations

RCTrans: Radar-Camera Transformer via Radar Densifier and Sequential Decoder for 3D Object Detection

Yiheng Li, Yang Yang, Zhen Lei

AAAI 2025 · arXiv:2412.12799 · 6 citations

Stronger, Steadier & Superior: Geometric Consistency in Depth VFM Forges Domain Generalized Semantic Segmentation

Siyu Chen, Ting Han, Changshe Zhang et al.

ICCV 2025 · arXiv:2504.12753 · 2 citations

Task-driven Image Fusion with Learnable Fusion Loss

Haowen Bai, Jiangshe Zhang, Zixiang Zhao et al.

CVPR 2025 (highlight) · arXiv:2412.03240 · 21 citations

Tri-MARF: A Tri-Modal Multi-Agent Responsive Framework for Comprehensive 3D Object Annotation

Jusheng Zhang, Yijia Fan, Zimo Wen et al.

NeurIPS 2025

Unbiased Prototype Consistency Learning for Multi-Modal and Multi-Task Object Re-Identification

Zhongao Zhou, Bin Yang, Wenke Huang et al.

NeurIPS 2025 (spotlight)

Adaptive Multi-modal Fusion of Spatially Variant Kernel Refinement with Diffusion Model for Blind Image Super-Resolution

Junxiong Lin, Yan Wang, Zeng Tao et al.

ECCV 2024 · arXiv:2403.05808 · 5 citations

A Multi-Modal Contrastive Diffusion Model for Therapeutic Peptide Generation

Yongkang Wang, Xuan Liu, Feng Huang et al.

AAAI 2024 · arXiv:2312.15665 · 27 citations

Beyond the Label Itself: Latent Labels Enhance Semi-supervised Point Cloud Panoptic Segmentation

Yujun Chen, Xin Tan, Zhizhong Zhang et al.

AAAI 2024 · arXiv:2312.08234 · 6 citations

Bridging the Gap between 2D and 3D Visual Question Answering: A Fusion Approach for 3D VQA

Wentao Mo, Yang Liu

AAAI 2024 · arXiv:2402.15933 · 26 citations

Centering the Value of Every Modality: Towards Efficient and Resilient Modality-agnostic Semantic Segmentation

Xu Zheng, Yuanhuiyi Lyu, Jiazhou Zhou et al.

ECCV 2024 · arXiv:2407.11344 · 19 citations

DrFuse: Learning Disentangled Representation for Clinical Multi-Modal Fusion with Missing Modality and Modal Inconsistency

Wenfang Yao, Kejing Yin, William Cheung et al.

AAAI 2024 · arXiv:2403.06197 · 56 citations

ReMamber: Referring Image Segmentation with Mamba Twister

Yuhuan Yang, Chaofan Ma, Jiangchao Yao et al.

ECCV 2024 · arXiv:2403.17839 · 50 citations

TASeg: Temporal Aggregation Network for LiDAR Semantic Segmentation

Xiaopei Wu, Yuenan Hou, Xiaoshui Huang et al.

CVPR 2024 · arXiv:2407.09751 · 14 citations

TUMTraf V2X Cooperative Perception Dataset

Walter Zimmer, Gerhard Arya Wardana, Suren Sritharan et al.

CVPR 2024 · arXiv:2403.01316 · 89 citations

UniM2AE: Multi-modal Masked Autoencoders with Unified 3D Representation for 3D Perception in Autonomous Driving

Jian Zou, Tianyu Huang, Guanglei Yang et al.

ECCV 2024 · 17 citations

VisionTrap: Vision-Augmented Trajectory Prediction Guided by Textual Descriptions

Seokha Moon, Hyun Woo, Hongbeen Park et al.

ECCV 2024 · arXiv:2407.12345 · 22 citations