"multi-modal fusion" Papers
23 papers found
DAMM-Diffusion: Learning Divergence-Aware Multi-Modal Diffusion Model for Nanoparticles Distribution Prediction
Junjie Zhou, Shouju Wang, Yuxia Tang et al.
Grounding 3D Object Affordance with Language Instructions, Visual Observations and Interactions
He Zhu, Quyu Kong, Kechun Xu et al.
JoVALE: Detecting Human Actions in Video Using Audiovisual and Language Contexts
Taein Son, Soo Won Seo, Jisong Kim et al.
Multi-modal Multi-platform Person Re-Identification: Benchmark and Method
Ruiyang Ha, Songyi Jiang, Bin Li et al.
OctoNet: A Large-Scale Multi-Modal Dataset for Human Activity Understanding Grounded in Motion-Captured 3D Pose Labels
Dongsheng Yuan, Xie Zhang, Weiying Hou et al.
PhysioWave: A Multi-Scale Wavelet-Transformer for Physiological Signal Representation
Yanlong Chen, Mattia Orlandi, Pierangelo Rapa et al.
RC-AutoCalib: An End-to-End Radar-Camera Automatic Calibration Network
Van-Tin Luu, Yong-Lin Cai, Vu-Hoang Tran et al.
RCTrans: Radar-Camera Transformer via Radar Densifier and Sequential Decoder for 3D Object Detection
Yiheng Li, Yang Yang, Zhen Lei
Stronger, Steadier & Superior: Geometric Consistency in Depth VFM Forges Domain Generalized Semantic Segmentation
Siyu Chen, Ting Han, Changshe Zhang et al.
Task-driven Image Fusion with Learnable Fusion Loss
Haowen Bai, Jiangshe Zhang, Zixiang Zhao et al.
Tri-MARF: A Tri-Modal Multi-Agent Responsive Framework for Comprehensive 3D Object Annotation
Jusheng Zhang, Yijia Fan, Zimo Wen et al.
Unbiased Prototype Consistency Learning for Multi-Modal and Multi-Task Object Re-Identification
Zhongao Zhou, Bin Yang, Wenke Huang et al.
Adaptive Multi-modal Fusion of Spatially Variant Kernel Refinement with Diffusion Model for Blind Image Super-Resolution
Junxiong Lin, Yan Wang, Zeng Tao et al.
A Multi-Modal Contrastive Diffusion Model for Therapeutic Peptide Generation
Yongkang Wang, Xuan Liu, Feng Huang et al.
Beyond the Label Itself: Latent Labels Enhance Semi-supervised Point Cloud Panoptic Segmentation
Yujun Chen, Xin Tan, Zhizhong Zhang et al.
Bridging the Gap between 2D and 3D Visual Question Answering: A Fusion Approach for 3D VQA
Wentao Mo, Yang Liu
Centering the Value of Every Modality: Towards Efficient and Resilient Modality-agnostic Semantic Segmentation
Xu Zheng, Yuanhuiyi Lyu, Jiazhou Zhou et al.
DrFuse: Learning Disentangled Representation for Clinical Multi-Modal Fusion with Missing Modality and Modal Inconsistency
Wenfang Yao, Kejing Yin, William Cheung et al.
ReMamber: Referring Image Segmentation with Mamba Twister
Yuhuan Yang, Chaofan Ma, Jiangchao Yao et al.
TASeg: Temporal Aggregation Network for LiDAR Semantic Segmentation
Xiaopei Wu, Yuenan Hou, Xiaoshui Huang et al.
TUMTraf V2X Cooperative Perception Dataset
Walter Zimmer, Gerhard Arya Wardana, Suren Sritharan et al.
UniM2AE: Multi-modal Masked Autoencoders with Unified 3D Representation for 3D Perception in Autonomous Driving
Jian Zou, Tianyu Huang, Guanglei Yang et al.
VisionTrap: Vision-Augmented Trajectory Prediction Guided by Textual Descriptions
Seokha Moon, Hyun Woo, Hongbeen Park et al.