Poster "multi-modal fusion" Papers
14 papers found
Grounding 3D Object Affordance with Language Instructions, Visual Observations and Interactions
He Zhu, Quyu Kong, Kechun Xu et al.
CVPR 2025 · arXiv:2504.04744
7 citations
Multi-modal Multi-platform Person Re-Identification: Benchmark and Method
Ruiyang Ha, Songyi Jiang, Bin Li et al.
ICCV 2025 · arXiv:2503.17096
2 citations
OctoNet: A Large-Scale Multi-Modal Dataset for Human Activity Understanding Grounded in Motion-Captured 3D Pose Labels
Dongsheng Yuan, Xie Zhang, Weiying Hou et al.
NeurIPS 2025
PhysioWave: A Multi-Scale Wavelet-Transformer for Physiological Signal Representation
Yanlong Chen, Mattia Orlandi, Pierangelo Rapa et al.
NeurIPS 2025 · arXiv:2506.10351
2 citations
RC-AutoCalib: An End-to-End Radar-Camera Automatic Calibration Network
Van-Tin Luu, Yong-Lin Cai, Vu-Hoang Tran et al.
CVPR 2025 · arXiv:2505.22427
3 citations
Stronger, Steadier & Superior: Geometric Consistency in Depth VFM Forges Domain Generalized Semantic Segmentation
Siyu Chen, Ting Han, Changshe Zhang et al.
ICCV 2025 · arXiv:2504.12753
2 citations
Tri-MARF: A Tri-Modal Multi-Agent Responsive Framework for Comprehensive 3D Object Annotation
Jusheng Zhang, Yijia Fan, Zimo Wen et al.
NeurIPS 2025
Adaptive Multi-modal Fusion of Spatially Variant Kernel Refinement with Diffusion Model for Blind Image Super-Resolution
Junxiong Lin, Yan Wang, Zeng Tao et al.
ECCV 2024 · arXiv:2403.05808
5 citations
Centering the Value of Every Modality: Towards Efficient and Resilient Modality-agnostic Semantic Segmentation
Xu Zheng, Yuanhuiyi Lyu, Jiazhou Zhou et al.
ECCV 2024 · arXiv:2407.11344
19 citations
ReMamber: Referring Image Segmentation with Mamba Twister
Yuhuan Yang, Chaofan Ma, Jiangchao Yao et al.
ECCV 2024 · arXiv:2403.17839
50 citations
TASeg: Temporal Aggregation Network for LiDAR Semantic Segmentation
Xiaopei Wu, Yuenan Hou, Xiaoshui Huang et al.
CVPR 2024 · arXiv:2407.09751
14 citations
TUMTraf V2X Cooperative Perception Dataset
Walter Zimmer, Gerhard Arya Wardana, Suren Sritharan et al.
CVPR 2024 · arXiv:2403.01316
89 citations
UniM2AE: Multi-modal Masked Autoencoders with Unified 3D Representation for 3D Perception in Autonomous Driving
Jian Zou, Tianyu Huang, Guanglei Yang et al.
ECCV 2024
17 citations
VisionTrap: Vision-Augmented Trajectory Prediction Guided by Textual Descriptions
Seokha Moon, Hyun Woo, Hongbeen Park et al.
ECCV 2024 · arXiv:2407.12345
22 citations