Poster "multi-modal fusion" Papers

14 papers found

Grounding 3D Object Affordance with Language Instructions, Visual Observations and Interactions

He Zhu, Quyu Kong, Kechun Xu et al.

CVPR 2025 · arXiv:2504.04744 · 7 citations

Multi-modal Multi-platform Person Re-Identification: Benchmark and Method

Ruiyang Ha, Songyi Jiang, Bin Li et al.

ICCV 2025 · arXiv:2503.17096 · 2 citations

OctoNet: A Large-Scale Multi-Modal Dataset for Human Activity Understanding Grounded in Motion-Captured 3D Pose Labels

Dongsheng Yuan, Xie Zhang, Weiying Hou et al.

NeurIPS 2025

PhysioWave: A Multi-Scale Wavelet-Transformer for Physiological Signal Representation

Yanlong Chen, Mattia Orlandi, Pierangelo Rapa et al.

NeurIPS 2025 · arXiv:2506.10351 · 2 citations

RC-AutoCalib: An End-to-End Radar-Camera Automatic Calibration Network

Van-Tin Luu, Yong-Lin Cai, Vu-Hoang Tran et al.

CVPR 2025 · arXiv:2505.22427 · 3 citations

Stronger, Steadier & Superior: Geometric Consistency in Depth VFM Forges Domain Generalized Semantic Segmentation

Siyu Chen, Ting Han, Changshe Zhang et al.

ICCV 2025 · arXiv:2504.12753 · 2 citations

Tri-MARF: A Tri-Modal Multi-Agent Responsive Framework for Comprehensive 3D Object Annotation

Jusheng Zhang, Yijia Fan, Zimo Wen et al.

NeurIPS 2025

Adaptive Multi-modal Fusion of Spatially Variant Kernel Refinement with Diffusion Model for Blind Image Super-Resolution

Junxiong Lin, Yan Wang, Zeng Tao et al.

ECCV 2024 · arXiv:2403.05808 · 5 citations

Centering the Value of Every Modality: Towards Efficient and Resilient Modality-agnostic Semantic Segmentation

Xu Zheng, Yuanhuiyi Lyu, Jiazhou Zhou et al.

ECCV 2024 · arXiv:2407.11344 · 19 citations

ReMamber: Referring Image Segmentation with Mamba Twister

Yuhuan Yang, Chaofan Ma, Jiangchao Yao et al.

ECCV 2024 · arXiv:2403.17839 · 50 citations

TASeg: Temporal Aggregation Network for LiDAR Semantic Segmentation

Xiaopei Wu, Yuenan Hou, Xiaoshui Huang et al.

CVPR 2024 · arXiv:2407.09751 · 14 citations

TUMTraf V2X Cooperative Perception Dataset

Walter Zimmer, Gerhard Arya Wardana, Suren Sritharan et al.

CVPR 2024 · arXiv:2403.01316 · 89 citations

UniM2AE: Multi-modal Masked Autoencoders with Unified 3D Representation for 3D Perception in Autonomous Driving

Jian Zou, Tianyu Huang, Guanglei Yang et al.

ECCV 2024 · 17 citations

VisionTrap: Vision-Augmented Trajectory Prediction Guided by Textual Descriptions

Seokha Moon, Hyun Woo, Hongbeen Park et al.

ECCV 2024 · arXiv:2407.12345 · 22 citations