"multimodal fusion" Papers

41 papers found

AlignMamba: Enhancing Multimodal Mamba with Local and Global Cross-modal Alignment

Yan Li, Yifei Xing, Xiangyuan Lan et al.

CVPR 2025 · arXiv:2412.00833 · 17 citations

Amplifying Prominent Representations in Multimodal Learning via Variational Dirichlet Process

Tsai Hor Chan, Feng Wu, Yihang Chen et al.

NEURIPS 2025 · arXiv:2510.20736

A Multimodal BiMamba Network with Test-Time Adaptation for Emotion Recognition Based on Physiological Signals

Ziyu Jia, Tingyu Du, Zhengyu Tian et al.

NEURIPS 2025

A Wander Through the Multimodal Landscape: Efficient Transfer Learning via Low-rank Sequence Multimodal Adapter

Zirun Guo, Xize Cheng, Yangyang Wu et al.

AAAI 2025 · arXiv:2412.08979 · 3 citations

Can We Talk Models Into Seeing the World Differently?

Paul Gavrikov, Jovita Lukasik, Steffen Jung et al.

ICLR 2025 · arXiv:2403.09193 · 17 citations

CLIBD: Bridging Vision and Genomics for Biodiversity Monitoring at Scale

ZeMing Gong, Austin Wang, Xiaoliang Huo et al.

ICLR 2025 · arXiv:2405.17537 · 18 citations

CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion

Shoubin Yu, Jaehong Yoon, Mohit Bansal

ICLR 2025 · arXiv:2402.05889 · 17 citations

CyIN: Cyclic Informative Latent Space for Bridging Complete and Incomplete Multimodal Learning

Ronghao Lin, Qiaolin He, Sijie Mai et al.

NEURIPS 2025 · arXiv:2602.04920

Enriching Multimodal Sentiment Analysis Through Textual Emotional Descriptions of Visual-Audio Content

Sheng Wu, Dongxiao He, Xiaobao Wang et al.

AAAI 2025 · arXiv:2412.10460 · 29 citations

Fine-Tuning Token-Based Large Multimodal Models: What Works, What Doesn't and What's Next

Zhulin Hu, Yan Ma, Jiadi Su et al.

ICLR 2025

HGSFusion: Radar-Camera Fusion with Hybrid Generation and Synchronization for 3D Object Detection

Zijian Gu, Jianwei Ma, Yan Huang et al.

AAAI 2025 · arXiv:2412.11489 · 16 citations

Language-Guided Audio-Visual Learning for Long-Term Sports Assessment

Huangbiao Xu, Xiao Ke, Huanqi Wu et al.

CVPR 2025 · 6 citations

Leveraging Driver Field-of-View for Multimodal Ego-Trajectory Prediction

M. Eren Akbiyik, Nedko Savov, Danda Pani Paudel et al.

ICLR 2025 · arXiv:2312.08558 · 3 citations

Modality-Aware SAM: Sharpness-Aware-Minimization Driven Gradient Modulation for Harmonized Multimodal Learning

Hossein Rajoli Nowdeh, Jie Ji, Xiaolong Ma et al.

NEURIPS 2025 · arXiv:2510.24919

MOSPA: Human Motion Generation Driven by Spatial Audio

Shuyang Xu, Zhiyang Dou, Mingyi Shi et al.

NEURIPS 2025 (spotlight) · arXiv:2507.11949 · 4 citations

MUG: Pseudo Labeling Augmented Audio-Visual Mamba Network for Audio-Visual Video Parsing

Langyu Wang, Yingying Chen et al.

ICCV 2025 · arXiv:2507.01384 · 1 citation

Multimodality Helps Few-shot 3D Point Cloud Semantic Segmentation

Zhaochong An, Guolei Sun, Yun Liu et al.

ICLR 2025 · arXiv:2410.22489 · 23 citations

Multimodal Lego: Model Merging and Fine-Tuning Across Topologies and Modalities in Biomedicine

Konstantin Hemker, Nikola Simidjievski, Mateja Jamnik

ICLR 2025 · arXiv:2405.19950 · 2 citations

Multimodal LiDAR-Camera Novel View Synthesis with Unified Pose-free Neural Fields

Weiyi Xue, Fan Lu, Yunwei Zhu et al.

NEURIPS 2025

PS3: A Multimodal Transformer Integrating Pathology Reports with Histology Images and Biological Pathways for Cancer Survival Prediction

Manahil Raza, Ayesha Azam, Talha Qaiser et al.

ICCV 2025 · arXiv:2509.20022 · 1 citation

Reading Recognition in the Wild

Charig Yang, Samiul Alam, Shakhrul Iman Siam et al.

NEURIPS 2025 · arXiv:2505.24848 · 3 citations

Revisiting Multimodal Fusion for 3D Anomaly Detection from an Architectural Perspective

Kaifang Long, Guoyang Xie, Lianbo Ma et al.

AAAI 2025 · arXiv:2412.17297 · 10 citations

SAM2-LOVE: Segment Anything Model 2 in Language-aided Audio-Visual Scenes

Yuji Wang, Haoran Xu, Yong Liu et al.

CVPR 2025 · arXiv:2506.01558 · 9 citations

SDGOCC: Semantic and Depth-Guided Bird's-Eye View Transformation for 3D Multimodal Occupancy Prediction

ZaiPeng Duan, Xuzhong Hu, Pei An et al.

CVPR 2025 · arXiv:2507.17083 · 6 citations

SeeGround: See and Ground for Zero-Shot Open-Vocabulary 3D Visual Grounding

Rong Li, Shijie Li, Lingdong Kong et al.

CVPR 2025 · arXiv:2412.04383 · 43 citations

TAViS: Text-bridged Audio-Visual Segmentation with Foundation Models

Ziyang Luo, Nian Liu, Xuguang Yang et al.

ICCV 2025 · arXiv:2506.11436 · 3 citations

Towards Accurate and Efficient 3D Object Detection for Autonomous Driving: A Mixture of Experts Computing System on Edge

Linshen Liu, Boyan Su, Junyue Jiang et al.

ICCV 2025 · arXiv:2507.04123 · 1 citation

TSAM: Temporal SAM Augmented with Multimodal Prompts for Referring Audio-Visual Segmentation

Abduljalil Radman, Jorma Laaksonen

CVPR 2025 · 6 citations

Uni-Sign: Toward Unified Sign Language Understanding at Scale

Zecheng Li, Wengang Zhou, Weichao Zhao et al.

ICLR 2025 · arXiv:2501.15187 · 39 citations

VADB: A Large-Scale Video Aesthetic Database with Professional and Multi-Dimensional Annotations

Qianqian Qiao, DanDan Zheng, Yihang Bo et al.

NEURIPS 2025 (oral) · arXiv:2510.25238 · 1 citation

AlignMiF: Geometry-Aligned Multimodal Implicit Field for LiDAR-Camera Joint Synthesis

Tao Tang, Guangrun Wang, Yixing Lao et al.

CVPR 2024 (highlight) · arXiv:2402.17483 · 20 citations

Debiasing Multimodal Sarcasm Detection with Contrastive Learning

Mengzhao Jia, Can Xie, Liqiang Jing

AAAI 2024 · arXiv:2312.10493 · 43 citations

Event-Adapted Video Super-Resolution

Zeyu Xiao, Dachun Kai, Yueyi Zhang et al.

ECCV 2024 · 14 citations

Exploiting Polarized Material Cues for Robust Car Detection

Wen Dong, Haiyang Mei, Ziqi Wei et al.

AAAI 2024 · arXiv:2401.02606 · 7 citations

Frequency Spectrum Is More Effective for Multimodal Representation and Fusion: A Multimodal Spectrum Rumor Detector

An Lao, Qi Zhang, Chongyang Shi et al.

AAAI 2024 · arXiv:2312.11023 · 41 citations

GeminiFusion: Efficient Pixel-wise Multimodal Fusion for Vision Transformer

Ding Jia, Jianyuan Guo, Kai Han et al.

ICML 2024 · arXiv:2406.01210 · 51 citations

Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities

Yiyuan Zhang, Xiaohan Ding, Kaixiong Gong et al.

CVPR 2024 · arXiv:2401.14405 · 12 citations

Multimodal Prototyping for Cancer Survival Prediction

Andrew Song, Richard Chen, Guillaume Jaume et al.

ICML 2024 · arXiv:2407.00224 · 40 citations

PointLLM: Empowering Large Language Models to Understand Point Clouds

Runsen Xu, Xiaolong Wang, Tai Wang et al.

ECCV 2024 · arXiv:2308.16911 · 295 citations

Predictive Dynamic Fusion

Bing Cao, Yinan Xia, Yi Ding et al.

ICML 2024 · arXiv:2406.04802 · 27 citations

Towards Multimodal Sentiment Analysis Debiasing via Bias Purification

Dingkang Yang, Mingcheng Li, Dongling Xiao et al.

ECCV 2024 · arXiv:2403.05023 · 35 citations