"multimodal fusion" Papers

41 papers found

AlignMamba: Enhancing Multimodal Mamba with Local and Global Cross-modal Alignment

Yan Li, Yifei Xing, Xiangyuan Lan et al.

CVPR 2025 · arXiv:2412.00833 · 17 citations

Amplifying Prominent Representations in Multimodal Learning via Variational Dirichlet Process

Tsai Hor Chan, Feng Wu, Yihang Chen et al.

NEURIPS 2025 · arXiv:2510.20736

A Multimodal BiMamba Network with Test-Time Adaptation for Emotion Recognition Based on Physiological Signals

Ziyu Jia, Tingyu Du, Zhengyu Tian et al.

NEURIPS 2025

A Wander Through the Multimodal Landscape: Efficient Transfer Learning via Low-rank Sequence Multimodal Adapter

Zirun Guo, Xize Cheng, Yangyang Wu et al.

AAAI 2025 · arXiv:2412.08979 · 3 citations

Can We Talk Models Into Seeing the World Differently?

Paul Gavrikov, Jovita Lukasik, Steffen Jung et al.

ICLR 2025 · arXiv:2403.09193 · 17 citations

CLIBD: Bridging Vision and Genomics for Biodiversity Monitoring at Scale

ZeMing Gong, Austin Wang, Xiaoliang Huo et al.

ICLR 2025 · arXiv:2405.17537 · 18 citations

CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion

Shoubin Yu, Jaehong Yoon, Mohit Bansal

ICLR 2025 · arXiv:2402.05889 · 17 citations

CyIN: Cyclic Informative Latent Space for Bridging Complete and Incomplete Multimodal Learning

Ronghao Lin, Qiaolin He, Sijie Mai et al.

NEURIPS 2025 · arXiv:2602.04920

Enriching Multimodal Sentiment Analysis Through Textual Emotional Descriptions of Visual-Audio Content

Sheng Wu, Dongxiao He, Xiaobao Wang et al.

AAAI 2025 · arXiv:2412.10460 · 29 citations

Fine-Tuning Token-Based Large Multimodal Models: What Works, What Doesn't and What's Next

Zhulin Hu, Yan Ma, Jiadi Su et al.

ICLR 2025

HGSFusion: Radar-Camera Fusion with Hybrid Generation and Synchronization for 3D Object Detection

Zijian Gu, Jianwei Ma, Yan Huang et al.

AAAI 2025 · arXiv:2412.11489 · 16 citations

Language-Guided Audio-Visual Learning for Long-Term Sports Assessment

Huangbiao Xu, Xiao Ke, Huanqi Wu et al.

CVPR 2025 · 6 citations

Leveraging Driver Field-of-View for Multimodal Ego-Trajectory Prediction

M. Eren Akbiyik, Nedko Savov, Danda Pani Paudel et al.

ICLR 2025 · arXiv:2312.08558 · 3 citations

Modality-Aware SAM: Sharpness-Aware-Minimization Driven Gradient Modulation for Harmonized Multimodal Learning

Hossein Rajoli Nowdeh, Jie Ji, Xiaolong Ma et al.

NEURIPS 2025 · arXiv:2510.24919

MOSPA: Human Motion Generation Driven by Spatial Audio

Shuyang Xu, Zhiyang Dou, Mingyi Shi et al.

NEURIPS 2025 (spotlight) · arXiv:2507.11949 · 4 citations

MUG: Pseudo Labeling Augmented Audio-Visual Mamba Network for Audio-Visual Video Parsing

Langyu Wang, Yingying Chen et al.

ICCV 2025 · arXiv:2507.01384 · 1 citation

Multimodality Helps Few-shot 3D Point Cloud Semantic Segmentation

Zhaochong An, Guolei Sun, Yun Liu et al.

ICLR 2025 · arXiv:2410.22489 · 23 citations

Multimodal Lego: Model Merging and Fine-Tuning Across Topologies and Modalities in Biomedicine

Konstantin Hemker, Nikola Simidjievski, Mateja Jamnik

ICLR 2025 · arXiv:2405.19950 · 2 citations

Multimodal LiDAR-Camera Novel View Synthesis with Unified Pose-free Neural Fields

Weiyi Xue, Fan Lu, Yunwei Zhu et al.

NEURIPS 2025

PS3: A Multimodal Transformer Integrating Pathology Reports with Histology Images and Biological Pathways for Cancer Survival Prediction

Manahil Raza, Ayesha Azam, Talha Qaiser et al.

ICCV 2025 · arXiv:2509.20022 · 1 citation

Reading Recognition in the Wild

Charig Yang, Samiul Alam, Shakhrul Iman Siam et al.

NEURIPS 2025 · arXiv:2505.24848 · 3 citations

Revisiting Multimodal Fusion for 3D Anomaly Detection from an Architectural Perspective

Kaifang Long, Guoyang Xie, Lianbo Ma et al.

AAAI 2025 · arXiv:2412.17297 · 10 citations

SAM2-LOVE: Segment Anything Model 2 in Language-aided Audio-Visual Scenes

Yuji Wang, Haoran Xu, Yong Liu et al.

CVPR 2025 · arXiv:2506.01558 · 9 citations

SDGOCC: Semantic and Depth-Guided Bird's-Eye View Transformation for 3D Multimodal Occupancy Prediction

ZaiPeng Duan, Xuzhong Hu, Pei An et al.

CVPR 2025 · arXiv:2507.17083 · 6 citations

SeeGround: See and Ground for Zero-Shot Open-Vocabulary 3D Visual Grounding

Rong Li, Shijie Li, Lingdong Kong et al.

CVPR 2025 · arXiv:2412.04383 · 43 citations

TAViS: Text-bridged Audio-Visual Segmentation with Foundation Models

Ziyang Luo, Nian Liu, Xuguang Yang et al.

ICCV 2025 · arXiv:2506.11436 · 3 citations

Towards Accurate and Efficient 3D Object Detection for Autonomous Driving: A Mixture of Experts Computing System on Edge

Linshen Liu, Boyan Su, Junyue Jiang et al.

ICCV 2025 · arXiv:2507.04123 · 1 citation

TSAM: Temporal SAM Augmented with Multimodal Prompts for Referring Audio-Visual Segmentation

Abduljalil Radman, Jorma Laaksonen

CVPR 2025 · 6 citations

Uni-Sign: Toward Unified Sign Language Understanding at Scale

Zecheng Li, Wengang Zhou, Weichao Zhao et al.

ICLR 2025 · arXiv:2501.15187 · 39 citations

VADB: A Large-Scale Video Aesthetic Database with Professional and Multi-Dimensional Annotations

Qianqian Qiao, DanDan Zheng, Yihang Bo et al.

NEURIPS 2025 (oral) · arXiv:2510.25238 · 1 citation

AlignMiF: Geometry-Aligned Multimodal Implicit Field for LiDAR-Camera Joint Synthesis

Tao Tang, Guangrun Wang, Yixing Lao et al.

CVPR 2024 (highlight) · arXiv:2402.17483 · 20 citations

Debiasing Multimodal Sarcasm Detection with Contrastive Learning

Mengzhao Jia, Can Xie, Liqiang Jing

AAAI 2024 · arXiv:2312.10493 · 43 citations

Event-Adapted Video Super-Resolution

Zeyu Xiao, Dachun Kai, Yueyi Zhang et al.

ECCV 2024 · 14 citations

Exploiting Polarized Material Cues for Robust Car Detection

Wen Dong, Haiyang Mei, Ziqi Wei et al.

AAAI 2024 · arXiv:2401.02606 · 7 citations

Frequency Spectrum Is More Effective for Multimodal Representation and Fusion: A Multimodal Spectrum Rumor Detector

An Lao, Qi Zhang, Chongyang Shi et al.

AAAI 2024 · arXiv:2312.11023 · 41 citations

GeminiFusion: Efficient Pixel-wise Multimodal Fusion for Vision Transformer

Ding Jia, Jianyuan Guo, Kai Han et al.

ICML 2024 · arXiv:2406.01210 · 51 citations

Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities

Yiyuan Zhang, Xiaohan Ding, Kaixiong Gong et al.

CVPR 2024 · arXiv:2401.14405 · 12 citations

Multimodal Prototyping for Cancer Survival Prediction

Andrew Song, Richard Chen, Guillaume Jaume et al.

ICML 2024 · arXiv:2407.00224 · 40 citations

PointLLM: Empowering Large Language Models to Understand Point Clouds

Runsen Xu, Xiaolong Wang, Tai Wang et al.

ECCV 2024 · arXiv:2308.16911 · 295 citations

Predictive Dynamic Fusion

Bing Cao, Yinan Xia, Yi Ding et al.

ICML 2024 · arXiv:2406.04802 · 27 citations

Towards Multimodal Sentiment Analysis Debiasing via Bias Purification

Dingkang Yang, Mingcheng Li, Dongling Xiao et al.

ECCV 2024 · arXiv:2403.05023 · 35 citations