"multimodal fusion" Papers
41 papers found
AlignMamba: Enhancing Multimodal Mamba with Local and Global Cross-modal Alignment
Yan Li, Yifei Xing, Xiangyuan Lan et al.
Amplifying Prominent Representations in Multimodal Learning via Variational Dirichlet Process
Tsai Hor Chan, Feng Wu, Yihang Chen et al.
A Multimodal BiMamba Network with Test-Time Adaptation for Emotion Recognition Based on Physiological Signals
Ziyu Jia, Tingyu Du, Zhengyu Tian et al.
A Wander Through the Multimodal Landscape: Efficient Transfer Learning via Low-rank Sequence Multimodal Adapter
Zirun Guo, Xize Cheng, Yangyang Wu et al.
Can We Talk Models Into Seeing the World Differently?
Paul Gavrikov, Jovita Lukasik, Steffen Jung et al.
CLIBD: Bridging Vision and Genomics for Biodiversity Monitoring at Scale
ZeMing Gong, Austin Wang, Xiaoliang Huo et al.
CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion
Shoubin Yu, Jaehong Yoon, Mohit Bansal
CyIN: Cyclic Informative Latent Space for Bridging Complete and Incomplete Multimodal Learning
Ronghao Lin, Qiaolin He, Sijie Mai et al.
Enriching Multimodal Sentiment Analysis Through Textual Emotional Descriptions of Visual-Audio Content
Sheng Wu, Dongxiao He, Xiaobao Wang et al.
Fine-Tuning Token-Based Large Multimodal Models: What Works, What Doesn't and What's Next
Zhulin Hu, Yan Ma, Jiadi Su et al.
HGSFusion: Radar-Camera Fusion with Hybrid Generation and Synchronization for 3D Object Detection
Zijian Gu, Jianwei Ma, Yan Huang et al.
Language-Guided Audio-Visual Learning for Long-Term Sports Assessment
Huangbiao Xu, Xiao Ke, Huanqi Wu et al.
Leveraging Driver Field-of-View for Multimodal Ego-Trajectory Prediction
M. Eren Akbiyik, Nedko Savov, Danda Pani Paudel et al.
Modality-Aware SAM: Sharpness-Aware-Minimization Driven Gradient Modulation for Harmonized Multimodal Learning
Hossein Rajoli Nowdeh, Jie Ji, Xiaolong Ma et al.
MOSPA: Human Motion Generation Driven by Spatial Audio
Shuyang Xu, Zhiyang Dou, Mingyi Shi et al.
MUG: Pseudo Labeling Augmented Audio-Visual Mamba Network for Audio-Visual Video Parsing
Langyu Wang, Yingying Chen et al.
Multimodality Helps Few-shot 3D Point Cloud Semantic Segmentation
Zhaochong An, Guolei Sun, Yun Liu et al.
Multimodal Lego: Model Merging and Fine-Tuning Across Topologies and Modalities in Biomedicine
Konstantin Hemker, Nikola Simidjievski, Mateja Jamnik
Multimodal LiDAR-Camera Novel View Synthesis with Unified Pose-free Neural Fields
Weiyi Xue, Fan Lu, Yunwei Zhu et al.
PS3: A Multimodal Transformer Integrating Pathology Reports with Histology Images and Biological Pathways for Cancer Survival Prediction
Manahil Raza, Ayesha Azam, Talha Qaiser et al.
Reading Recognition in the Wild
Charig Yang, Samiul Alam, Shakhrul Iman Siam et al.
Revisiting Multimodal Fusion for 3D Anomaly Detection from an Architectural Perspective
Kaifang Long, Guoyang Xie, Lianbo Ma et al.
SAM2-LOVE: Segment Anything Model 2 in Language-aided Audio-Visual Scenes
Yuji Wang, Haoran Xu, Yong Liu et al.
SDGOCC: Semantic and Depth-Guided Bird's-Eye View Transformation for 3D Multimodal Occupancy Prediction
ZaiPeng Duan, Xuzhong Hu, Pei An et al.
SeeGround: See and Ground for Zero-Shot Open-Vocabulary 3D Visual Grounding
Rong Li, Shijie Li, Lingdong Kong et al.
TAViS: Text-bridged Audio-Visual Segmentation with Foundation Models
Ziyang Luo, Nian Liu, Xuguang Yang et al.
Towards Accurate and Efficient 3D Object Detection for Autonomous Driving: A Mixture of Experts Computing System on Edge
Linshen Liu, Boyan Su, Junyue Jiang et al.
TSAM: Temporal SAM Augmented with Multimodal Prompts for Referring Audio-Visual Segmentation
Abduljalil Radman, Jorma Laaksonen
Uni-Sign: Toward Unified Sign Language Understanding at Scale
Zecheng Li, Wengang Zhou, Weichao Zhao et al.
VADB: A Large-Scale Video Aesthetic Database with Professional and Multi-Dimensional Annotations
Qianqian Qiao, DanDan Zheng, Yihang Bo et al.
AlignMiF: Geometry-Aligned Multimodal Implicit Field for LiDAR-Camera Joint Synthesis
Tao Tang, Guangrun Wang, Yixing Lao et al.
Debiasing Multimodal Sarcasm Detection with Contrastive Learning
Mengzhao Jia, Can Xie, Liqiang Jing
Event-Adapted Video Super-Resolution
Zeyu Xiao, Dachun Kai, Yueyi Zhang et al.
Exploiting Polarized Material Cues for Robust Car Detection
Wen Dong, Haiyang Mei, Ziqi Wei et al.
Frequency Spectrum Is More Effective for Multimodal Representation and Fusion: A Multimodal Spectrum Rumor Detector
An Lao, Qi Zhang, Chongyang Shi et al.
GeminiFusion: Efficient Pixel-wise Multimodal Fusion for Vision Transformer
Ding Jia, Jianyuan Guo, Kai Han et al.
Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities
Yiyuan Zhang, Xiaohan Ding, Kaixiong Gong et al.
Multimodal Prototyping for Cancer Survival Prediction
Andrew Song, Richard Chen, Guillaume Jaume et al.
PointLLM: Empowering Large Language Models to Understand Point Clouds
Runsen Xu, Xiaolong Wang, Tai Wang et al.
Predictive Dynamic Fusion
Bing Cao, Yinan Xia, Yi Ding et al.
Towards Multimodal Sentiment Analysis Debiasing via Bias Purification
Dingkang Yang, Mingcheng Li, Dongling Xiao et al.