"multimodal integration" Papers
15 papers found
Assessing Modality Bias in Video Question Answering Benchmarks with Multimodal Large Language Models
Jean Park, Kuk Jin Jang, Basam Alasaly et al.
AAAI 2025 · arXiv:2408.12763
16 citations
Asymmetric Reinforcing Against Multi-Modal Representation Bias
Xiyuan Gao, Bing Cao, Pengfei Zhu et al.
AAAI 2025 · arXiv:2501.01240
5 citations
Cross-modal Associations in Vision and Language Models: Revisiting the Bouba-Kiki Effect
Tom Kouwenhoven, Kiana Shahrasbi, Tessa Verhoef
NEURIPS 2025 · arXiv:2507.10013
FinMMR: Make Financial Numerical Reasoning More Multimodal, Comprehensive, and Challenging
Zichen Tang, Haihong E, Jiacheng Liu et al.
ICCV 2025 · arXiv:2508.04625
6 citations
FRAME: Floor-aligned Representation for Avatar Motion from Egocentric Video
Andrea Boscolo Camiletto, Jian Wang, Eduardo Alvarado et al.
CVPR 2025 (highlight) · arXiv:2503.23094
1 citation
PARC: A Quantitative Framework Uncovering the Symmetries within Vision Language Models
Jenny Schmalfuss, Nadine Chang, Vibashan VS et al.
CVPR 2025 · arXiv:2506.14808
1 citation
Position: AI Should Sense Better, Not Just Scale Bigger: Adaptive Sensing as a Paradigm Shift
Eunsu Baek, Keondo Park, Jeonggil Ko et al.
NEURIPS 2025
3 citations
Reducing Hallucinations in Large Vision-Language Models via Latent Space Steering
Sheng Liu, Haotian Ye, James Y. Zou
ICLR 2025
29 citations
scGeneScope: A Treatment-Matched Single Cell Imaging and Transcriptomics Dataset and Benchmark for Treatment Response Modeling
Joel Dapello, Marcel Nassar, Ridvan Eksi et al.
NEURIPS 2025
The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio
Sicong Leng, Yun Xing, Zesen Cheng et al.
NEURIPS 2025 · arXiv:2410.12787
30 citations
The Indra Representation Hypothesis
Jianglin Lu, Hailing Wang, Kuo Yang et al.
NEURIPS 2025
X-Fi: A Modality-Invariant Foundation Model for Multimodal Human Sensing
Xinyan Chen, Jianfei Yang
ICLR 2025 · arXiv:2410.10167
11 citations
Zero-AVSR: Zero-Shot Audio-Visual Speech Recognition with LLMs by Learning Language-Agnostic Speech Representations
Jeong Hun Yeo, Minsu Kim, Chae Won Kim et al.
ICCV 2025 · arXiv:2503.06273
5 citations
BRIDGE: Bridging Gaps in Image Captioning Evaluation with Stronger Visual Cues
Sara Sarto, Marcella Cornia, Lorenzo Baraldi et al.
ECCV 2024 · arXiv:2407.20341
12 citations
Hierarchical Aligned Multimodal Learning for NER on Tweet Posts
Peipei Liu, Hong Li, Yimo Ren et al.
AAAI 2024 · arXiv:2305.08372
8 citations