"multimodal integration" Papers

15 papers found

Assessing Modality Bias in Video Question Answering Benchmarks with Multimodal Large Language Models

Jean Park, Kuk Jin Jang, Basam Alasaly et al.

AAAI 2025 · arXiv:2408.12763 · 16 citations

Asymmetric Reinforcing Against Multi-Modal Representation Bias

Xiyuan Gao, Bing Cao, Pengfei Zhu et al.

AAAI 2025 · arXiv:2501.01240 · 5 citations

Cross-modal Associations in Vision and Language Models: Revisiting the Bouba-Kiki Effect

Tom Kouwenhoven, Kiana Shahrasbi, Tessa Verhoef

NEURIPS 2025 · arXiv:2507.10013

FinMMR: Make Financial Numerical Reasoning More Multimodal, Comprehensive, and Challenging

Zichen Tang, Haihong E, Jiacheng Liu et al.

ICCV 2025 · arXiv:2508.04625 · 6 citations

FRAME: Floor-aligned Representation for Avatar Motion from Egocentric Video

Andrea Boscolo Camiletto, Jian Wang, Eduardo Alvarado et al.

CVPR 2025 (highlight) · arXiv:2503.23094 · 1 citation

PARC: A Quantitative Framework Uncovering the Symmetries within Vision Language Models

Jenny Schmalfuss, Nadine Chang, Vibashan VS et al.

CVPR 2025 · arXiv:2506.14808 · 1 citation

Position: AI Should Sense Better, Not Just Scale Bigger: Adaptive Sensing as a Paradigm Shift

Eunsu Baek, Keondo Park, Jeonggil Ko et al.

NEURIPS 2025 · 3 citations

Reducing Hallucinations in Large Vision-Language Models via Latent Space Steering

Sheng Liu, Haotian Ye, James Y Zou

ICLR 2025 · 29 citations

scGeneScope: A Treatment-Matched Single Cell Imaging and Transcriptomics Dataset and Benchmark for Treatment Response Modeling

Joel Dapello, Marcel Nassar, Ridvan Eksi et al.

NEURIPS 2025

The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio

Sicong Leng, Yun Xing, Zesen Cheng et al.

NEURIPS 2025 · arXiv:2410.12787 · 30 citations

The Indra Representation Hypothesis

Jianglin Lu, Hailing Wang, Kuo Yang et al.

NEURIPS 2025

X-Fi: A Modality-Invariant Foundation Model for Multimodal Human Sensing

Xinyan Chen, Jianfei Yang

ICLR 2025 · arXiv:2410.10167 · 11 citations

Zero-AVSR: Zero-Shot Audio-Visual Speech Recognition with LLMs by Learning Language-Agnostic Speech Representations

Jeong Hun Yeo, Minsu Kim, Chae Won Kim et al.

ICCV 2025 · arXiv:2503.06273 · 5 citations

BRIDGE: Bridging Gaps in Image Captioning Evaluation with Stronger Visual Cues

Sara Sarto, Marcella Cornia, Lorenzo Baraldi et al.

ECCV 2024 · arXiv:2407.20341 · 12 citations

Hierarchical Aligned Multimodal Learning for NER on Tweet Posts

Peipei Liu, Hong Li, Yimo Ren et al.

AAAI 2024 · arXiv:2305.08372 · 8 citations