Poster "audio-visual question answering" Papers
2 papers found
PAVE: Patching and Adapting Video Large Language Models
Zhuoming Liu, Yiquan Li, Khoi D Nguyen et al.
CVPR 2025, arXiv:2503.19794
1 citation
CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenarios
Qilang Ye, Zitong Yu, Rui Shao et al.
ECCV 2024, arXiv:2403.04640
50 citations