"audio-visual question answering" Papers
5 papers found
Empowering LLMs with Pseudo-Untrimmed Videos for Audio-Visual Temporal Understanding
Yunlong Tang, Daiki Shimada, Jing Bi et al.
AAAI 2025, paper, arXiv:2403.16276
25 citations
Patch-level Sounding Object Tracking for Audio-Visual Question Answering
Zhangbin Li, Jinxing Zhou, Jing Zhang et al.
AAAI 2025, paper, arXiv:2412.10749
16 citations
PAVE: Patching and Adapting Video Large Language Models
Zhuoming Liu, Yiquan Li, Khoi D Nguyen et al.
CVPR 2025, arXiv:2503.19794
1 citation
CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenarios
Qilang Ye, Zitong Yu, Rui Shao et al.
ECCV 2024, arXiv:2403.04640
50 citations
Object-Aware Adaptive-Positivity Learning for Audio-Visual Question Answering
Zhangbin Li, Jinxing Zhou, Dan Guo et al.
AAAI 2024, paper, arXiv:2312.12816
27 citations