"audio-visual large language models" Papers
3 papers found
Aligned Better, Listen Better for Audio-Visual Large Language Models
Yuxin Guo, Shuailei Ma, Shijie Ma et al.
ICLR 2025, oral, arXiv:2504.02061
9 citations
SAVVY: Spatial Awareness via Audio-Visual LLMs through Seeing and Hearing
Mingfei Chen, Zijun Cui, Xiulong Liu et al.
NeurIPS 2025, oral, arXiv:2506.05414
5 citations
video-SALMONN: Speech-Enhanced Audio-Visual Large Language Models
Guangzhi Sun, Wenyi Yu, Changli Tang et al.
ICML 2024, oral, arXiv:2406.15704
76 citations