"cross-modal tasks" Papers
2 papers found
UrbanLLaVA: A Multi-modal Large Language Model for Urban Intelligence
Jie Feng, Shengyuan Wang, Tianhui Liu et al.
ICCV 2025, arXiv:2506.23219
11 citations
SoundingActions: Learning How Actions Sound from Narrated Egocentric Videos
Changan Chen, Kumar Ashutosh, Rohit Girdhar et al.
CVPR 2024, arXiv:2404.05206
12 citations