"multimodal scene understanding" Papers
3 papers found
Conference
Ges3ViG: Incorporating Pointing Gestures into Language-Based 3D Visual Grounding for Embodied Reference Understanding
Atharv Mahesh Mane, Dulanga Weerakoon, Vigneshwaran Subbaraju et al.
CVPR 2025, arXiv:2504.09623
4 citations
HRScene: How Far Are VLMs from Effective High-Resolution Image Understanding?
Yusen Zhang, Wenliang Zheng, Aashrith Madasu et al.
ICCV 2025, arXiv:2504.18406
Universal Scene Graph Generation
Shengqiong Wu, Hao Fei, Tat-seng Chua
CVPR 2025 (Highlight), arXiv:2503.15005
4 citations