Poster "multimodal scene understanding" Papers
2 papers found
Conference
Ges3ViG : Incorporating Pointing Gestures into Language-Based 3D Visual Grounding for Embodied Reference Understanding
Atharv Mahesh Mane, Dulanga Weerakoon, Vigneshwaran Subbaraju et al.
CVPR 2025arXiv:2504.09623
4
citations
HRScene: How Far Are VLMs from Effective High-Resolution Image Understanding?
Yusen Zhang, Wenliang Zheng, Aashrith Madasu et al.
ICCV 2025arXiv:2504.18406