"3d visual grounding" Papers
14 papers found
Conference
AugRefer: Advancing 3D Visual Grounding via Cross-Modal Augmentation and Spatial Relation-based Referring
Xinyi Wang, Na Zhao, Zhiyuan Han et al.
AAAI 2025, arXiv:2501.09428, 6 citations
Beyond Human Perception: Understanding Multi-Object World from Monocular View
Keyu Guo, Yongle Huang, Shijie Sun et al.
CVPR 2025, 2 citations
CityAnchor: City-scale 3D Visual Grounding with Multi-modality LLMs
Jinpeng Li, Haiping Wang, Jiabin Chen et al.
ICLR 2025, 5 citations
DenseGrounding: Improving Dense Language-Vision Semantics for Ego-centric 3D Visual Grounding
Henry Zheng, Hao Shi, Qihang Peng et al.
ICLR 2025, arXiv:2505.04965, 8 citations
From Objects to Anywhere: A Holistic Benchmark for Multi-level Visual Grounding in 3D Scenes
Tianxu Wang, Zhuofan Zhang, Ziyu Zhu et al.
NeurIPS 2025, arXiv:2506.04897, 1 citation
Ges3ViG: Incorporating Pointing Gestures into Language-Based 3D Visual Grounding for Embodied Reference Understanding
Atharv Mahesh Mane, Dulanga Weerakoon, Vigneshwaran Subbaraju et al.
CVPR 2025, arXiv:2504.09623, 4 citations
Robust Cross-modal Alignment Learning for Cross-Scene Spatial Reasoning and Grounding
Yanglin Feng, Hongyuan Zhu, Dezhong Peng et al.
NeurIPS 2025
SeeGround: See and Ground for Zero-Shot Open-Vocabulary 3D Visual Grounding
Rong Li, Shijie Li, Lingdong Kong et al.
CVPR 2025, arXiv:2412.04383, 43 citations
SPAZER: Spatial-Semantic Progressive Reasoning Agent for Zero-shot 3D Visual Grounding
Zhao Jin, Rong-Cheng Tu, Jingyi Liao et al.
NeurIPS 2025, arXiv:2506.21924, 3 citations
Four Ways to Improve Verbo-visual Fusion for Dense 3D Visual Grounding
Ozan Unal, Christos Sakaridis, Suman Saha et al.
ECCV 2024, arXiv:2309.04561, 29 citations
Mono3DVG: 3D Visual Grounding in Monocular Images
Yangfan Zhan, Yuan Yuan, Zhitong Xiong
AAAI 2024, arXiv:2312.08022, 36 citations
Naturally Supervised 3D Visual Grounding with Language-Regularized Concept Learners
Chun Feng, Joy Hsu, Weiyu Liu et al.
CVPR 2024, arXiv:2404.19696, 9 citations
ScanERU: Interactive 3D Visual Grounding Based on Embodied Reference Understanding
Ziyang Lu, Yunqiang Pei, Guoqing Wang et al.
AAAI 2024, arXiv:2303.13186, 12 citations
WildRefer: 3D Object Localization in Large-scale Dynamic Scenes with Multi-modal Visual Data and Natural Language
Zhenxiang Lin, Xidong Peng, Peishan Cong et al.
ECCV 2024, arXiv:2304.05645, 13 citations