"3d visual grounding" Papers

14 papers found

AugRefer: Advancing 3D Visual Grounding via Cross-Modal Augmentation and Spatial Relation-based Referring

Xinyi Wang, Na Zhao, Zhiyuan Han et al.

AAAI 2025paperarXiv:2501.09428
6
citations

Beyond Human Perception: Understanding Multi-Object World from Monocular View

Keyu Guo, Yongle Huang, Shijie Sun et al.

CVPR 2025
2
citations

CityAnchor: City-scale 3D Visual Grounding with Multi-modality LLMs

Jinpeng Li, Haiping Wang, Jiabin chen et al.

ICLR 2025
5
citations

DenseGrounding: Improving Dense Language-Vision Semantics for Ego-centric 3D Visual Grounding

Henry Zheng, Hao Shi, Qihang Peng et al.

ICLR 2025arXiv:2505.04965
8
citations

From Objects to Anywhere: A Holistic Benchmark for Multi-level Visual Grounding in 3D Scenes

Tianxu Wang, Zhuofan Zhang, Ziyu Zhu et al.

NEURIPS 2025arXiv:2506.04897
1
citations

Ges3ViG : Incorporating Pointing Gestures into Language-Based 3D Visual Grounding for Embodied Reference Understanding

Atharv Mahesh Mane, Dulanga Weerakoon, Vigneshwaran Subbaraju et al.

CVPR 2025arXiv:2504.09623
4
citations

Robust Cross-modal Alignment Learning for Cross-Scene Spatial Reasoning and Grounding

Yanglin Feng, Hongyuan Zhu, Dezhong Peng et al.

NEURIPS 2025

SeeGround: See and Ground for Zero-Shot Open-Vocabulary 3D Visual Grounding

Rong Li, Shijie Li, Lingdong Kong et al.

CVPR 2025arXiv:2412.04383
43
citations

SPAZER: Spatial-Semantic Progressive Reasoning Agent for Zero-shot 3D Visual Grounding

Zhao Jin, Rong-Cheng Tu, Jingyi Liao et al.

NEURIPS 2025arXiv:2506.21924
3
citations

Four Ways to Improve Verbo-visual Fusion for Dense 3D Visual Grounding

Ozan Unal, Christos Sakaridis, Suman Saha et al.

ECCV 2024arXiv:2309.04561
29
citations

Mono3DVG: 3D Visual Grounding in Monocular Images

Yangfan Zhan, Yuan Yuan, Zhitong Xiong

AAAI 2024paperarXiv:2312.08022
36
citations

Naturally Supervised 3D Visual Grounding with Language-Regularized Concept Learners

Chun Feng, Joy Hsu, Weiyu Liu et al.

CVPR 2024arXiv:2404.19696
9
citations

ScanERU: Interactive 3D Visual Grounding Based on Embodied Reference Understanding

Ziyang Lu, Yunqiang Pei, Guoqing Wang et al.

AAAI 2024paperarXiv:2303.13186
12
citations

WildRefer: 3D Object Localization in Large-scale Dynamic Scenes with Multi-modal Visual Data and Natural Language

Zhenxiang Lin, Xidong Peng, peishan cong et al.

ECCV 2024arXiv:2304.05645
13
citations