Poster "3d scene understanding" Papers

55 papers found • Page 1 of 2

3D-AffordanceLLM: Harnessing Large Language Models for Open-Vocabulary Affordance Detection in 3D Worlds

Hengshuo Chu, Xiang Deng, Qi Lv et al.

ICLR 2025 • arXiv:2502.20041 • 16 citations

3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination

Jianing "Jed" Yang, Xuweiyi Chen, Nikhil Madaan et al.

CVPR 2025 • arXiv:2406.05132 • 30 citations

3DGraphLLM: Combining Semantic Graphs and Large Language Models for 3D Scene Understanding

Tatiana Zemskova, Dmitry Yudin

ICCV 2025 • arXiv:2412.18450 • 11 citations

All in One: Visual-Description-Guided Unified Point Cloud Segmentation

Zongyan Han, Mohamed El Amine Boudjoghra, Jiahua Dong et al.

ICCV 2025 • arXiv:2507.05211 • 1 citation

An End-to-End Robust Point Cloud Semantic Segmentation Network with Single-Step Conditional Diffusion Models

Wentao Qu, Jing Wang, Yongshun Gong et al.

CVPR 2025 • arXiv:2411.16308 • 9 citations

ARKit LabelMaker: A New Scale for Indoor 3D Scene Understanding

Guangda Ji, Silvan Weder, Francis Engelmann et al.

CVPR 2025 • arXiv:2410.13924 • 6 citations

ASHiTA: Automatic Scene-grounded HIerarchical Task Analysis

Yun Chang, Leonor Fermoselle, Duy Ta et al.

CVPR 2025 • arXiv:2504.06553 • 4 citations

Coarse Correspondences Boost Spatial-Temporal Reasoning in Multimodal Language Model

Benlin Liu, Yuhao Dong, Yiqin Wang et al.

CVPR 2025 • arXiv:2408.00754 • 9 citations

CORE4D: A 4D Human-Object-Human Interaction Dataset for Collaborative Object REarrangement

Yun Liu, Chengwen Zhang, Ruofan Xing et al.

CVPR 2025 • arXiv:2406.19353 • 26 citations

COS3D: Collaborative Open-Vocabulary 3D Segmentation

Runsong Zhu, Ka-Hei Hui, Zhengzhe Liu et al.

NeurIPS 2025 • arXiv:2510.20238 • 1 citation

DiSCO-3D: Discovering and Segmenting Sub-Concepts from Open-vocabulary Queries in NeRF

Doriand Petit, Steve Bourgeois, Vincent Gay-Bellile et al.

ICCV 2025 • arXiv:2507.14596 • 1 citation

ExCap3D: Expressive 3D Scene Understanding via Object Captioning with Varying Detail

Chandan Yeshwanth, David Rozenberszki, Angela Dai

ICCV 2025 • arXiv:2503.17044 • 3 citations

HD-EPIC: A Highly-Detailed Egocentric Video Dataset

Toby Perrett, Ahmad Darkhalil, Saptarshi Sinha et al.

CVPR 2025 • arXiv:2502.04144 • 40 citations

LangSplatV2: High-dimensional 3D Language Gaussian Splatting with 450+ FPS

Wanhua Li, Yujie Zhao, Minghan Qin et al.

NeurIPS 2025 • arXiv:2507.07136 • 8 citations

Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors

Duo Zheng, Shijia Huang, Yanyang Li et al.

NeurIPS 2025 • arXiv:2505.24625 • 29 citations

Masked Scene Modeling: Narrowing the Gap Between Supervised and Self-Supervised Learning in 3D Scene Understanding

Pedro Hermosilla, Christian Stippel, Leon Sick

CVPR 2025 • arXiv:2504.06719

NeRAF: 3D Scene Infused Neural Radiance and Acoustic Fields

Amandine Brunetto, Sascha Hornauer, Fabien Moutarde

ICLR 2025 • arXiv:2405.18213 • 9 citations

Orchid: Image Latent Diffusion for Joint Appearance and Geometry Generation

Akshay Krishnan, Xinchen Yan, Vincent Casser et al.

ICCV 2025 • arXiv:2501.13087 • 8 citations

PyTorchGeoNodes: Enabling Differentiable Shape Programs for 3D Shape Reconstruction

Sinisa Stekovic, Arslan Artykov, Stefan Ainetter et al.

CVPR 2025 • arXiv:2404.10620 • 4 citations

Reasoning Beyond Points: A Visual Introspective Approach for Few-Shot 3D Segmentation

Changshuo Wang, Shuting He, Xiang Fang et al.

NeurIPS 2025

RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics

Enshen Zhou, Jingkun An, Cheng Chi et al.

NeurIPS 2025 • arXiv:2506.04308 • 58 citations

Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness

Haochen Wang, Yucheng Zhao, Tiancai Wang et al.

ICCV 2025 • arXiv:2504.01901 • 33 citations

ScoreHOI: Physically Plausible Reconstruction of Human-Object Interaction via Score-Guided Diffusion

Ao Li, Jinpeng Liu, Yixuan Zhu et al.

ICCV 2025 • arXiv:2509.07920

SPARTUN3D: Situated Spatial Understanding of 3D World in Large Language Model

Yue Zhang, Zhiyang Xu, Ying Shen et al.

ICLR 2025 • arXiv:2410.03878 • 20 citations

Spatially-aware Weights Tokenization for NeRF-Language Models

Andrea Amaduzzi, Pierluigi Zama Ramirez, Giuseppe Lisanti et al.

NeurIPS 2025

SPAZER: Spatial-Semantic Progressive Reasoning Agent for Zero-shot 3D Visual Grounding

Zhao Jin, Rong-Cheng Tu, Jingyi Liao et al.

NeurIPS 2025 • arXiv:2506.21924 • 3 citations

Spiral: Semantic-Aware Progressive LiDAR Scene Generation and Understanding

Dekai Zhu, Yixuan Hu, Youquan Liu et al.

NeurIPS 2025 • arXiv:2505.22643 • 5 citations

TrafficLoc: Localizing Traffic Surveillance Cameras in 3D Scenes

Yan Xia, Yunxiang Lu, Rui Song et al.

ICCV 2025 • arXiv:2412.10308 • 1 citation

Tri-MARF: A Tri-Modal Multi-Agent Responsive Framework for Comprehensive 3D Object Annotation

Jusheng Zhang, Yijia Fan, Zimo Wen et al.

NeurIPS 2025

Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding

Duo Zheng, Shijia Huang, Liwei Wang

CVPR 2025 • arXiv:2412.00493 • 70 citations

Visual Agentic AI for Spatial Reasoning with a Dynamic API

Damiano Marsili, Rohun Agrawal, Yisong Yue et al.

CVPR 2025 • arXiv:2502.06787 • 31 citations

VoteSplat: Hough Voting Gaussian Splatting for 3D Scene Understanding

Minchao Jiang, Shunyu Jia, Jiaming Gu et al.

ICCV 2025 • arXiv:2506.22799 • 3 citations

WildCAT3D: Appearance-Aware Multi-View Diffusion in the Wild

Morris Alper, David Novotny, Filippos Kokkinos et al.

NeurIPS 2025 • arXiv:2506.13030 • 1 citation

An Embodied Generalist Agent in 3D World

Jiangyong Huang, Silong Yong, Xiaojian Ma et al.

ICML 2024 • arXiv:2311.12871 • 305 citations

AugUndo: Scaling Up Augmentations for Monocular Depth Completion and Estimation

Yangchao Wu, Tian Yu Liu, Hyoungseob Park et al.

ECCV 2024 • arXiv:2310.09739 • 15 citations

ConDense: Consistent 2D-3D Pre-training for Dense and Sparse Features from Multi-View Images

Xiaoshuai Zhang, Zhicheng Wang, Howard Zhou et al.

ECCV 2024 • arXiv:2408.17027 • 8 citations

Dense Multimodal Alignment for Open-Vocabulary 3D Scene Understanding

Ruihuang Li, Zhengqiang Zhang, Chenhang He et al.

ECCV 2024 • arXiv:2407.09781 • 11 citations

DORSal: Diffusion for Object-centric Representations of Scenes et al.

Allan Jabri, Sjoerd van Steenkiste, Emiel Hoogeboom et al.

ICLR 2024

Extend Your Own Correspondences: Unsupervised Distant Point Cloud Registration by Progressive Distance Extension

Quan Liu, Hongzi Zhu, Zhenxi Wang et al.

CVPR 2024 • arXiv:2403.03532 • 22 citations

Generating Human Motion in 3D Scenes from Text Descriptions

Zhi Cen, Huaijin Pi, Sida Peng et al.

CVPR 2024 • arXiv:2405.07784 • 48 citations

GroupContrast: Semantic-aware Self-supervised Representation Learning for 3D Understanding

Chengyao Wang, Li Jiang, Xiaoyang Wu et al.

CVPR 2024 • arXiv:2403.09639 • 26 citations

Heterogeneous Graph Learning for Scene Graph Prediction in 3D Point Clouds

Yanni Ma, Hao Liu, Yun Pei et al.

ECCV 2024 • 3 citations

Hierarchical Temporal Context Learning for Camera-based Semantic Scene Completion

Bohan Li, Jiajun Deng, Wenyao Zhang et al.

ECCV 2024 • arXiv:2407.02077 • 33 citations

Instance Tracking in 3D Scenes from Egocentric Videos

Yunhan Zhao, Haoyu Ma, Shu Kong et al.

CVPR 2024 • arXiv:2312.04117 • 12 citations

Learning Unsigned Distance Functions from Multi-view Images with Volume Rendering Priors

Wen Yuan Zhang, Kanle Shi, Yushen Liu et al.

ECCV 2024

M3DBench: Towards Omni 3D Assistant with Interleaved Multi-modal Instructions

Mingsheng Li, Xin Chen, Chi Zhang et al.

ECCV 2024 • 4 citations

Neural Volumetric World Models for Autonomous Driving

Zanming Huang, Jimuyang Zhang, Eshed Ohn-Bar

ECCV 2024 • 14 citations

OpenIns3D: Snap and Lookup for 3D Open-vocabulary Instance Segmentation

Zhening Huang, Xiaoyang Wu, Xi Chen et al.

ECCV 2024 • arXiv:2309.00616 • 83 citations

Open Vocabulary 3D Scene Understanding via Geometry Guided Self-Distillation

Pengfei Wang, Yuxi Wang, Shuai Li et al.

ECCV 2024 • arXiv:2407.13362 • 10 citations

Open-Vocabulary 3D Semantic Segmentation with Text-to-Image Diffusion Models

Xiaoyu Zhu, Hao Zhou, Pengfei Xing et al.

ECCV 2024 • arXiv:2407.13642 • 11 citations