LUDVIG: Learning-Free Uplifting of 2D Visual Features to Gaussian Splatting Scenes

arXiv:2410.14462
13 citations
#273 of 2701 papers in ICCV 2025

Abstract

We address the problem of extending the capabilities of vision foundation models such as DINO, SAM, and CLIP, to 3D tasks. Specifically, we introduce a novel method to uplift 2D image features into Gaussian Splatting representations of 3D scenes. Unlike traditional approaches that rely on minimizing a reconstruction loss, our method employs a simpler and more efficient feature aggregation technique, augmented by a graph diffusion mechanism. Graph diffusion refines 3D features, such as coarse segmentation masks, by leveraging 3D geometry and pairwise similarities induced by DINOv2. Our approach achieves performance comparable to the state of the art on multiple downstream tasks while delivering significant speed-ups. Notably, we obtain competitive segmentation results using only generic DINOv2 features, despite DINOv2 not being trained on millions of annotated segmentation masks like SAM. When applied to CLIP features, our method demonstrates strong performance in open-vocabulary object segmentation tasks, highlighting the versatility of our approach.
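
The two ingredients named in the abstract, feature aggregation and graph diffusion, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the aggregation weights or the graph construction, so the code below assumes that each Gaussian's feature is a weighted average of pixel features (with weights standing in for per-pixel rendering contributions) and that diffusion runs over a k-nearest-neighbor graph whose edges are weighted by cosine similarity of the features. The names `uplift_features`, `build_graph`, and `diffuse`, and the parameters `k`, `steps`, and `alpha`, are hypothetical.

```python
import math


def uplift_features(weights, pixel_features):
    """Weighted average of 2D pixel features for each Gaussian.

    weights[g][p] is an assumed non-negative contribution of Gaussian g
    to pixel p; pixel_features[p] is that pixel's feature vector.
    """
    dim = len(pixel_features[0])
    uplifted = []
    for row in weights:
        total = sum(row)
        acc = [0.0] * dim
        for w, feat in zip(row, pixel_features):
            for d in range(dim):
                acc[d] += w * feat[d]
        # Gaussians that touch no pixel get a zero feature.
        uplifted.append([a / total if total > 0 else 0.0 for a in acc])
    return uplifted


def cosine(a, b):
    """Cosine similarity, defined as 0 when either vector is zero."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def build_graph(positions, features, k):
    """Symmetric kNN graph over 3D positions (assumed construction).

    Edge weights are cosine similarities clipped at zero, so spatial
    neighbors with dissimilar features are not connected.
    """
    n = len(positions)
    edges = [dict() for _ in range(n)]
    for i in range(n):
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(positions[i], positions[j]),
        )
        for j in others[:k]:
            w = max(cosine(features[i], features[j]), 0.0)
            if w > 0:
                edges[i][j] = w
                edges[j][i] = w
    return edges


def diffuse(scores, edges, steps=20, alpha=0.5):
    """Blend each node's initial score with its neighbors' weighted mean."""
    current = list(scores)
    for _ in range(steps):
        updated = []
        for i, nbrs in enumerate(edges):
            degree = sum(nbrs.values())
            if degree > 0:
                avg = sum(w * current[j] for j, w in nbrs.items()) / degree
            else:
                avg = current[i]  # isolated node keeps its own value
            updated.append((1 - alpha) * scores[i] + alpha * avg)
        current = updated
    return current
```

In this sketch a coarse per-Gaussian score, such as a rough segmentation mask, spreads only to Gaussians that are both spatially close and similar in feature space, which is the general behavior the abstract attributes to graph diffusion.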

Citation History

Jan 26, 2026: 12
Jan 27, 2026: 12
Feb 3, 2026: 12
Feb 13, 2026: 13 (+1)