Disentangled 3D Scene Generation with Layout Learning

31 citations · #438 of 2635 papers in ICML 2024

Abstract

We introduce a method to generate 3D scenes that are disentangled into their component objects. This disentanglement is unsupervised, relying only on the knowledge of a large pretrained text-to-image model. Our key insight is that objects can be discovered by finding parts of a 3D scene that, when rearranged spatially, still produce valid configurations of the same scene. Concretely, our method jointly optimizes multiple NeRFs, each representing its own object, along with a set of layouts that composite these objects into scenes. We then encourage these composited scenes to be in-distribution according to the image generator. We show that despite its simplicity, our approach successfully generates 3D scenes decomposed into individual objects, enabling new capabilities in text-to-3D content creation.
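The core idea, compositing a shared set of per-object models under several candidate layouts, can be illustrated with a minimal toy sketch. This is not the paper's implementation: the real method optimizes NeRFs under a score-distillation loss from the pretrained image model, whereas here each "object" is a hypothetical point cloud, each layout is a per-object translation (the paper uses full 3D transforms), and the names `composite`, `objects`, and `layouts` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

K, N = 3, 2  # K objects, N layouts (illustrative values)

# Toy stand-in for per-object NeRFs: each object is a small point cloud.
objects = [rng.normal(size=(8, 3)) for _ in range(K)]

# Each layout assigns every object its own rigid translation.
layouts = rng.normal(size=(N, K, 3))

def composite(objects, layout):
    """Composite one scene: move each object by its layout transform,
    then take the union of the transformed point sets."""
    return np.concatenate([obj + layout[k] for k, obj in enumerate(objects)])

# One composited scene per layout; in the paper, each of these would be
# rendered and scored against the text-to-image prior.
scenes = [composite(objects, layouts[n]) for n in range(N)]
```

In the full method, the gradients of the image-prior loss flow back through every composited scene into both the shared object models and the layouts, which is what encourages each object to remain a valid scene component under any rearrangement.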

Citation History

Jan 28, 2026: 0
Feb 13, 2026: 31