GenFusion: Closing the Loop between Reconstruction and Generation via Videos

arXiv:2503.21219
19 citations · ranked #436 of 2873 papers in CVPR 2025

Abstract

Recently, 3D reconstruction and generation have demonstrated impressive novel view synthesis results, achieving high fidelity and efficiency. However, a notable conditioning gap can be observed between these two fields, e.g., scalable 3D scene reconstruction often requires densely captured views, whereas 3D generation typically relies on a single or no input view, which significantly limits their applications. We found that the source of this phenomenon lies in the misalignment between 3D constraints and generative priors. To address this problem, we propose a reconstruction-driven video diffusion model that learns to condition video frames on artifact-prone RGB-D renderings. Moreover, we propose a cyclical fusion pipeline that iteratively adds restoration frames from the generative model to the training set, enabling progressive expansion and addressing the viewpoint saturation limitations seen in previous reconstruction and generation pipelines. Our evaluation, including view synthesis from sparse-view and masked input, validates the effectiveness of our approach. More details at https://genfusion.sibowu.com.
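
The cyclical fusion pipeline described above reads as a simple loop: reconstruct from the available views, render RGB-D frames along a novel camera trajectory, restore the artifact-prone renderings with the video diffusion model, and add the restored frames back to the training set. The sketch below illustrates that loop under stated assumptions; every interface (scene.fit, scene.render_rgbd, scene.add_frames, diffusion_model.restore, and the trajectory sampler) is a hypothetical placeholder inferred from the abstract, not the authors' actual API.

def cyclical_fusion(scene, diffusion_model, sample_novel_trajectory, num_rounds=3):
    """Minimal sketch of the cyclical fusion loop from the abstract.

    Hypothetical interfaces (not the paper's code):
      scene.fit()                 -- optimize the 3D reconstruction on its training set
      scene.render_rgbd(cam)      -- render an RGB-D frame from camera pose `cam`
      scene.add_frames(cams, ims) -- append (pose, image) pairs to the training set
      diffusion_model.restore(x)  -- map an artifact-prone RGB-D clip to clean frames
      sample_novel_trajectory(s)  -- propose camera poses beyond current view coverage
    """
    scene.fit()  # initial reconstruction from the sparse input views
    for _ in range(num_rounds):
        # Render along a novel trajectory; these frames are artifact-prone
        # because the trajectory leaves the well-observed region.
        cameras = sample_novel_trajectory(scene)
        rgbd_clip = [scene.render_rgbd(cam) for cam in cameras]

        # The reconstruction-driven video diffusion model is conditioned on
        # the artifact-prone RGB-D renderings and outputs restored frames.
        restored = diffusion_model.restore(rgbd_clip)

        # Close the loop: restored frames join the training set, so the next
        # fit() expands view coverage instead of saturating at the inputs.
        scene.add_frames(cameras, restored)
        scene.fit()
    return scene

The progressive expansion claimed in the abstract falls out of this loop structure: each round's restored frames widen the reconstruction's usable viewpoint range before the next round samples a trajectory farther from the original inputs.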

Citation History

Date            Citations   Change
Jan 24, 2026    16
Jan 27, 2026    16          +0
Feb 3, 2026     18          +2
Feb 13, 2026    19          +1