Stable Virtual Camera: Generative View Synthesis with Diffusion Models

87 citations

arXiv:2503.14489

#20 in ICCV 2025 of 2701 papers

Top Authors

Jensen Zhou, Hang Gao, Vikram Voleti, Aaryaman Vasishta, Chun-Han Yao, Mark Boss, Philip Torr, Christian Rupprecht, Varun Jampani

Topics

view synthesis, diffusion models, novel view generation, camera pose conditioning, video generation, temporal consistency, 3D scene representation, zero-shot generalization

Abstract

We present Stable Virtual Camera (Seva), a generalist diffusion model that creates novel views of a scene, given any number of input views and target cameras. Existing works struggle to generate either large viewpoint changes or temporally smooth samples, while relying on specific task configurations. Our approach overcomes these limitations through simple model design, optimized training recipe, and flexible sampling strategy that generalize across view synthesis tasks at test time. As a result, our samples maintain high consistency without requiring additional 3D representation-based distillation, thus streamlining view synthesis in the wild. Furthermore, we show that our method can generate high-quality videos lasting up to half a minute with seamless loop closure. Extensive benchmarking demonstrates that Seva outperforms existing methods across different datasets and settings. Project page with code and model: https://stable-virtual-camera.github.io/.

Citation History

Jan 25, 2026

Jan 31, 2026: 83 (+5)

Feb 13, 2026: 87 (+4)