OminiControl: Minimal and Universal Control for Diffusion Transformer

225 citations

arXiv:2411.15098

in ICCV 2025

of 2701 papers

Top Authors

Zhenxiong Tan, Songhua Liu, Xingyi Yang, Qiaochu Xue, Xinchao Wang

Topics

diffusion transformer, image conditioning, subject-driven generation, unified sequence processing, dynamic position encoding, minimal architectural design, large-scale dataset synthesis

Abstract

We present OminiControl, a novel approach that rethinks how image conditions are integrated into Diffusion Transformer (DiT) architectures. Current image conditioning methods either introduce substantial parameter overhead or handle only specific control tasks effectively, limiting their practical versatility. OminiControl addresses these limitations through three key innovations: (1) a minimal architectural design that leverages the DiT's own VAE encoder and transformer blocks, requiring just 0.1% additional parameters; (2) a unified sequence processing strategy that combines condition tokens with image tokens for flexible token interactions; and (3) a dynamic position encoding mechanism that adapts to both spatially-aligned and non-aligned control tasks. Our extensive experiments show that this streamlined approach not only matches but surpasses the performance of specialized methods across multiple conditioning tasks. To overcome data limitations in subject-driven generation, we also introduce Subjects200K, a large-scale dataset of identity-consistent image pairs synthesized using DiT models themselves. This work demonstrates that effective image control can be achieved without architectural complexity, opening new possibilities for efficient and versatile image generation systems.
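
The unified sequence and position encoding ideas in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the use of strings as stand-in tokens, and the specific rule (aligned conditions reuse the image token positions, non-aligned conditions are shifted by one grid width) are assumptions made only for illustration.

```python
from typing import List, Tuple

Position = Tuple[int, int]  # (row, col) index of a latent token


def grid_positions(height: int, width: int, col_offset: int = 0) -> List[Position]:
    """Row-major (row, col) indices for a height x width token grid."""
    return [(r, c + col_offset) for r in range(height) for c in range(width)]


def build_unified_sequence(
    image_tokens: List[str],
    condition_tokens: List[str],
    height: int,
    width: int,
    spatially_aligned: bool,
) -> Tuple[List[str], List[Position]]:
    """Concatenate image and condition tokens and give each a position index.

    Assumption (not stated in the abstract): aligned conditions reuse the
    image token positions, while non-aligned conditions are shifted by
    `width` columns so they do not overlap the image grid.
    """
    expected = height * width
    if len(image_tokens) != expected or len(condition_tokens) != expected:
        raise ValueError("both token lists must contain height * width tokens")
    image_pos = grid_positions(height, width)
    offset = 0 if spatially_aligned else width
    cond_pos = grid_positions(height, width, col_offset=offset)
    # One joint sequence: the transformer can attend across both token sets.
    return image_tokens + condition_tokens, image_pos + cond_pos


if __name__ == "__main__":
    img = ["x0", "x1", "x2", "x3"]
    cond = ["c0", "c1", "c2", "c3"]
    _, aligned = build_unified_sequence(img, cond, 2, 2, spatially_aligned=True)
    _, shifted = build_unified_sequence(img, cond, 2, 2, spatially_aligned=False)
    print(aligned[4:])  # same positions as the image tokens
    print(shifted[4:])  # positions moved outside the image grid
```

In this sketch a spatially aligned condition (for example an edge map) would share positions with the image tokens it describes, whereas a subject reference image would occupy separate positions; the actual mechanism in the paper may differ in detail.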

Citation History

Jan 26, 2026

Jan 27, 2026

Jan 31, 2026: 214 (+214)

Feb 6, 2026: 218 (+4)

Feb 13, 2026: 225 (+7)