OminiControl: Minimal and Universal Control for Diffusion Transformer

arXiv:2411.15098
225 citations
#6 in ICCV 2025 of 2701 papers
5 Top Authors
8 Data Points

Abstract

We present OminiControl, a novel approach that rethinks how image conditions are integrated into Diffusion Transformer (DiT) architectures. Current image conditioning methods either introduce substantial parameter overhead or handle only specific control tasks effectively, limiting their practical versatility. OminiControl addresses these limitations through three key innovations: (1) a minimal architectural design that leverages the DiT's own VAE encoder and transformer blocks, requiring just 0.1% additional parameters; (2) a unified sequence processing strategy that combines condition tokens with image tokens for flexible token interactions; and (3) a dynamic position encoding mechanism that adapts to both spatially-aligned and non-aligned control tasks. Our extensive experiments show that this streamlined approach not only matches but surpasses the performance of specialized methods across multiple conditioning tasks. To overcome data limitations in subject-driven generation, we also introduce Subjects200K, a large-scale dataset of identity-consistent image pairs synthesized using DiT models themselves. This work demonstrates that effective image control can be achieved without architectural complexity, opening new possibilities for efficient and versatile image generation systems.
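The unified sequence idea in points (2) and (3) can be illustrated with a small sketch. The abstract gives no implementation details, so everything below is an assumption made for illustration: the function names `grid_positions` and `build_sequence` are hypothetical, tokens are assumed to lie on an h x w grid, spatially-aligned conditions are assumed to reuse the image tokens' position indices, and non-aligned conditions are assumed to be shifted by a fixed column offset so they do not overlap the image grid.

```python
import numpy as np

def grid_positions(h, w):
    """Return an (h*w, 2) array of (row, col) indices for an h x w token grid."""
    rows, cols = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    return np.stack([rows.ravel(), cols.ravel()], axis=1)

def build_sequence(image_tokens, cond_tokens, h, w, spatially_aligned):
    """Concatenate image and condition tokens into one sequence with positions.

    Assumption (not stated in the abstract): aligned conditions share the
    image grid's positions, non-aligned ones are offset by w columns.
    """
    img_pos = grid_positions(h, w)
    cond_pos = grid_positions(h, w)
    if not spatially_aligned:
        # Hypothetical offset: move condition tokens beside the image grid.
        cond_pos = cond_pos + np.array([0, w])
    tokens = np.concatenate([image_tokens, cond_tokens], axis=0)
    positions = np.concatenate([img_pos, cond_pos], axis=0)
    return tokens, positions

# Toy example: a 2 x 2 grid of 4-dimensional tokens.
h, w, d = 2, 2, 4
image_tokens = np.zeros((h * w, d))
cond_tokens = np.ones((h * w, d))

tokens, aligned_pos = build_sequence(image_tokens, cond_tokens, h, w, True)
_, shifted_pos = build_sequence(image_tokens, cond_tokens, h, w, False)
```

In this sketch the transformer would simply attend over the single concatenated sequence; only the position indices differ between the aligned case (for example, edge or depth maps) and the non-aligned case (for example, subject-driven generation).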

Citation History

Jan 26, 2026: 0
Jan 27, 2026: 0
Jan 27, 2026: 0
Jan 31, 2026: 214 (+214)
Feb 6, 2026: 218 (+4)
Feb 13, 2026: 225 (+7)
Feb 13, 2026: 225
Feb 13, 2026: 225