InstantBooth: Personalized Text-to-Image Generation without Test-Time Finetuning

377 citations

arXiv:2304.03411

#40 of 2716 papers in CVPR 2024

Top Authors

Jing Shi, Wei Xiong, Zhe Lin, HyunJoon Jung

Abstract

Recent advances in personalized image generation allow a pre-trained text-to-image model to learn a new concept from a set of images. However, existing personalization approaches usually require heavy test-time finetuning for each concept, which is time-consuming and difficult to scale. We propose InstantBooth, a novel approach built upon pre-trained text-to-image models that enables instant text-guided image personalization without any test-time finetuning. We achieve this with several major components. First, we learn the general concept of the input images by converting them to a textual token with a learnable image encoder. Second, to keep the fine details of the identity, we learn a rich visual feature representation by introducing a few adapter layers to the pre-trained model. We train our components only on text-image pairs without using paired images of the same concept. Compared to test-time finetuning-based methods like DreamBooth and Textual-Inversion, our model can generate competitive results on unseen concepts in terms of language-image alignment, image fidelity, and identity preservation while being 100 times faster.
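
The two components named in the abstract can be pictured with a toy sketch. It is only an illustration under stated assumptions: the abstract does not describe the actual encoder or adapter architecture, so every name here (`encode_concept`, `embed_prompt`, `adapter`, the `<v>` placeholder, the averaging and the additive scale) is hypothetical and stands in for learned neural components.

```python
EMBED_DIM = 4  # toy embedding width, chosen only for illustration


def encode_concept(image_features):
    """Hypothetical stand-in for the learnable image encoder: average the
    per-image feature vectors into one concept embedding."""
    n = len(image_features)
    return [sum(f[i] for f in image_features) / n for i in range(EMBED_DIM)]


def embed_prompt(tokens, vocab_embeddings, concept_embedding, placeholder="<v>"):
    """Look up each token's embedding, substituting the concept embedding
    wherever the (assumed) placeholder token appears in the prompt."""
    return [
        concept_embedding if tok == placeholder else vocab_embeddings[tok]
        for tok in tokens
    ]


def adapter(hidden, visual_feature, scale=0.1):
    """Toy adapter layer: add a scaled visual feature to a hidden state
    produced by the frozen model, whose own weights stay untouched."""
    return [h + scale * v for h, v in zip(hidden, visual_feature)]


tokens = ["a", "photo", "of", "<v>", "dog"]
# Made-up word embeddings; a real model would use its text encoder here.
vocab_embeddings = {
    tok: [float(i)] * EMBED_DIM for i, tok in enumerate(tokens) if tok != "<v>"
}
# Two made-up feature vectors standing in for the encoded input images.
image_features = [[1.0, 2.0, 3.0, 4.0], [3.0, 2.0, 1.0, 0.0]]

concept = encode_concept(image_features)
prompt_embeddings = embed_prompt(tokens, vocab_embeddings, concept)
adapted = adapter([0.0] * EMBED_DIM, concept)
```

Nothing in this sketch is optimized per concept at inference time: the concept embedding comes from a single pass over the input images, which is the property the abstract contrasts with test-time finetuning.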

Citation History

Jan 28, 2026: 369

Feb 13, 2026: 377 (+8)