UniHuman: A Unified Model For Editing Human Images in the Wild

14 citations

arXiv:2312.14985

Citation rank: #1558 of 2716 papers in CVPR 2024

Top Authors

Nannan Li, Qing Liu, Krishna Kumar Singh, Yilin Wang, Jianming Zhang, Bryan A. Plummer, Zhe Lin

Topics

human image editing, pose warping module, text-guided editing, unified editing model, human visual encoders, out-of-domain generalization, image-text pairs

Abstract

Human image editing includes tasks like changing a person's pose, changing their clothing, or editing the image according to a text prompt. However, prior work often tackles these tasks separately, overlooking the benefit of mutual reinforcement from learning them jointly. In this paper, we propose UniHuman, a unified model that addresses multiple facets of human image editing in real-world settings. To enhance the model's generation quality and generalization capacity, we leverage guidance from human visual encoders and introduce a lightweight pose-warping module that can exploit different pose representations, accommodating unseen textures and patterns. Furthermore, to bridge the gap between existing human editing benchmarks and real-world data, we curated 400K high-quality human image-text pairs for training and collected 2K human images for out-of-domain testing, both encompassing diverse clothing styles, backgrounds, and age groups. Experiments on both in-domain and out-of-domain test sets demonstrate that UniHuman outperforms task-specific models by a significant margin. In user studies, users prefer UniHuman in 77% of cases on average. Our project is available at https://github.com/NannanLi999/UniHuman.

Citation History

Jan 27, 2026 to Feb 13, 2026: 14 citations (+1) as of Feb 13, 2026