Closely Interactive Human Reconstruction with Proxemics and Physics-Guided Adaption

14 citations

arXiv:2404.11291

#1558 of 2716 papers in CVPR 2024

Top Authors

Buzhen Huang, Chen Li, Chongyang Xu, Liang Pan, Yangang Wang, Gim Hee Lee

Topics

human reconstruction, monocular video, proxemic behavior, physics guidance, diffusion models, vector quantised-variational autoencoder, inter-person occlusion, pose estimation

Abstract

Existing multi-person human reconstruction approaches mainly focus on recovering accurate poses or avoiding penetration, but overlook the modeling of close interactions. In this work, we tackle the task of reconstructing closely interactive humans from a monocular video. The main challenge of this task comes from insufficient visual information caused by depth ambiguity and severe inter-person occlusion. In view of this, we propose to leverage knowledge from proxemic behavior and physics to compensate for the lack of visual information. This is based on the observation that human interaction follows specific patterns described by social proxemics. Specifically, we first design a latent representation based on a Vector Quantised-Variational AutoEncoder (VQ-VAE) to model human interaction. A proxemics- and physics-guided diffusion model is then introduced to denoise the initial distribution. We design the diffusion model with dual branches, each representing one individual, so that the interaction can be modeled via cross-attention. With the learned priors of the VQ-VAE and physical constraints as additional information, our proposed approach is capable of estimating accurate poses that are also plausible with respect to proxemics and physics. Experimental results on Hi4D, 3DPW, and CHI3D demonstrate that our method outperforms existing approaches. The code is available at https://github.com/boycehbz/HumanInteraction.
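
The dual-branch idea in the abstract, where each person's branch attends to the other's features, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `cross_attention` and `dual_branch_step` are hypothetical, and the sketch assumes plain scaled dot-product attention with a residual update, leaving out learned projections, multiple heads, the diffusion timestep, and the proxemics and physics guidance.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query token attends over all key/value tokens.

    Assumption for illustration: no learned query/key/value projections.
    """
    dim = len(keys[0])
    outputs = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys
        ]
        weights = softmax(scores)
        outputs.append(
            [
                sum(w * v[j] for w, v in zip(weights, values))
                for j in range(len(values[0]))
            ]
        )
    return outputs


def dual_branch_step(feats_a, feats_b):
    """One hypothetical interaction step between two per-person branches.

    Branch A queries branch B's tokens and vice versa; each branch then
    adds the attended features back to its own (residual update).
    """
    attended_a = cross_attention(feats_a, feats_b, feats_b)
    attended_b = cross_attention(feats_b, feats_a, feats_a)
    new_a = [[x + y for x, y in zip(f, a)] for f, a in zip(feats_a, attended_a)]
    new_b = [[x + y for x, y in zip(f, a)] for f, a in zip(feats_b, attended_b)]
    return new_a, new_b


if __name__ == "__main__":
    # Toy features: one token for person A, two tokens for person B.
    person_a = [[1.0, 0.0]]
    person_b = [[0.0, 1.0], [0.0, 1.0]]
    print(dual_branch_step(person_a, person_b))
```

Both branches use the same attention routine with the roles of query and key/value swapped, so each person's features are updated with information from the other person. In a real model the inputs would be learned per-person latent features rather than the toy vectors used here.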

Citation History

[Citation history chart, Jan 27, 2026 to Feb 13, 2026]