I'M HOI: Inertia-aware Monocular Capture of 3D Human-Object Interactions

arXiv:2312.08869
29 citations
#963 of 2716 papers in CVPR 2024

Abstract

We live in a world surrounded by diverse "smart" devices with rich sensing modalities, yet conveniently capturing the interactions between humans and these objects remains out of reach. In this paper, we present I'm-HOI, a monocular scheme that faithfully captures the 3D motions of both the human and the object in a novel setting: a minimal hardware configuration consisting of a single RGB camera and an object-mounted Inertial Measurement Unit (IMU). It combines general motion inference with category-aware refinement. For the former, we introduce a holistic human-object tracking method that fuses the IMU signals with the RGB stream to progressively recover the human motions and subsequently the companion object motions. For the latter, we tailor a category-aware motion diffusion model conditioned on both the raw IMU observations and the results of the previous stage under an over-parameterized representation. It significantly refines the initial estimates and generates vivid body, hand, and object motions. Moreover, we contribute a large dataset with ground-truth human and object motions, dense RGB inputs, and rich object-mounted IMU measurements. Extensive experiments demonstrate the effectiveness of I'm-HOI under this hybrid capture setting. Our dataset and code will be released to the community.
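
The abstract describes a two-stage pipeline: holistic RGB+IMU human-object tracking, followed by category-aware diffusion refinement conditioned on the raw IMU signals and the stage-1 estimates. The sketch below only illustrates how such a two-stage flow could be wired together; every class, function, tensor shape, and the stub estimators are illustrative assumptions, not the authors' released code or interface.

```python
# Hypothetical sketch of the two-stage pipeline described in the abstract.
# All names, shapes, and the placeholder estimators are assumptions.
from dataclasses import dataclass
import numpy as np


@dataclass
class HOIFrame:
    rgb: np.ndarray       # (H, W, 3) monocular RGB image
    imu_rot: np.ndarray   # (3, 3) object-mounted IMU orientation
    imu_acc: np.ndarray   # (3,) object-mounted IMU acceleration


def stage1_track(frames):
    """Stage 1 (assumed interface): fuse the RGB stream with the IMU signals,
    recovering human motion first and the companion object motion afterwards."""
    # Placeholder estimators standing in for the actual tracking networks.
    human_motion = np.zeros((len(frames), 75))   # e.g. per-frame body pose + translation
    object_motion = np.zeros((len(frames), 7))   # e.g. per-frame object quaternion + translation
    return human_motion, object_motion


def stage2_refine(frames, human_motion, object_motion, diffusion_sampler):
    """Stage 2 (assumed interface): category-aware motion diffusion conditioned
    on the raw IMU observations and the stage-1 estimates."""
    imu_cond = np.stack(
        [np.concatenate([f.imu_rot.ravel(), f.imu_acc]) for f in frames]
    )  # (T, 12) raw IMU conditioning signal
    condition = {"imu": imu_cond, "human": human_motion, "object": object_motion}
    return diffusion_sampler(condition)  # refined body, hand, and object motions


if __name__ == "__main__":
    T = 8
    frames = [
        HOIFrame(rgb=np.zeros((256, 256, 3)), imu_rot=np.eye(3), imu_acc=np.zeros(3))
        for _ in range(T)
    ]
    human, obj = stage1_track(frames)
    # Identity "sampler" as a stand-in for the learned diffusion model.
    refined = stage2_refine(frames, human, obj, lambda cond: (cond["human"], cond["object"]))
    print(refined[0].shape, refined[1].shape)  # (8, 75) (8, 7)
```

The identity sampler in the usage example simply passes the stage-1 estimates through; in the actual method a learned, category-aware diffusion model would take its place.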

Citation History

Jan 27, 2026: 28
Feb 7, 2026: 29 (+1)
Feb 13, 2026: 29