Single-view Image to Novel-view Generation for Hand-Object Interactions

AAAI 2025

Abstract

Modeling hand-object interactions from a single RGB image is a highly challenging task. Previous works typically reconstruct hand-object interactions as texture-less meshes, ignoring photo-realistic image generation. In this work, we introduce HO123, a novel method for synthesizing novel-view hand-object interaction images from a single image. To this end, we first train a 2D diffusion prior. Given the camera pose of a novel view, our approach converts the camera information into explicit hand representations, namely hand depth and skeleton images. Based on these hand representations, we propose a global hand embedding to control the diffusion model. We then learn a 3D Gaussian splatting representation for novel-view rendering using the diffusion prior. However, occluded objects remain a persistent challenge. To address this issue, we further introduce a local hand embedding, in which a contact field is defined within the 3D Gaussian splatting. We leverage the contact information to guide rendering within the contact field. Extensive experiments on the HO3D and DexYCB datasets demonstrate that our method significantly outperforms state-of-the-art methods in novel-view synthesis for hand-object interactions.
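The conditioning step described in the abstract (turning a novel camera pose into hand depth and skeleton images) can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes MANO-style hand vertices and joints are already available, and the function names project_points and render_hand_conditions are hypothetical.

import numpy as np

def project_points(points_world, K, R, t):
    """Project 3D points into the image plane of a novel camera.

    points_world: (N, 3) 3D points (e.g. hand vertices or joints).
    K: (3, 3) camera intrinsics; R, t: world-to-camera rotation/translation.
    Returns (N, 2) pixel coordinates and (N,) camera-space depths.
    """
    pts_cam = points_world @ R.T + t          # world -> camera frame
    depth = pts_cam[:, 2]
    pts_img = pts_cam @ K.T
    uv = pts_img[:, :2] / pts_img[:, 2:3]     # perspective divide
    return uv, depth

def render_hand_conditions(verts, joints, K, R, t, hw=(256, 256)):
    """Rasterize a coarse hand depth map and a skeleton keypoint image.

    These are the explicit hand representations the abstract mentions.
    A real system would use a proper mesh rasterizer and draw bone
    segments between joints; a point splat is enough to show the idea.
    """
    H, W = hw
    depth_img = np.zeros((H, W), dtype=np.float32)
    skel_img = np.zeros((H, W), dtype=np.float32)

    uv, depth = project_points(verts, K, R, t)
    for (u, v), d in zip(uv.astype(int), depth):
        if 0 <= v < H and 0 <= u < W and d > 0:
            # z-buffer: keep the nearest surface point per pixel
            if depth_img[v, u] == 0 or d < depth_img[v, u]:
                depth_img[v, u] = d

    uv_j, _ = project_points(joints, K, R, t)
    for u, v in uv_j.astype(int):
        if 0 <= v < H and 0 <= u < W:
            skel_img[v, u] = 1.0              # joint keypoint marker
    return depth_img, skel_img

In the full pipeline, the resulting depth and skeleton images would be encoded into the global hand embedding that conditions the 2D diffusion prior.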
