| dc.description.abstract |
We present HOReeNet, which tackles the novel task of manipulating images involving hands, objects, and their interactions. In particular, we are interested in transferring objects from source images to target images and manipulating 3D hand postures to tightly grasp the transferred objects. Furthermore, the manipulation needs to be reflected in the 2D image space. Facial and body pose reenactment algorithms have involved 3D reconstruction modules; however, such modules are less essential there, since 2D-based approaches such as generative adversarial networks (GANs) work reasonably well for transferring facial expressions and human body poses from source to target images. In contrast, in our novel reenactment scenario involving hand-object interactions, 3D reconstruction becomes essential because 3D contact reasoning between hands and objects is required to achieve a tight grasp. At the same time, to obtain high-quality 2D images from 3D space, well-designed 3D-to-2D projection and 2D image refinement modules are required. Our HOReeNet is the first fully differentiable framework proposed for this task. On two well-known hand-object interaction datasets (i.e., Honnotate and DexYCB), we compared our HOReeNet to conventional image translation algorithms (i.e., CycleGAN and U-GAT-IT) and a reenactment algorithm (i.e., ReenactGAN). Through thorough experiments, we demonstrated that our approach achieves state-of-the-art performance on the proposed task. The code will be publicly available. |