Related Researcher

백승렬

Baek, Seungryul
UNIST VISION AND LEARNING LAB.

Full metadata record

DC Field Value Language
dc.citation.conferencePlace US -
dc.citation.title IEEE Conference on Computer Vision and Pattern Recognition -
dc.contributor.author Cha, Junuk -
dc.contributor.author Kim, Jihyeon -
dc.contributor.author Yoon, Jae Shin -
dc.contributor.author Baek, Seungryul -
dc.date.accessioned 2024-12-27T15:05:07Z -
dc.date.available 2024-12-27T15:05:07Z -
dc.date.created 2024-12-26 -
dc.date.issued 2024-06-19 -
dc.description.abstract This paper introduces the first text-guided work for generating sequences of hand-object interaction in 3D. The main challenge arises from the lack of labeled data: existing ground-truth datasets do not generalize across interaction types and object categories, which inhibits modeling diverse 3D hand-object interactions with correct physical implications (e.g., contacts and semantics) from text prompts. To address this challenge, we propose to decompose the interaction generation task into two subtasks: hand-object contact generation and hand-object motion generation. For contact generation, a VAE-based network takes a text prompt and an object mesh as input and generates the probability of contact between the hand surfaces and the object surface during the interaction. The network learns a variety of local geometric structures of diverse objects, independent of object category, and is therefore applicable to general objects. For motion generation, a Transformer-based diffusion model uses this 3D contact map as a strong prior for generating physically plausible hand-object motion from text prompts, learning from an augmented labeled dataset in which we annotate text labels for many existing 3D hand and object motion sequences. Finally, we introduce a hand refiner module that minimizes the distance between the object surface and hand joints to improve the temporal stability of object-hand contacts and to suppress penetration artifacts. In experiments, we demonstrate that our method generates more realistic and diverse interactions than baseline methods. We also show that our method is applicable to unseen objects. We will release our model and newly labeled data as a strong foundation for future research. Code and data are available at: https://github.com/JunukCha/Text2HOI. -
dc.identifier.bibliographicCitation IEEE Conference on Computer Vision and Pattern Recognition -
dc.identifier.uri https://scholarworks.unist.ac.kr/handle/201301/85287 -
dc.language English -
dc.publisher IEEE/CVF -
dc.title Text2HOI: Text-guided 3D Motion Generation for Hand-Object Interaction -
dc.type Conference Paper -
dc.date.conferenceDate 2024-06-17 -
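
The abstract above describes a hand refiner that pulls hand joints toward the object surface at predicted contacts while suppressing penetration. The following is only a rough, illustrative sketch of such an objective, not the authors' released code: the function name, tensor shapes, contact-map format, and the use of PyTorch are all assumptions here; the actual implementation is in the linked GitHub repository.

    # Illustrative sketch only: a contact + penetration objective in the spirit of
    # the hand refiner described in the abstract. All names and shapes are assumed.
    import torch

    def refiner_loss(hand_joints, obj_points, obj_normals, contact_prob):
        # hand_joints:  (T, J, 3) hand joint positions over T frames
        # obj_points:   (T, N, 3) sampled object surface points, posed per frame
        # obj_normals:  (T, N, 3) outward unit normals at those points
        # contact_prob: (T, J) predicted probability that each joint is in contact

        # Pairwise distances between every joint and every surface point: (T, J, N)
        diff = hand_joints[:, :, None, :] - obj_points[:, None, :, :]
        dist = diff.norm(dim=-1)

        # Nearest surface point per joint.
        min_dist, idx = dist.min(dim=-1)  # both (T, J)

        # Contact term: joints predicted to be in contact should touch the surface,
        # which also encourages temporally stable contacts across frames.
        contact_loss = (contact_prob * min_dist).mean()

        # Penetration term: penalize joints lying behind the surface, i.e. with a
        # negative signed distance along the outward normal of the nearest point.
        gather_idx = idx[..., None].expand(-1, -1, 3)
        nearest = torch.gather(obj_points, 1, gather_idx)    # (T, J, 3)
        normals = torch.gather(obj_normals, 1, gather_idx)   # (T, J, 3)
        signed = ((hand_joints - nearest) * normals).sum(dim=-1)  # (T, J)
        penetration_loss = torch.relu(-signed).mean()

        return contact_loss + penetration_loss

Minimizing a loss of this form over the predicted hand poses would jointly tighten contacted joints onto the surface and push penetrating joints back outside it, which matches the two effects the abstract attributes to the refiner.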

