While multimodal research has progressed rapidly with Vision-Language Models (VLMs) such as LLaVA and GPT-4, linking 3D hand geometry to natural language remains largely uncharted territory. Existing VLMs struggle to capture joint-specific details of hand poses, primarily due to the absence of fine-grained datasets that articulate the intricacies of 3D hand structure. Bridging this gap could unlock a range of valuable applications, including the generation of posed hands for animation, improved form correction in remote physical therapy, and beyond. To address this need, this work introduces a novel framework that generates precise, joint-level text annotations from 3D hand data through an automatic, geometry-based captioning pipeline, establishing a bridge between hand geometry and natural-language descriptions. In experiments on two publicly available 3D hand datasets, these joint-level captions significantly improved reconstruction accuracy, validating the proposed approach as the first multimodal hand mesh reconstruction model. This framework advances the capabilities of VLM-driven 3D hand representation and sets the stage for more nuanced multimodal applications.
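To make the idea of geometry-based, joint-level captioning concrete, the following is a minimal sketch of what such a step could look like. It assumes a hypothetical 21-keypoint hand skeleton, finger indexing, angle thresholds, and phrase templates chosen purely for illustration; the actual captioning pipeline described in this work may differ in all of these details.

```python
import numpy as np

# Hypothetical joint ordering (21-keypoint hand: wrist + 4 joints per finger).
# This indexing is an assumption for illustration only.
FINGERS = {
    "thumb":  [1, 2, 3, 4],
    "index":  [5, 6, 7, 8],
    "middle": [9, 10, 11, 12],
    "ring":   [13, 14, 15, 16],
    "pinky":  [17, 18, 19, 20],
}
WRIST = 0

def joint_angle(a, b, c):
    """Interior angle (degrees) at joint b formed by segments b->a and b->c."""
    v1, v2 = a - b, c - b
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-8)
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def caption_hand(joints):
    """Map 3D keypoints (21 x 3) to joint-level phrases via simple angle thresholds."""
    phrases = []
    for name, idx in FINGERS.items():
        chain = [WRIST] + idx
        # Angle at the proximal joint of the finger chain (thresholds are illustrative).
        ang = joint_angle(joints[chain[0]], joints[chain[1]], joints[chain[2]])
        if ang > 150:
            phrases.append(f"the {name} finger is fully extended")
        elif ang > 100:
            phrases.append(f"the {name} finger is slightly bent")
        else:
            phrases.append(f"the {name} finger is curled toward the palm")
    return ", ".join(phrases) + "."

if __name__ == "__main__":
    # Random keypoints stand in for real 3D hand data.
    print(caption_hand(np.random.rand(21, 3)))
```

A pipeline of this kind is automatic because every phrase is derived directly from joint geometry rather than human annotation, which is what allows joint-level captions to be produced at dataset scale.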
Publisher: Ulsan National Institute of Science and Technology