Towards Fine-Grained Text Generation from 3D Hand Geometry for Hand Mesh Reconstruction

Author(s)
Tajoar, Chowdhury Mubarrat
Advisor
Baek, Seungryul
Issued Date
2025-08
URI
https://scholarworks.unist.ac.kr/handle/201301/88293
http://unist.dcollection.net/common/orgView/200000904128
Abstract
While multimodal research has progressed rapidly with Vision-Language Models (VLMs) such as LLaVA and GPT-4, linking 3D hand geometry to natural language remains largely uncharted territory. Existing VLMs struggle to capture joint-specific details of hand poses, primarily due to the absence of fine-grained datasets that articulate the intricacies of 3D hand structures. Bridging this gap could unlock a range of valuable applications, including the generation of posed hands for animation, improved remote physical therapy for form correction, and beyond. To address this need, this work introduces a novel framework that generates precise, joint-level text annotations from 3D hand data through an automatic, geometry-based captioning pipeline, establishing a bridge between hand geometry and natural-language descriptions. In experiments on two publicly available 3D hand datasets, these joint-level captions significantly improved reconstruction accuracy, validating the proposed approach as the first multimodal hand mesh reconstruction model. This framework advances the capabilities of VLM-driven 3D hand representation and sets the stage for more nuanced multimodal applications.
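The abstract describes an automatic, geometry-based pipeline that turns 3D hand keypoints into joint-level text. As a rough illustration of that idea (not the thesis's actual pipeline), the sketch below computes the interior angle at a finger joint from three 3D keypoints and maps it to a templated description; the joint names, the 150° flexion threshold, and the caption wording are all assumptions made for this example.

```python
import numpy as np

def joint_angle(a: np.ndarray, b: np.ndarray, c: np.ndarray) -> float:
    """Interior angle (degrees) at joint b, formed by the segments b->a and b->c."""
    v1, v2 = a - b, c - b
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    # Clip to guard against floating-point values slightly outside [-1, 1].
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

def describe_joint(name: str, angle: float, extended_thresh: float = 150.0) -> str:
    """Templated joint-level caption; threshold is an illustrative choice."""
    state = "extended" if angle >= extended_thresh else "flexed"
    return f"{name} is {state} ({angle:.0f} deg)"

# Toy keypoints along an index-finger chain (MCP -> PIP -> DIP), in arbitrary units.
mcp = np.array([0.0, 0.0, 0.0])
pip = np.array([0.0, 3.0, 0.0])
dip = np.array([2.0, 5.0, 0.0])

angle = joint_angle(mcp, pip, dip)
caption = describe_joint("index PIP", angle)
print(caption)  # -> index PIP is flexed (135 deg)
```

Applying such a rule per joint over a full 21-keypoint hand skeleton would yield the kind of fine-grained, joint-level annotations the abstract refers to.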
Publisher
Ulsan National Institute of Science and Technology
Degree
Master
Major
Department of Computer Science and Engineering