Full metadata record

DC Field Value Language
dc.contributor.advisor Baek, Seungryul -
dc.contributor.author Tajoar, Chowdhury Mubarrat -
dc.date.accessioned 2025-09-29T11:31:30Z -
dc.date.available 2025-09-29T11:31:30Z -
dc.date.issued 2025-08 -
dc.description.abstract While multimodal research has progressed rapidly with Vision-Language Models (VLMs) like LLaVA and GPT-4, linking 3D hand geometry to natural language remains largely uncharted territory. Existing VLMs struggle to capture joint-specific details of hand poses, primarily due to the absence of fine-grained datasets that articulate the intricacies of 3D hand structures. Bridging this gap could unlock a range of valuable applications, including the generation of posed hands for animation, improved remote physical therapy for form correction, and beyond. To address this need, this work introduces a novel framework that generates precise, joint-level text annotations from 3D hand data through an automatic, geometry-based captioning pipeline, establishing a bridge between hand geometry and natural language descriptions. In the experiments presented in this work, using two publicly available 3D hand datasets, these joint-level captions significantly enhanced reconstruction accuracy, validating the robustness of the proposed approach as the first multimodal hand mesh reconstruction model. This framework advances the capabilities of VLM-driven 3D hand representation and sets the stage for more nuanced multimodal applications. -
dc.description.degree Master -
dc.description Department of Computer Science and Engineering -
dc.identifier.uri https://scholarworks.unist.ac.kr/handle/201301/88293 -
dc.identifier.uri http://unist.dcollection.net/common/orgView/200000904128 -
dc.language ENG -
dc.publisher Ulsan National Institute of Science and Technology -
dc.rights.embargoReleaseDate 9999-12-31 -
dc.rights.embargoReleaseTerms 9999-12-31 -
dc.subject 3D-Hand-Reconstruction, Multimodal-Hand-Reconstruction, Text-based-3D-Hand-Pose-Dataset -
dc.title Towards Fine-Grained Text Generation from 3D Hand Geometry for Hand Mesh Reconstruction -
dc.type Thesis -

Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.