| dc.contributor.advisor | Baek, Seungryul | - |
| dc.contributor.author | Tajoar, Chowdhury Mubarrat | - |
| dc.date.accessioned | 2025-09-29T11:31:30Z | - |
| dc.date.available | 2025-09-29T11:31:30Z | - |
| dc.date.issued | 2025-08 | - |
| dc.description.abstract | While multimodal research has progressed rapidly with Vision-Language Models (VLMs) like LLaVA and GPT-4, linking 3D hand geometry to natural language remains largely uncharted territory. Existing VLMs struggle to capture joint-specific details of hand poses, primarily due to the absence of fine-grained datasets that articulate the intricacies of 3D hand structures. Bridging this gap could unlock a range of valuable applications, including the generation of posed hands for animation, improved remote physical therapy for form correction, and beyond. To address this need, this work introduces a novel framework that generates precise, joint-level text annotations from 3D hand data through an automatic, geometry-based captioning pipeline, establishing a bridge between hand geometry and natural language descriptions. In the experiments presented in this work, using two publicly available 3D hand datasets, these joint-level captions significantly enhanced reconstruction accuracy, validating the robustness of the proposed approach as the first multimodal hand mesh reconstruction model. This framework advances the capabilities of VLM-driven 3D hand representation and sets the stage for more nuanced multimodal applications. | - |
| dc.description.degree | Master | - |
| dc.description | Department of Computer Science and Engineering | - |
| dc.identifier.uri | https://scholarworks.unist.ac.kr/handle/201301/88293 | - |
| dc.identifier.uri | http://unist.dcollection.net/common/orgView/200000904128 | - |
| dc.language | ENG | - |
| dc.publisher | Ulsan National Institute of Science and Technology | - |
| dc.rights.embargoReleaseDate | 9999-12-31 | - |
| dc.rights.embargoReleaseTerms | 9999-12-31 | - |
| dc.subject | 3D-Hand-Reconstruction, Multimodal-Hand-Reconstruction, Text-based-3D-Hand-Pose-Dataset | - |
| dc.title | Towards Fine-Grained Text Generation from 3D Hand Geometry for Hand Mesh Reconstruction | - |
| dc.type | Thesis | - |