Image-free domain generalization via CLIP for 3D hand pose estimation

Alternative Title
Image-free domain generalization through CLIP for 3D hand pose estimation (translated from the Korean title)
Author(s)
LEE, SEONGYEONG
Advisor
Baek, Seungryul
Issued Date
2022-08
URI
https://scholarworks.unist.ac.kr/handle/201301/73841
http://unist.dcollection.net/common/orgView/200000627563
Abstract
RGB-based 3D hand pose estimation has made steady progress over the years, thanks to large datasets and deep learning. However, the domain gap between datasets has not yet been closed. Because of factors such as the distribution of background features and the shape of the hand, a model may generalize poorly when it is tested on a target domain that differs from the dataset used for training. To address this problem, many existing works apply domain adaptation, using additional unconstrained or target-domain images to reduce the domain gap and improve 3D hand pose estimation performance. In environments where such additional images are difficult to obtain, however, the domain gap still has to be reduced by other means. In this thesis, we present a pose estimation framework that uses only source-domain data and learns features that generalize well to many unseen data domains. The core of our algorithm is the image encoder of CLIP (Contrastive Language-Image Pre-Training), which yields strong image representations through contrastive learning on a large number of image-text pairs; we use it to achieve generalization across different domains. In addition, we use the text encoder to provide additional augmentation, which improves performance on images not seen during training. In experiments on two datasets, STB and RHD, which represent challenging real-world scenarios, our algorithm outperforms state-of-the-art methods and generalizes better.
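As a rough illustration of the contrastive image-text learning mentioned in the abstract, the following is a minimal sketch of a CLIP-style symmetric contrastive loss on toy embeddings. It is not the thesis's implementation: the function names, the three-dimensional toy vectors, and the temperature value of 0.07 are assumptions made for this example, and real CLIP embeddings come from trained image and text encoders.

```python
import math

def normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def cross_entropy_diag(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)

def contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric image-to-text and text-to-image contrastive loss.

    Pair i of image_embs and text_embs is treated as the matching pair;
    every other combination in the batch acts as a negative.
    The temperature is an assumed toy value.
    """
    imgs = [normalize(v) for v in image_embs]
    txts = [normalize(v) for v in text_embs]
    logits = [
        [sum(a * b for a, b in zip(i, t)) / temperature for t in txts]
        for i in imgs
    ]
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(transposed))

# Toy embeddings (hypothetical): each "image" lies close to its own "text".
text_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
matched = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]
# Rotating the image list pairs every image with the wrong text.
mismatched = matched[1:] + matched[:1]

loss_matched = contrastive_loss(matched, text_embs)
loss_mismatched = contrastive_loss(mismatched, text_embs)
print(loss_matched < loss_mismatched)
```

The loss is low when each image embedding is most similar to its own text embedding and high when the pairs are scrambled, which is the property that pulls matching images and texts together in a shared embedding space.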
Publisher
Ulsan National Institute of Science and Technology (UNIST)
Degree
Master
Major
Department of Computer Science and Engineering

Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.