Hand pose estimation has attracted significant interest recently, leading to the development of various methods. Existing methods attempt to boost accuracy but often suffer from limited efficiency. In this work, we propose a lightweight graph-based network optimized for both accuracy and efficiency in 3D single-hand pose estimation. Our design leverages Chebyshev Graph Convolutions (ChebGConv) to streamline the 2D encoding process, reducing computational overhead. Additionally, we introduce a coarse-to-fine ChebGConv module in the 3D decoder that progressively refines the hand mesh reconstruction, improving accuracy. We further improve our model through ensemble distillation, transferring knowledge from high-performing teacher models. The model is notably efficient: it has only 8.48M parameters, requires 1.7 GFLOPs, and runs at 55 FPS on a CPU and 109 FPS on a GPU. Despite its small size, it achieves competitive accuracy, with a PA-MPJPE of 5.7 mm and a PA-MPVPE of 5.9 mm on the FreiHAND dataset, and a PA-MPJPE of 8.7 mm and a PA-MPVPE of 8.9 mm on the HO3D dataset.
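To make the ChebGConv building block concrete, the following is a minimal NumPy sketch of a standard K-order Chebyshev graph convolution applied to hand joints. It is not the authors' implementation. The 21-joint skeleton layout (wrist at index 0, four joints per finger), the random weights, and the function names `hand_adjacency`, `scaled_laplacian`, and `cheb_conv` are all illustrative assumptions. The filter computes sum_k T_k(L~) X Theta_k, where T_k are Chebyshev polynomials of the rescaled normalized Laplacian L~ = 2L / lambda_max - I, evaluated with the recurrence T_k = 2 L~ T_{k-1} - T_{k-2}.

```python
import numpy as np


def hand_adjacency(num_joints=21):
    """Adjacency of a hypothetical 21-joint hand skeleton.

    Assumed layout: joint 0 is the wrist, and each finger is a chain of four
    joints starting at 1, 5, 9, 13, 17. The paper's actual graph may differ.
    """
    A = np.zeros((num_joints, num_joints))
    for finger in range(5):
        base = 1 + 4 * finger
        chain = [0, base, base + 1, base + 2, base + 3]
        for i, j in zip(chain[:-1], chain[1:]):
            A[i, j] = A[j, i] = 1.0
    return A


def scaled_laplacian(A):
    """Return L~ = 2 L / lambda_max - I, with L = I - D^{-1/2} A D^{-1/2}."""
    n = A.shape[0]
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))  # every joint has degree >= 1
    L = np.eye(n) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    lam_max = np.linalg.eigvalsh(L).max()
    return 2.0 * L / lam_max - np.eye(n)


def cheb_conv(X, L_tilde, weights):
    """Chebyshev graph convolution of order K = weights.shape[0].

    X: (N, F_in) node features, weights: (K, F_in, F_out).
    Returns (N, F_out). Each output node only mixes features from
    nodes at most K - 1 hops away.
    """
    K = weights.shape[0]
    Tx_prev = X                      # T_0(L~) X = X
    out = Tx_prev @ weights[0]
    if K > 1:
        Tx_curr = L_tilde @ X        # T_1(L~) X = L~ X
        out = out + Tx_curr @ weights[1]
        for k in range(2, K):
            # Chebyshev recurrence: T_k = 2 L~ T_{k-1} - T_{k-2}
            Tx_next = 2.0 * L_tilde @ Tx_curr - Tx_prev
            out = out + Tx_next @ weights[k]
            Tx_prev, Tx_curr = Tx_curr, Tx_next
    return out


# Usage: lift 2D joint coordinates to 64 features per joint with K = 3.
rng = np.random.default_rng(0)
A = hand_adjacency()
L_tilde = scaled_laplacian(A)
X = rng.standard_normal((21, 2))
W = 0.1 * rng.standard_normal((3, 2, 64))
H = cheb_conv(X, L_tilde, W)
print(H.shape)  # (21, 64)
```

Because the recurrence only needs repeated sparse-style products with L~, a K-order filter avoids an eigendecomposition at inference time, which is one plausible reason such layers suit a lightweight encoder. Here `eigvalsh` is called once, only to rescale the Laplacian.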
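The reported PA-MPJPE is the mean per-joint position error after Procrustes alignment, i.e. after the prediction is optimally scaled, rotated, and translated onto the ground truth; PA-MPVPE applies the same computation to mesh vertices instead of joints. The sketch below assumes NumPy, inputs of shape (N, 3) in consistent units (e.g. millimetres), and the hypothetical helper names `procrustes_align` and `pa_mpjpe`; it follows the standard SVD solution to the similarity Procrustes problem and is not taken from the paper's evaluation code.

```python
import numpy as np


def procrustes_align(pred, gt):
    """Align pred (N, 3) to gt (N, 3) with the best scale, rotation, translation.

    Minimizes ||s * P @ R + t - gt|| over scale s, rotation R, translation t.
    """
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    P, G = pred - mu_p, gt - mu_g
    U, S, Vt = np.linalg.svd(P.T @ G)
    # Flip the last axis if needed so R is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(U @ Vt))
    D = np.diag([1.0, 1.0, d])
    R = U @ D @ Vt
    s = np.trace(np.diag(S) @ D) / np.sum(P ** 2)
    return s * P @ R + mu_g


def pa_mpjpe(pred, gt):
    """Mean Euclidean error per point after Procrustes alignment."""
    aligned = procrustes_align(pred, gt)
    return np.linalg.norm(aligned - gt, axis=1).mean()
```

Since the alignment removes global scale, rotation, and translation, PA metrics measure only the articulated shape of the hand, which is why they are typically lower than the unaligned errors.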
Publisher: Ulsan National Institute of Science and Technology