
Advancing Motion-Based Human–Robot Interaction: Data Collection and Representation Strategies

Author(s)
Park, Soogeun
Advisor
Joo, Kyungdon
Issued Date
2026-02
URI
https://scholarworks.unist.ac.kr/handle/201301/91049
http://unist.dcollection.net/common/orgView/200000965492
Abstract
Recent advances in human–robot interaction highlight the growing importance of understanding and generating human motion for intelligent humanoid control. This thesis presents two studies that address these challenges from the perspectives of data collection and motion representation.

First, a data generation framework for learning-based human-to-humanoid motion retargeting is proposed. Traditional approaches rely on expensive equipment or handcrafted mappings, which limit their scalability and generality across robot morphologies. To overcome these limitations, this work introduces a reverse-wise data pairing method that generates robot-side poses within feasible pose domains and reconstructs corresponding human poses while filtering out physically invalid poses. This approach produces diverse, high-quality paired datasets suitable for deep learning. Using these data, a two-stage motion retargeting network is trained via supervised learning, achieving improved accuracy in the predicted positions of robot links compared to an unsupervised baseline, while also generating more natural robot motions in qualitative evaluations. In addition, ablation studies demonstrate the effectiveness of extreme-pose filtering during data generation and confirm that the proposed two-stage architecture is well suited for motion retargeting.
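The reverse-wise pairing idea described above can be illustrated with a minimal sketch. The thesis's actual robot model, human-pose reconstruction, and validity criteria are not given in this abstract, so everything below (the toy joint limits, the identity placeholder for human-pose reconstruction, and the margin-based "extreme pose" filter) is a hypothetical stand-in for the described pipeline: sample robot-side poses inside the feasible domain first, discard extreme ones, then derive the paired human pose.

```python
import numpy as np

# Hypothetical joint limits for a toy 3-DoF robot (radians).
JOINT_LIMITS = np.array([[-1.5, 1.5], [-2.0, 2.0], [-1.0, 1.0]])

def sample_robot_pose(rng):
    """Sample a robot pose uniformly inside the feasible joint domain."""
    lo, hi = JOINT_LIMITS[:, 0], JOINT_LIMITS[:, 1]
    return rng.uniform(lo, hi)

def is_extreme(pose, margin=0.1):
    """Reject poses within `margin` of a joint limit -- a stand-in for
    the physical-validity / extreme-pose filter described in the thesis."""
    lo, hi = JOINT_LIMITS[:, 0], JOINT_LIMITS[:, 1]
    return bool(np.any(pose < lo + margin) or np.any(pose > hi - margin))

def generate_pairs(n, rng=None):
    """Reverse-wise pairing: robot poses are generated first, and a
    corresponding human pose is reconstructed for each kept sample."""
    rng = rng or np.random.default_rng(0)
    pairs = []
    while len(pairs) < n:
        q = sample_robot_pose(rng)
        if is_extreme(q):
            continue  # filter out near-limit ("extreme") robot poses
        human = q.copy()  # placeholder for the human-pose reconstruction step
        pairs.append((human, q))
    return pairs
```

The resulting (human pose, robot pose) tuples could then serve as supervised training pairs for a retargeting network, which is the role the generated dataset plays in the first study.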

Second, a keyframe-based motion tokenization framework is presented as an alternative to the conventional 1D-convolutional VQ-VAE tokenizers used in transformer-based motion models. Instead of encoding motion through fixed receptive fields, the proposed method constructs discrete tokens from keyframes that explicitly encode pose, velocity, and duration information, ensuring clear traceability to the original motion. To enable end-to-end learning, we introduce a differentiable soft interpolation method and develop a Keyframe Motion VQ-VAE that quantizes keyframe segments and reconstructs full-length motion through duration prediction, recover-length expansion, and convolutional decoding. Experiments on the HumanML3D dataset show that our keyframe selection method outperforms random and uniform sampling in preserving semantic structure.
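The abstract does not specify how the differentiable soft interpolation is formulated, so the following is only one plausible sketch: each reconstructed frame is a softmax-weighted blend of the keyframes, with weights decaying in the frame's temporal distance to each keyframe. Because the weights are smooth functions of the (predicted) keyframe times, gradients can flow through the interpolation, which is the property end-to-end training requires. The function name, the temperature parameter, and the exponential kernel are all illustrative assumptions, not the thesis's definition.

```python
import numpy as np

def soft_interpolate(keyframes, key_times, num_frames, temperature=0.1):
    """Differentiable soft interpolation sketch.

    keyframes : (K, D) array of keyframe pose vectors
    key_times : (K,) array of keyframe times, normalized to [0, 1]
    Returns a (num_frames, D) motion where every output frame is a
    softmax-weighted combination of all keyframes.
    """
    t = np.linspace(0.0, 1.0, num_frames)            # (T,) output frame times
    dist = np.abs(t[:, None] - key_times[None, :])   # (T, K) temporal distances
    w = np.exp(-dist / temperature)                  # smooth distance kernel
    w /= w.sum(axis=1, keepdims=True)                # normalize: soft assignment
    return w @ keyframes                             # (T, D) blended frames
```

As `temperature` shrinks, the weights approach hard nearest-keyframe assignment, so the same mechanism can trade smoothness against fidelity during training.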

Together, these studies explore alternative approaches to addressing the challenges of data acquisition and motion representation in deep learning–based human–robot interaction. Moreover, by integrating the two approaches, this research suggests the potential for semantic motion retargeting, in which robots may reproduce human motion with preserved meaning as well as physical fidelity.
Publisher
Ulsan National Institute of Science and Technology
Degree
Master
Major
Artificial Intelligence, Graduate School of Artificial Intelligence