
Advancing Motion-Based Human–Robot Interaction: Data Collection and Representation Strategies

Author(s)
Park, Soogeun
Advisor
Joo, Kyungdon
Issued Date
2026-02
URI
https://scholarworks.unist.ac.kr/handle/201301/91049
http://unist.dcollection.net/common/orgView/200000965492
Abstract
Recent advances in human–robot interaction highlight the growing importance of understanding and generating human motion for intelligent humanoid control. This thesis presents two studies that address these challenges from the perspectives of data collection and motion representation.

First, a data generation framework for learning-based human-to-humanoid motion retargeting is proposed. Traditional approaches rely on expensive equipment or handcrafted mappings, which limit their scalability and generality across robot morphologies. To overcome these limitations, this work introduces a reverse-wise data pairing method that generates robot-side poses within feasible pose domains and reconstructs corresponding human poses while filtering out physically invalid poses. This approach produces diverse, high-quality paired datasets suitable for deep learning. Using these data, a two-stage motion retargeting network is trained via supervised learning, achieving improved accuracy in the predicted positions of robot links compared to an unsupervised baseline, while also generating more natural robot motions in qualitative evaluations. In addition, ablation studies demonstrate the effectiveness of extreme-pose filtering during data generation and confirm that the proposed two-stage architecture is well suited for motion retargeting.
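The reverse-wise pairing idea described above can be illustrated with a minimal sketch. The thesis's actual robot model, human-pose reconstruction, and validity criteria are not given in this abstract, so everything below (the toy joint limits, the identity placeholder for human-pose reconstruction, and the margin-based "extreme pose" filter) is a hypothetical stand-in for the described pipeline: sample robot-side poses inside the feasible domain first, discard extreme ones, then derive the paired human pose.

```python
import numpy as np

# Hypothetical joint limits for a toy 3-DoF robot (radians).
JOINT_LIMITS = np.array([[-1.5, 1.5], [-2.0, 2.0], [-1.0, 1.0]])

def sample_robot_pose(rng):
    """Sample a robot pose uniformly inside the feasible joint domain."""
    lo, hi = JOINT_LIMITS[:, 0], JOINT_LIMITS[:, 1]
    return rng.uniform(lo, hi)

def is_extreme(pose, margin=0.1):
    """Reject poses within `margin` of a joint limit -- a stand-in for
    the physical-validity / extreme-pose filter described in the thesis."""
    lo, hi = JOINT_LIMITS[:, 0], JOINT_LIMITS[:, 1]
    return bool(np.any(pose < lo + margin) or np.any(pose > hi - margin))

def generate_pairs(n, rng=None):
    """Reverse-wise pairing: robot poses are generated first, and a
    corresponding human pose is reconstructed for each kept sample."""
    rng = rng or np.random.default_rng(0)
    pairs = []
    while len(pairs) < n:
        q = sample_robot_pose(rng)
        if is_extreme(q):
            continue  # filter out near-limit ("extreme") robot poses
        human = q.copy()  # placeholder for the human-pose reconstruction step
        pairs.append((human, q))
    return pairs
```

The resulting (human pose, robot pose) tuples could then serve as supervised training pairs for a retargeting network, which is the role the generated dataset plays in the first study.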

Second, a keyframe-based motion tokenization framework is presented as an alternative to the conventional 1D-convolutional VQ-VAE tokenizers used in transformer-based motion models. Instead of encoding motion through fixed receptive fields, the proposed method constructs discrete tokens from keyframes that explicitly encode pose, velocity, and duration information, ensuring clear traceability to the original motion. To enable end-to-end learning, we introduce a differentiable soft interpolation method and develop a Keyframe Motion VQ-VAE that quantizes keyframe segments and reconstructs full-length motion through duration prediction, recover-length expansion, and convolutional decoding. Experiments on the HumanML3D dataset show that our keyframe selection method outperforms random and uniform sampling in preserving semantic structure.
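The abstract does not specify how the differentiable soft interpolation is formulated, so the following is only one plausible sketch: each reconstructed frame is a softmax-weighted blend of the keyframes, with weights decaying in the frame's temporal distance to each keyframe. Because the weights are smooth functions of the (predicted) keyframe times, gradients can flow through the interpolation, which is the property end-to-end training requires. The function name, the temperature parameter, and the exponential kernel are all illustrative assumptions, not the thesis's definition.

```python
import numpy as np

def soft_interpolate(keyframes, key_times, num_frames, temperature=0.1):
    """Differentiable soft interpolation sketch.

    keyframes : (K, D) array of keyframe pose vectors
    key_times : (K,) array of keyframe times, normalized to [0, 1]
    Returns a (num_frames, D) motion where every output frame is a
    softmax-weighted combination of all keyframes.
    """
    t = np.linspace(0.0, 1.0, num_frames)            # (T,) output frame times
    dist = np.abs(t[:, None] - key_times[None, :])   # (T, K) temporal distances
    w = np.exp(-dist / temperature)                  # smooth distance kernel
    w /= w.sum(axis=1, keepdims=True)                # normalize: soft assignment
    return w @ keyframes                             # (T, D) blended frames
```

As `temperature` shrinks, the weights approach hard nearest-keyframe assignment, so the same mechanism can trade smoothness against fidelity during training.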

Together, these studies explore alternative approaches to addressing the challenges of data acquisition and motion representation in deep learning–based human–robot interaction. Moreover, by integrating the two approaches, this research suggests the potential for semantic motion retargeting, in which robots may reproduce human motion with preserved meaning as well as physical fidelity.
Publisher
Ulsan National Institute of Science and Technology
Degree
Master
Major
Artificial Intelligence, Graduate School of Artificial Intelligence