Self-Supervised Post-Hoc Refinement of Text-to-Motion Models for Physically Plausible Motion Generation

Shim, Gahyeon

Scholarworks@UNIST

UNIST Library

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Joo, Kyung-Don	-
dc.contributor.author	Shim, Gahyeon	-
dc.date.accessioned	2026-03-26T22:15:26Z	-
dc.date.available	2026-03-26T22:15:26Z	-
dc.date.issued	2026-02	-
dc.description.abstract	Recent advances in text-to-motion generation have demonstrated impressive capabilities in producing diverse and semantically coherent human motions from natural language descriptions. However, the generated motions often suffer from physical implausibility, such as foot skating, floating, ground penetration, or foot clipping, which undermines their realism and applicability in embodied environments. Addressing these issues typically requires complex physical modeling, which limits scalability and generalization across models. This thesis proposes the Distortion-aware Motion Calibrator (DMC), a lightweight and model-agnostic post-hoc framework designed to enhance the physical plausibility and semantic alignment of generated motions without modifying the base architecture. DMC adopts a self-supervised distortion–refinement paradigm, in which synthetically distorted motions are paired with their clean counterparts so that the model learns to correct diverse motion artifacts. Two complementary variants are introduced: a WGAN-based DMC, which provides semantically faithful refinement with minimal computational overhead, and a denoising-based DMC, which achieves fine-grained physical correction of contact-related artifacts through iterative refinement. Extensive experiments on T2M, T2M-GPT, and MoMask show that both variants of DMC consistently enhance the physical realism and semantic coherence of generated motions. The WGAN-based DMC provides efficient perceptual correction, while the denoising-based variant offers higher precision at greater computational cost. Ablation analyses reveal how refinement steps, textual conditioning, and distortion types affect the balance between semantic and physical fidelity. Overall, DMC bridges the gap between perceptual quality and physical realism in motion generation, serving as a plug-and-play, self-supervised refinement framework applicable to animation, virtual agents, and embodied robotics.	-
dc.description.degree	Master	-
dc.description	Graduate School of Artificial Intelligence, Artificial Intelligence	-
dc.identifier.uri	https://scholarworks.unist.ac.kr/handle/201301/91059	-
dc.identifier.uri	http://unist.dcollection.net/common/orgView/200000965991	-
dc.language	ENG	-
dc.publisher	Ulsan National Institute of Science and Technology	-
dc.subject	Seawater battery	-
dc.title	Self-Supervised Post-Hoc Refinement of Text-to-Motion Models for Physically Plausible Motion Generation	-
dc.type	Thesis	-
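
The abstract describes a self-supervised distortion–refinement paradigm in which training pairs are built by synthetically distorting clean motion. The sketch below illustrates only that pair-construction idea, under loudly stated assumptions: motions are taken to be (T, J, 3) arrays of joint positions with y as the up axis and the ground at y = 0, and the foot-joint indices, the three distortion functions (float_motion, penetrate_ground, skate_feet), their magnitudes, and make_training_pair are all hypothetical names and choices for illustration. They are not the distortions or code used in the thesis, and the WGAN-based and denoising-based refiners themselves are not shown.

```python
import numpy as np

# Hypothetical layout (assumption): a motion is a (T, J, 3) array of joint
# positions, y is the up axis, and the ground plane is y = 0.
UP_AXIS = 1
# Hypothetical foot-joint indices (e.g. ankles and toes of a 22-joint skeleton).
FOOT_JOINTS = (7, 8, 10, 11)


def float_motion(motion, rng, max_offset=0.1):
    """Hypothetical floating distortion: lift every joint by one random height."""
    out = motion.copy()
    out[..., UP_AXIS] += rng.uniform(0.02, max_offset)
    return out


def penetrate_ground(motion, rng, max_offset=0.1):
    """Hypothetical ground-penetration distortion: sink every joint by one random depth."""
    out = motion.copy()
    out[..., UP_AXIS] -= rng.uniform(0.02, max_offset)
    return out


def skate_feet(motion, rng, max_drift=0.05):
    """Hypothetical foot-skating distortion: horizontal foot drift that grows over time."""
    out = motion.copy()
    direction = rng.normal(size=2)
    direction /= np.linalg.norm(direction)  # unit vector in the ground (x, z) plane
    # Drift grows linearly from 0 at the first frame to max_drift at the last frame.
    drift = np.linspace(0.0, max_drift, motion.shape[0])[:, None] * direction  # (T, 2)
    for j in FOOT_JOINTS:
        out[:, j, 0] += drift[:, 0]
        out[:, j, 2] += drift[:, 1]
    return out


DISTORTIONS = (float_motion, penetrate_ground, skate_feet)


def make_training_pair(clean, rng, jitter=0.005):
    """Return (distorted, clean); a refiner would be trained to map the first to the second."""
    distort = DISTORTIONS[rng.integers(len(DISTORTIONS))]
    distorted = distort(clean, rng)  # every distortion works on a copy of the clean clip
    # Small Gaussian jitter on top, so the refiner also sees high-frequency noise.
    distorted = distorted + rng.normal(scale=jitter, size=clean.shape)
    return distorted, clean


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.normal(size=(60, 22, 3))  # random stand-in for a real clean clip
    distorted, target = make_training_pair(clean, rng)
    print(distorted.shape, target is clean)
```

Because the targets are the original clean clips, this kind of scheme needs neither paired real-world data nor a physics simulator; which distortions to include is the kind of choice the ablation over distortion types examines.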
