Full metadata record

DC Field Value Language
dc.contributor.advisor Baek, Seungryul -
dc.contributor.author Cha, Junuk -
dc.date.accessioned 2025-04-04T13:48:38Z -
dc.date.available 2025-04-04T13:48:38Z -
dc.date.issued 2025-02 -
dc.description.abstract Estimating 3D poses and shapes in the form of meshes from monocular RGB images is a challenging task, particularly in multi-person scenarios where occlusions and complex interactions introduce significant ambiguities. This paper proposes a novel coarse-to-fine pipeline to address these challenges, extending traditional 3D pose and shape estimation to clothed human mesh reconstruction. By tackling occlusions and missing geometry, the method delivers a comprehensive solution for reconstructing accurate and physically plausible 3D human meshes. The pipeline combines robust 3D skeleton estimation with advanced mesh refinement techniques to achieve significant improvements in scenarios involving multiple interacting individuals.

The pipeline begins by estimating occlusion-resistant 3D skeletons for multiple persons from a single RGB image. These skeletons, designed to handle partial visibility, are transformed into deformable 3D mesh parameters through inverse kinematics, providing an initial coarse representation. To refine these meshes, a Transformer-based relation-aware module is employed, which considers both intra-person dynamics (e.g., spatial consistency of body parts) and inter-person interactions (e.g., spatial relations between individuals). This multi-level refinement enhances the realism and accuracy of the resulting meshes, even in complex scenes.

To handle clothed human meshes in globally coherent scene spaces, the pipeline addresses critical issues like missing body parts and physical implausibilities, such as self-penetration and person-to-person penetration. Two innovative human priors are introduced to overcome these challenges. The geometry prior uses an encoder-decoder architecture to recover detailed 3D features from incomplete body geometry and combines these with a surface normal map to reconstruct realistic, detailed clothed meshes. The contact prior, on the other hand, employs an image-space contact detector to enforce physical plausibility by estimating and maintaining realistic surface contacts between individuals.

Extensive experiments conducted on benchmark datasets, including 3DPW, MuPoTS, AGORA, and MultiHuman, demonstrate the pipeline's effectiveness. For pose and shape estimation, the method excels in managing occlusions and interactions, while for clothed mesh reconstruction, it achieves penetration-free results with detailed textures and surface features. These results confirm the superiority of the approach compared to existing methods, highlighting its ability to handle diverse and challenging scenarios in 3D human mesh reconstruction.

In conclusion, this paper presents a significant advancement in the field of 3D human mesh reconstruction from monocular RGB images. By integrating robust skeletal estimation, Transformer-based refinement, and innovative human priors, the proposed pipeline delivers accurate, coherent, and physically plausible clothed human meshes. This work not only addresses long-standing challenges like occlusions and physical implausibilities but also opens new opportunities for applications in virtual reality, gaming, and digital human modeling, setting a new benchmark in multi-person 3D reconstruction. -
dc.description.degree Doctor -
dc.description Graduate School of Artificial Intelligence -
dc.identifier.uri https://scholarworks.unist.ac.kr/handle/201301/86407 -
dc.identifier.uri http://unist.dcollection.net/common/orgView/200000865232 -
dc.language ENG -
dc.publisher Ulsan National Institute of Science and Technology -
dc.subject 3D human reconstruction -
dc.subject single image input -
dc.title 3D Pose and Shape Estimation for Clothed Multi-Person from a Single Image -
dc.type Thesis -

Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.