Proprioception-conditioned Visual Scene Generation for Robot World Modeling via Contrastive Learning and Diffusion

Alternative Title
고유 감각 정보 기반 시각적 장면 생성을 통한 로봇 세계 모델링을 가능케하는 대조 학습 및 디퓨전 모델 (Contrastive Learning and Diffusion Models Enabling Robot World Modeling via Proprioception-based Visual Scene Generation)
Author(s)
Kim, Seong Hyeon; Ahn, Hyemin
Issued Date
2025-06
DOI
10.5302/J.ICROS.2025.25.0050
URI
https://scholarworks.unist.ac.kr/handle/201301/91452
Fulltext
https://www.dbpia.co.kr/journal/articleDetail?nodeId=NODE12246460
Citation
Journal of Institute of Control, Robotics and Systems, v.31, no.6, pp.581 - 585
Abstract
A world model allows robots to understand and predict the interplay between their actions and environmental dynamics. Recent advancements in diffusion models have significantly improved the quality of image frame generation in simulated environments, contributing to the development of more robust and generalized world models. However, these diffusion-based world models often depend on discrete inputs, such as keyboard commands, which limits their applicability to continuous real-world robotic control. To address this limitation, we propose a novel framework that integrates contrastive learning to align visual and proprioceptive modalities (e.g., joint positions) within a shared latent space. This shared latent space facilitates accurate cross-modal predictions between visual scenes and proprioceptive states. By combining this latent representation with a diffusion model, our world model can generate long-term future visual scenes from an initial visual observation and a sequence of proprioceptive states. Experimental results demonstrate that the proposed framework generates high-fidelity, long-term future visual scenes when provided with target proprioceptive data. This capability allows robots to plan their motions solely from the generated images, i.e., imagination-based planning. © ICROS 2025.
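The cross-modal alignment step described in the abstract can be illustrated with a symmetric InfoNCE objective, in which matched (visual, proprioceptive) pairs from a batch are treated as positives and all other pairings as negatives. The sketch below is a minimal NumPy illustration of this general technique, not the authors' implementation: the encoder weights, batch size, and feature dimensions (e.g., a 7-dimensional joint-position vector) are all illustrative assumptions.

```python
import numpy as np

def normalize(x):
    """L2-normalize rows so similarity reduces to a dot product."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def info_nce(z_vis, z_prop, temperature=0.07):
    """Symmetric (CLIP-style) InfoNCE loss between two modality batches.

    Row i of z_vis and row i of z_prop form a positive pair; every
    other row pairing in the batch serves as a negative.
    """
    logits = normalize(z_vis) @ normalize(z_prop).T / temperature
    idx = np.arange(len(logits))

    def cross_entropy(l):
        l = l - l.max(axis=1, keepdims=True)            # numerical stability
        log_p = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_p[idx, idx].mean()                  # diagonal = positives

    # Average the vision->proprioception and proprioception->vision losses.
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))

# Illustrative setup: linear "encoders" projecting each modality
# into a shared latent space (dimensions are assumptions).
rng = np.random.default_rng(0)
B, D_img, D_joint, D_lat = 8, 32, 7, 16
W_vis = rng.normal(size=(D_img, D_lat))    # visual encoder weights
W_prop = rng.normal(size=(D_joint, D_lat)) # proprioceptive encoder weights

img_feats = rng.normal(size=(B, D_img))    # e.g., pooled image features
joint_pos = rng.normal(size=(B, D_joint))  # e.g., joint positions

loss = info_nce(img_feats @ W_vis, joint_pos @ W_prop)
```

Minimizing this loss pulls each matched visual/proprioceptive pair together in the shared latent space while pushing mismatched pairs apart, which is what enables the cross-modal prediction the abstract describes before the diffusion model is conditioned on that latent.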
Publisher
Institute of Control, Robotics and Systems
ISSN
1976-5622
