Generalized Tsallis Entropy Reinforcement Learning and Its Application to Soft Mobile Robots

Author(s)
Lee, Kyungjae; Kim, Sungyub; Lim, Sungbin; Choi, Sungjoon; Hong, Mineui; Kim, Jaein; Park, Yong-Lae; Oh, Songhwai
Issued Date
2020-07-15
DOI
10.15607/rss.2020.xvi.036
URI
https://scholarworks.unist.ac.kr/handle/201301/78404
Fulltext
http://www.roboticsproceedings.org/rss16/p036.html
Citation
Robotics: Science and Systems Conference
Abstract
In this paper, we present a new class of Markov decision processes (MDPs), called Tsallis MDPs, with Tsallis entropy maximization, which generalizes existing maximum entropy reinforcement learning (RL). A Tsallis MDP provides a unified framework for the original RL problem and RL with various types of entropy, including the well-known standard Shannon-Gibbs (SG) entropy, using an additional real-valued parameter called the entropic index. By controlling the entropic index, we can generate various types of entropy, including the SG entropy, and different entropies lead to different classes of optimal policies in Tsallis MDPs. We also provide a full mathematical analysis of Tsallis MDPs. Our theoretical results enable the use of any positive entropic index in RL. To handle complex and large-scale problems, such as learning a controller for a soft mobile robot, we also propose a Tsallis actor-critic (TAC) method. We find that a different value of the entropic index is desirable for different types of RL problems and empirically show that TAC with a proper entropic index outperforms state-of-the-art actor-critic methods. Furthermore, to reduce the effort of finding a proper entropic index, we propose a linear scheduling method in which the entropic index increases linearly with the number of interactions. In simulations, linear scheduling shows fast convergence and performance similar to TAC with the optimal entropic index, which is a useful property for real robot applications. We also apply TAC with linear scheduling to learn a feedback controller for a soft mobile robot and show that it achieves the best performance among existing actor-critic methods in terms of convergence speed and the sum of rewards. Consequently, we empirically show that the proposed method efficiently learns a controller for soft mobile robots.
Publisher
Robotics: Science and Systems Foundation
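
The entropic index described in the abstract parameterizes the Tsallis entropy, which reduces to the Shannon-Gibbs entropy in the limit as the index approaches 1. The following is a minimal, illustrative Python sketch of that quantity for a discrete policy, together with a simple linear schedule of the entropic index; the function names, the index range (1 to 2), and the schedule endpoints are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def tsallis_entropy(p, q):
    """Tsallis entropy S_q(p) = (1 - sum_i p_i^q) / (q - 1).

    As q -> 1 this recovers the Shannon-Gibbs entropy -sum_i p_i log p_i.
    `p` is a discrete probability vector, `q` the entropic index (q > 0).
    """
    p = np.asarray(p, dtype=float)
    if abs(q - 1.0) < 1e-8:                       # limit case: Shannon-Gibbs entropy
        return float(-np.sum(p * np.log(p + 1e-12)))
    return float((1.0 - np.sum(p ** q)) / (q - 1.0))

def linear_entropic_index_schedule(step, total_steps, q_start=1.0, q_end=2.0):
    """Linearly increase the entropic index with the number of interactions
    (the scheduling idea described in the abstract; endpoints are illustrative)."""
    frac = min(max(step / float(total_steps), 0.0), 1.0)
    return q_start + frac * (q_end - q_start)

if __name__ == "__main__":
    policy = np.array([0.7, 0.2, 0.1])            # toy action distribution for one state
    for q in (1.0, 1.5, 2.0):
        print(f"q = {q:.1f}  S_q = {tsallis_entropy(policy, q):.4f}")
    print("q at 50% of training:",
          linear_entropic_index_schedule(step=500, total_steps=1000))
```

Running the snippet shows the entropy bonus shrinking as q grows, which is the qualitative effect that makes different entropic indices suit different problems.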
