Generalized Tsallis Entropy Reinforcement Learning and Its Application to Soft Mobile Robots

Author(s)
Lee, Kyungjae; Kim, Sungyub; Lim, Sungbin; Choi, Sungjoon; Hong, Mineui; Kim, Jaein; Park, Yong-Lae; Oh, Songhwai
Issued Date
2020-07-15
DOI
10.15607/rss.2020.xvi.036
URI
https://scholarworks.unist.ac.kr/handle/201301/78404
Fulltext
http://www.roboticsproceedings.org/rss16/p036.html
Citation
Robotics: Science and Systems Conference
Abstract
In this paper, we present a new class of Markov decision processes (MDPs), called Tsallis MDPs, with Tsallis entropy maximization, which generalizes existing maximum entropy reinforcement learning (RL). A Tsallis MDP provides a unified framework for the original RL problem and RL with various types of entropy, including the well-known standard Shannon-Gibbs (SG) entropy, using an additional real-valued parameter called the entropic index. By controlling the entropic index, we can generate various types of entropy, including the SG entropy, and different entropies lead to different classes of optimal policies in Tsallis MDPs. We also provide a full mathematical analysis of Tsallis MDPs. Our theoretical results enable the use of any positive entropic index in RL. To handle complex and large-scale problems, such as learning a controller for a soft mobile robot, we also propose a Tsallis actor-critic (TAC) method. We find that a different value of the entropic index is desirable for different types of RL problems and empirically show that TAC with a proper entropic index outperforms state-of-the-art actor-critic methods. Furthermore, to reduce the effort of finding a proper entropic index, we propose a linear scheduling method in which the entropic index increases linearly with the number of interactions. In simulations, linear scheduling shows fast convergence and performance similar to TAC with the optimal entropic index, which is a useful property for real robot applications. We also apply TAC with linear scheduling to learn a feedback controller for a soft mobile robot and show that it achieves the best performance among existing actor-critic methods in terms of convergence speed and the sum of rewards. Consequently, we empirically show that the proposed method efficiently learns a controller for soft mobile robots.
Publisher
Robotics: Science and Systems Foundation
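
The entropic index described in the abstract parameterizes the Tsallis entropy, which reduces to the Shannon-Gibbs entropy in the limit as the index approaches 1. The following is a minimal, illustrative Python sketch of that quantity for a discrete policy, together with a simple linear schedule of the entropic index; the function names, the index range (1 to 2), and the schedule endpoints are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def tsallis_entropy(p, q):
    """Tsallis entropy S_q(p) = (1 - sum_i p_i^q) / (q - 1).

    As q -> 1 this recovers the Shannon-Gibbs entropy -sum_i p_i log p_i.
    `p` is a discrete probability vector, `q` the entropic index (q > 0).
    """
    p = np.asarray(p, dtype=float)
    if abs(q - 1.0) < 1e-8:                       # limit case: Shannon-Gibbs entropy
        return float(-np.sum(p * np.log(p + 1e-12)))
    return float((1.0 - np.sum(p ** q)) / (q - 1.0))

def linear_entropic_index_schedule(step, total_steps, q_start=1.0, q_end=2.0):
    """Linearly increase the entropic index with the number of interactions
    (the scheduling idea described in the abstract; endpoints are illustrative)."""
    frac = min(max(step / float(total_steps), 0.0), 1.0)
    return q_start + frac * (q_end - q_start)

if __name__ == "__main__":
    policy = np.array([0.7, 0.2, 0.1])            # toy action distribution for one state
    for q in (1.0, 1.5, 2.0):
        print(f"q = {q:.1f}  S_q = {tsallis_entropy(policy, q):.4f}")
    print("q at 50% of training:",
          linear_entropic_index_schedule(step=500, total_steps=1000))
```

Running the snippet shows the entropy bonus shrinking as q grows, which is the qualitative effect that makes different entropic indices suit different problems.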
