Related Researcher

Yoon, Sung Whan
Machine Intelligence and Information Learning Lab.


Flat Reward in Policy Parameter Space Implies Robust Reinforcement Learning

Author(s)
Lee, Hyun Kyu; Yoon, Sung Whan
Issued Date
2025-04-24
URI
https://scholarworks.unist.ac.kr/handle/201301/86223
Fulltext
https://openreview.net/forum?id=4OaO3GjP7k
Citation
International Conference on Learning Representations
Abstract
Investigating flat minima on loss surfaces in parameter space is well-documented in the supervised learning context, highlighting its advantages for model generalization. However, limited attention has been paid to the reinforcement learning (RL) context, where the impact of flatter reward landscapes in policy parameter space remains largely unexplored. Beyond merely extrapolating from supervised learning, which suggests a link between flat reward landscapes and enhanced generalization, we aim to formally connect the flatness of the reward surface to the robustness of RL models. In policy models where a deep neural network determines actions, flatter reward landscapes in response to parameter perturbations lead to consistent rewards even when actions are perturbed. Moreover, robustness to actions further contributes to robustness against other variations, such as changes in state transition probabilities and reward functions. We extensively simulate various RL environments, confirming the consistent benefits of flatter reward landscapes in enhancing the robustness of RL under diverse conditions, including action selection, transition dynamics, and reward functions.
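The abstract's central quantity is the flatness of the reward landscape in policy parameter space: how much the episodic return changes when the policy parameters are perturbed. A minimal sketch of one way to probe this is shown below. The toy linear environment, the policy `a = theta @ s`, and all function names here are illustrative assumptions for exposition, not the paper's actual setup; flatness is estimated as the mean return drop over random perturbations of a fixed radius, with a smaller drop indicating a flatter landscape.

```python
import numpy as np

def episode_return(theta, horizon=50, seed=0):
    """Return of a linear policy a = theta @ s in a toy deterministic
    environment where reward penalizes distance from the origin.
    (Illustrative environment, not from the paper.)"""
    rng = np.random.default_rng(seed)
    s = rng.normal(size=theta.shape[1])
    total = 0.0
    for _ in range(horizon):
        a = theta @ s
        s = 0.9 * s - 0.1 * a          # simple linear dynamics
        total += -np.abs(s).sum()      # reward: stay near the origin
    return total

def flatness(theta, radius=0.1, n_samples=32, seed=1):
    """Mean return drop under random parameter perturbations of a given
    radius; a smaller drop corresponds to a flatter reward landscape."""
    rng = np.random.default_rng(seed)
    base = episode_return(theta)
    drops = []
    for _ in range(n_samples):
        d = rng.normal(size=theta.shape)
        d *= radius / np.linalg.norm(d)  # project onto the radius sphere
        drops.append(base - episode_return(theta + d))
    return float(np.mean(drops))

theta = np.eye(3)
print(flatness(theta))
```

Under the paper's argument, a policy whose `flatness` value stays small under such parameter perturbations should also yield more consistent rewards under perturbed actions, transitions, and reward functions.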
Publisher
International Conference on Learning Representations
