Flat Reward in Policy Parameter Space Implies Robust Reinforcement Learning

Lee, Hyun Kyu; Yoon, Sung Whan

Scholarworks@UNIST

UNIST Library

Related Researcher

윤성환

Yoon, Sung Whan: Machine Intelligence and Information Learning Lab.

Full metadata record

DC Field	Value	Language
dc.citation.conferencePlace	SI	-
dc.citation.conferencePlace	Singapore EXPO	-
dc.citation.title	International Conference on Learning Representations	-
dc.contributor.author	Lee, Hyun Kyu	-
dc.contributor.author	Yoon, Sung Whan	-
dc.date.accessioned	2025-02-13T12:05:06Z	-
dc.date.available	2025-02-13T12:05:06Z	-
dc.date.created	2025-02-12	-
dc.date.issued	2025-04-24	-
dc.description.abstract	Investigating flat minima on loss surfaces in parameter space is well-documented in the supervised learning context, highlighting its advantages for model generalization. However, limited attention has been paid to the reinforcement learning (RL) context, where the impact of flatter reward landscapes in policy parameter space remains largely unexplored. Beyond merely extrapolating from supervised learning, which suggests a link between flat reward landscapes and enhanced generalization, we aim to formally connect the flatness of the reward surface to the robustness of RL models. In policy models where a deep neural network determines actions, flatter reward landscapes in response to parameter perturbations lead to consistent rewards even when actions are perturbed. Moreover, robustness to actions further contributes to robustness against other variations, such as changes in state transition probabilities and reward functions. We extensively simulate various RL environments, confirming the consistent benefits of flatter reward landscapes in enhancing the robustness of RL under diverse conditions, including action selection, transition dynamics, and reward functions.	-
dc.identifier.bibliographicCitation	International Conference on Learning Representations	-
dc.identifier.uri	https://scholarworks.unist.ac.kr/handle/201301/86223	-
dc.identifier.url	https://openreview.net/forum?id=4OaO3GjP7k	-
dc.language	English	-
dc.publisher	International Conference on Learning Representations	-
dc.title	Flat Reward in Policy Parameter Space Implies Robust Reinforcement Learning	-
dc.type	Conference Paper	-
dc.date.conferenceDate	2025-04-24	-

ScholarWorks@UNIST was established as an OAK Project for the National Library of Korea.