Exclusively Penalized Q-learning for Offline Reinforcement Learning

Yeom, Junghyuk; Jo, Yonghyeon; Kim, Jeongmo; Lee, Sanghyeon; Han, Seungyul

Scholarworks@UNIST

Related Researcher

한승열

Han, Seungyul: Machine Learning & Intelligent Control Lab.

Full metadata record

DC Field	Value	Language
dc.citation.conferencePlace	CN	-
dc.citation.conferencePlace	Vancouver, Canada	-
dc.citation.title	Neural Information Processing Systems	-
dc.contributor.author	Yeom, Junghyuk	-
dc.contributor.author	Jo, Yonghyeon	-
dc.contributor.author	Kim, Jeongmo	-
dc.contributor.author	Lee, Sanghyeon	-
dc.contributor.author	Han, Seungyul	-
dc.date.accessioned	2024-12-27T15:35:06Z	-
dc.date.available	2024-12-27T15:35:06Z	-
dc.date.created	2024-12-26	-
dc.date.issued	2024-12-13	-
dc.description.abstract	Constraint-based offline reinforcement learning (RL) imposes policy constraints or penalties on the value function to mitigate overestimation errors caused by distributional shift. This paper focuses on a limitation of existing offline RL methods that penalize the value function: the penalty can introduce unnecessary bias into the value function, leading to underestimation bias. To address this concern, we propose Exclusively Penalized Q-learning (EPQ), which reduces estimation bias in the value function by selectively penalizing states that are prone to inducing estimation errors. Numerical results show that our method significantly reduces underestimation bias and improves performance in various offline control tasks compared to other offline RL methods.	-
dc.identifier.bibliographicCitation	Neural Information Processing Systems	-
dc.identifier.uri	https://scholarworks.unist.ac.kr/handle/201301/85291	-
dc.publisher	Neural Information Processing Systems	-
dc.title	Exclusively Penalized Q-learning for Offline Reinforcement Learning	-
dc.type	Conference Paper	-
dc.date.conferenceDate	2024-12-10	-

