Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | Han, Seungyul | - |
| dc.contributor.author | Yeom, Junghyuk | - |
| dc.date.accessioned | 2024-04-11T15:20:03Z | - |
| dc.date.available | 2024-04-11T15:20:03Z | - |
| dc.date.issued | 2024-02 | - |
| dc.description.abstract | In reinforcement learning, an agent receives observations from the environment, makes decisions based on those observations, and receives rewards for its actions. By repeatedly experiencing trial and error through this process, the agent learns a policy that yields higher rewards. However, traditional reinforcement learning heavily depends on continuous interaction with the environment to acquire new data. This reliance often incurs significant time and cost, and even raises safety concerns, particularly in real-world applications such as robotics and autonomous vehicles. To address these challenges, offline reinforcement learning (Offline RL) has emerged as a viable alternative. By utilizing pre-collected data, Offline RL greatly diminishes the need for continuous data collection, thereby reducing time and safety risks. This approach provides a safer and more controlled learning environment, as it depends only on data that is already available. However, Offline RL encounters specific issues due to its reliance on a fixed dataset. A notable challenge arises when the pre-existing data differs greatly from the target policy to be learned, leading to out-of-distribution problems that present significant obstacles to effectively implementing offline reinforcement learning. To address this problem, it is crucial for Offline RL algorithms to be designed conservatively, keeping the learned policy close to the behavior policy. Our baseline method, Conservative Q-Learning [1], addresses this challenge by applying penalties to state-action pairs generated by the policy. This approach enables the learning of a conservative Q-function that serves as a lower bound for the true value function. Yet, this method may degrade performance through excessive constraints. Moreover, if the imposed penalties do not accurately reflect the dataset's characteristics, the algorithm's performance may become overly dependent on the quality of the batch data. In this paper, we propose a penalty relaxation technique based on an analysis of the penalty characteristics suggested in conservative Q-learning. By adjusting penalties to align with the batch data's characteristics, we can reduce the dependence of performance on the dataset, thereby boosting the efficiency of Offline RL. This increased efficiency is achievable with a smaller number of networks, ensuring greater effectiveness. Furthermore, to tackle the suboptimal aspects of batch data, we propose a strategy for updating the behavior policy that advances beyond simple replication of the current policy. Our method focuses on learning the behavior policy in a way that aligns with the quality of the batch data. The technique involves categorizing states and predicting a more optimized policy. By incorporating it into Bellman updates and conservative Q-learning, we can enhance the performance of the behavior policy and mitigate the bias issues inherent in the dataset. | - |
| dc.description.degree | Master | - |
| dc.description | Graduate School of Artificial Intelligence | - |
| dc.identifier.uri | https://scholarworks.unist.ac.kr/handle/201301/82157 | - |
| dc.identifier.uri | http://unist.dcollection.net/common/orgView/200000744587 | - |
| dc.language | ENG | - |
| dc.publisher | Ulsan National Institute of Science and Technology | - |
| dc.rights.embargoReleaseDate | 9999-12-31 | - |
| dc.rights.embargoReleaseTerms | 9999-12-31 | - |
| dc.subject | 오프라인 강화학습 (offline reinforcement learning) | - |
| dc.subject | 패널티 (penalty) | - |
| dc.subject | 보수적 Q-learning (conservative Q-learning) | - |
| dc.title.alternative | 효율적 오프라인 강화학습을 위한 패널티 감쇄 기법 | - |
| dc.title | Penalty relaxation techniques for efficient offline reinforcement learning | - |
| dc.type | Thesis | - |
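The abstract describes the baseline, Conservative Q-Learning [1], as penalizing state-action pairs generated by the policy so that the learned Q-function lower-bounds the true value function. A minimal sketch of that conservative penalty for a single state with discrete actions is shown below; the function name, the use of a log-sum-exp soft maximum, and the `alpha` coefficient are illustrative assumptions, not the thesis's own implementation (which relaxes this penalty):

```python
import math

def cql_penalty(q_values, data_action, alpha=1.0):
    """CQL-style conservative regularizer for one state (discrete actions).

    Penalizes high Q-values over all actions (a smooth soft-maximum,
    standing in for actions the learned policy might generate) while
    crediting the Q-value of the action actually logged in the batch
    data. Minimizing this term pushes the learned Q-function toward
    a lower bound of the true value function.

    q_values    : list of Q(s, a) for every action a
    data_action : index of the action taken in the logged dataset
    alpha       : penalty weight; relaxing it reduces conservatism
    """
    # Numerically stable log-sum-exp as a soft maximum over actions.
    m = max(q_values)
    soft_max = m + math.log(sum(math.exp(q - m) for q in q_values))
    # Large when out-of-distribution actions have inflated Q-values;
    # near zero when the in-data action already dominates.
    return alpha * (soft_max - q_values[data_action])

# An out-of-distribution action (index 2) with an inflated Q-value
# yields a large penalty relative to the in-data action (index 0).
print(cql_penalty([1.0, 0.5, 5.0], data_action=0))
```

Excessive values of `alpha` illustrate the over-constraint issue the abstract raises: the penalty then dominates the Bellman error and suppresses even in-distribution value estimates.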