There are no files associated with this item.
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.citation.endPage | 2056 | - |
dc.citation.number | 5 | - |
dc.citation.startPage | 2045 | - |
dc.citation.title | IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS | - |
dc.citation.volume | 33 | - |
dc.contributor.author | Ladosz, Pawel | - |
dc.contributor.author | Ben-Iwhiwhu, Eseoghene | - |
dc.contributor.author | Dick, Jeffery | - |
dc.contributor.author | Ketz, Nicholas | - |
dc.contributor.author | Kolouri, Soheil | - |
dc.contributor.author | Krichmar, Jeffrey L. | - |
dc.contributor.author | Pilly, Praveen K. | - |
dc.contributor.author | Soltoggio, Andrea | - |
dc.date.accessioned | 2023-12-21T14:10:09Z | - |
dc.date.available | 2023-12-21T14:10:09Z | - |
dc.date.created | 2023-03-06 | - |
dc.date.issued | 2022-05 | - |
dc.description.abstract | In this article, we consider a subclass of partially observable Markov decision process (POMDP) problems which we term confounding POMDPs. In these types of POMDPs, temporal difference (TD)-based reinforcement learning (RL) algorithms struggle, as the TD error cannot be easily derived from observations. We solve these types of problems using a new bio-inspired neural architecture that combines a modulated Hebbian network (MOHN) with a deep Q-network (DQN), which we call the modulated Hebbian plus Q-network architecture (MOHQA). The key idea is to use a Hebbian network with rarely correlated bio-inspired neural traces to bridge temporal delays between actions and rewards when confounding observations and sparse rewards result in inaccurate TD errors. In the MOHQA, the DQN learns low-level features and control, while the MOHN contributes to high-level decisions by associating rewards with past states and actions. Thus, the proposed architecture combines two modules with significantly different learning algorithms, a Hebbian associative network and a classical DQN pipeline, exploiting the advantages of both. Simulations on a set of POMDPs and on the Malmo environment show that the proposed algorithm improved on DQN's results and even outperformed control tests with advantage actor-critic (A2C), quantile regression DQN with long short-term memory (QRDQN + LSTM), Monte Carlo policy gradient (REINFORCE), and aggregated memory for reinforcement learning (AMRL) algorithms on the most difficult POMDPs with confounding stimuli and sparse rewards. | - |
dc.identifier.bibliographicCitation | IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, v.33, no.5, pp.2045 - 2056 | - |
dc.identifier.doi | 10.1109/TNNLS.2021.3110281 | - |
dc.identifier.issn | 2162-237X | - |
dc.identifier.scopusid | 2-s2.0-85115699805 | - |
dc.identifier.uri | https://scholarworks.unist.ac.kr/handle/201301/62206 | - |
dc.identifier.wosid | 000732242800001 | - |
dc.language | English | - |
dc.publisher | IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC | - |
dc.title | Deep Reinforcement Learning With Modulated Hebbian Plus Q-Network Architecture | - |
dc.type | Article | - |
dc.description.isOpenAccess | FALSE | - |
dc.relation.journalWebOfScienceCategory | Computer Science, Artificial Intelligence; Computer Science, Hardware & Architecture; Computer Science, Theory & Methods; Engineering, Electrical & Electronic | - |
dc.relation.journalResearchArea | Computer Science; Engineering | - |
dc.type.docType | Article | - |
dc.description.journalRegisteredClass | scie | - |
dc.description.journalRegisteredClass | scopus | - |
dc.subject.keywordAuthor | Reinforcement learning | - |
dc.subject.keywordAuthor | History | - |
dc.subject.keywordAuthor | Markov processes | - |
dc.subject.keywordAuthor | Benchmark testing | - |
dc.subject.keywordAuthor | Delays | - |
dc.subject.keywordAuthor | Decision making | - |
dc.subject.keywordAuthor | Correlation | - |
dc.subject.keywordAuthor | Biologically inspired learning | - |
dc.subject.keywordAuthor | decision-making | - |
dc.subject.keywordAuthor | deep reinforcement learning (RL) | - |
dc.subject.keywordAuthor | partially observable Markov decision process (POMDP) | - |
dc.subject.keywordPlus | DISTAL REWARD PROBLEM | - |
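The abstract describes bridging the delay between actions and rewards (the distal reward problem) with bio-inspired neural traces whose weight changes are gated by reward. The paper's exact MOHN update is not reproduced in this record; the sketch below, in Python with NumPy, illustrates only the general mechanism it alludes to: a reward-modulated Hebbian rule in which pre/post activity correlations accumulate in a decaying eligibility trace, so that a sparse, delayed reward can still credit earlier state-action pairs. All names, sizes, and constants here are illustrative assumptions, not the MOHQA implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

n_in, n_out = 8, 4
w = np.zeros((n_in, n_out))      # Hebbian association weights
trace = np.zeros((n_in, n_out))  # eligibility traces bridging the action-reward delay

lr = 0.1           # learning rate (illustrative value)
trace_decay = 0.9  # per-step decay of the eligibility trace (illustrative value)

def step(pre, post, reward):
    """One reward-modulated Hebbian step: fold the pre/post activity
    correlation into a decaying eligibility trace, then let the
    (possibly delayed, sparse) reward gate the actual weight change."""
    global w, trace
    trace = trace_decay * trace + np.outer(pre, post)
    w += lr * reward * trace

# Activity occurs several steps before the reward; the trace preserves credit.
pre = rng.random(n_in)
post = rng.random(n_out)
step(pre, post, reward=0.0)  # activity, no reward: trace is set, weights unchanged
for _ in range(3):           # quiet delay steps: trace decays, weights unchanged
    step(np.zeros(n_in), np.zeros(n_out), reward=0.0)
step(np.zeros(n_in), np.zeros(n_out), reward=1.0)  # delayed reward gates the update
```

After the final step the weights are proportional to the decayed correlation of the earlier activity, which is how such a rule can assign credit across a temporal gap where sparse rewards would leave a TD error uninformative.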