Full metadata record

dc.citation.endPage: 2056
dc.citation.number: 5
dc.citation.startPage: 2045
dc.citation.title: IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS
dc.citation.volume: 33
dc.contributor.author: Ladosz, Pawel
dc.contributor.author: Ben-Iwhiwhu, Eseoghene
dc.contributor.author: Dick, Jeffery
dc.contributor.author: Ketz, Nicholas
dc.contributor.author: Kolouri, Soheil
dc.contributor.author: Krichmar, Jeffrey L.
dc.contributor.author: Pilly, Praveen K.
dc.contributor.author: Soltoggio, Andrea
dc.date.accessioned: 2023-12-21T14:10:09Z
dc.date.available: 2023-12-21T14:10:09Z
dc.date.created: 2023-03-06
dc.date.issued: 2022-05
dc.description.abstract: In this article, we consider a subclass of partially observable Markov decision process (POMDP) problems, which we term confounding POMDPs. In these types of POMDPs, temporal difference (TD)-based reinforcement learning (RL) algorithms struggle, as the TD error cannot be easily derived from observations. We solve these types of problems using a new bio-inspired neural architecture that combines a modulated Hebbian network (MOHN) with a deep Q-network (DQN), which we call the modulated Hebbian plus Q-network architecture (MOHQA). The key idea is to use a Hebbian network with rarely correlated bio-inspired neural traces to bridge temporal delays between actions and rewards when confounding observations and sparse rewards result in inaccurate TD errors. In MOHQA, the DQN learns low-level features and control, while the MOHN contributes to high-level decisions by associating rewards with past states and actions. Thus, the proposed architecture combines two modules with significantly different learning algorithms, a Hebbian associative network and a classical DQN pipeline, exploiting the advantages of both. Simulations on a set of POMDPs and on the Malmo environment show that the proposed algorithm improves on DQN's results and even outperforms control tests with advantage actor-critic (A2C), quantile regression DQN with long short-term memory (QRDQN + LSTM), Monte Carlo policy gradient (REINFORCE), and aggregated memory for reinforcement learning (AMRL) algorithms on the most difficult POMDPs with confounding stimuli and sparse rewards. (An illustrative sketch of this idea follows the metadata record below.)
dc.identifier.bibliographicCitation: IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, v.33, no.5, pp.2045 - 2056
dc.identifier.doi: 10.1109/TNNLS.2021.3110281
dc.identifier.issn: 2162-237X
dc.identifier.scopusid: 2-s2.0-85115699805
dc.identifier.uri: https://scholarworks.unist.ac.kr/handle/201301/62206
dc.identifier.wosid: 000732242800001
dc.language: English
dc.publisher: IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
dc.title: Deep Reinforcement Learning With Modulated Hebbian Plus Q-Network Architecture
dc.type: Article
dc.description.isOpenAccess: FALSE
dc.relation.journalWebOfScienceCategory: Computer Science, Artificial Intelligence; Computer Science, Hardware & Architecture; Computer Science, Theory & Methods; Engineering, Electrical & Electronic
dc.relation.journalResearchArea: Computer Science; Engineering
dc.type.docType: Article
dc.description.journalRegisteredClass: scie
dc.description.journalRegisteredClass: scopus
dc.subject.keywordAuthor: Reinforcement learning
dc.subject.keywordAuthor: History
dc.subject.keywordAuthor: Markov processes
dc.subject.keywordAuthor: Benchmark testing
dc.subject.keywordAuthor: Delays
dc.subject.keywordAuthor: Decision making
dc.subject.keywordAuthor: Correlation
dc.subject.keywordAuthor: Biologically inspired learning
dc.subject.keywordAuthor: decision-making
dc.subject.keywordAuthor: deep reinforcement learning (RL)
dc.subject.keywordAuthor: partially observable Markov decision process (POMDP)
dc.subject.keywordPlus: DISTAL REWARD PROBLEM
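
The abstract above describes combining a reward-modulated Hebbian head (with eligibility traces that bridge the delay between actions and rewards) and a DQN head that handles low-level features and control. The following is a minimal NumPy sketch of that general idea only, not the authors' MOHQA implementation: the tabular Q head, one-hot observations, the trace decay constant, the learning rates, and the additive combination of the two heads are all assumptions made for illustration.

```python
# Minimal sketch of a reward-modulated Hebbian head combined with a Q-learning head.
# Illustrative only; not the MOHQA code from the paper. All sizes and constants are assumed.
import numpy as np

rng = np.random.default_rng(0)

N_OBS, N_ACT = 16, 4          # toy observation and action space sizes (assumed)
GAMMA, ALPHA = 0.95, 0.1      # Q-learning discount factor and step size (assumed)
TRACE_DECAY, ETA = 0.9, 0.01  # eligibility-trace decay and Hebbian learning rate (assumed)

W_hebb = np.zeros((N_OBS, N_ACT))   # Hebbian associative weights (MOHN-like head)
trace = np.zeros((N_OBS, N_ACT))    # eligibility traces linking past state-action pairs to reward
Q = np.zeros((N_OBS, N_ACT))        # tabular stand-in for the DQN head

def select_action(obs_idx, eps=0.1):
    """Combine the two heads additively and act epsilon-greedily (combination rule assumed)."""
    obs = np.zeros(N_OBS)
    obs[obs_idx] = 1.0
    combined = Q[obs_idx] + obs @ W_hebb      # Q-values plus Hebbian association scores
    if rng.random() < eps:
        return int(rng.integers(N_ACT))
    return int(np.argmax(combined))

def step_update(obs_idx, action, reward, next_obs_idx, done):
    """One-step TD update for the Q head; reward-modulated Hebbian update for the other."""
    global W_hebb, trace
    # Eligibility trace: decay old pre/post correlations, add the current one.
    obs = np.zeros(N_OBS)
    obs[obs_idx] = 1.0
    post = np.zeros(N_ACT)
    post[action] = 1.0
    trace = TRACE_DECAY * trace + np.outer(obs, post)
    # Modulated Hebbian plasticity: the weight change is gated by the (possibly sparse) reward.
    W_hebb += ETA * reward * trace
    # Standard one-step Q-learning update.
    target = reward + (0.0 if done else GAMMA * Q[next_obs_idx].max())
    Q[obs_idx, action] += ALPHA * (target - Q[obs_idx, action])
```

In this sketch the reward acts as the modulatory signal: when it arrives, the accumulated traces let credit reach state-action pairs from earlier time steps even though the one-step TD error alone would miss them, which is the role the abstract attributes to the MOHN alongside the DQN.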

