There are no files associated with this item.
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.citation.endPage | 2056 | - |
dc.citation.number | 5 | - |
dc.citation.startPage | 2045 | - |
dc.citation.title | IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS | - |
dc.citation.volume | 33 | - |
dc.contributor.author | Ladosz, Pawel | - |
dc.contributor.author | Ben-Iwhiwhu, Eseoghene | - |
dc.contributor.author | Dick, Jeffery | - |
dc.contributor.author | Ketz, Nicholas | - |
dc.contributor.author | Kolouri, Soheil | - |
dc.contributor.author | Krichmar, Jeffrey L. | - |
dc.contributor.author | Pilly, Praveen K. | - |
dc.contributor.author | Soltoggio, Andrea | - |
dc.date.accessioned | 2023-12-21T14:10:09Z | - |
dc.date.available | 2023-12-21T14:10:09Z | - |
dc.date.created | 2023-03-06 | - |
dc.date.issued | 2022-05 | - |
dc.description.abstract | In this article, we consider a subclass of partially observable Markov decision process (POMDP) problems which we term confounding POMDPs. In these types of POMDPs, temporal difference (TD)-based reinforcement learning (RL) algorithms struggle, as the TD error cannot be easily derived from observations. We solve these types of problems using a new bio-inspired neural architecture that combines a modulated Hebbian network (MOHN) with a deep Q-network (DQN), which we call the modulated Hebbian plus Q-network architecture (MOHQA). The key idea is to use a Hebbian network with rarely correlated bio-inspired neural traces to bridge temporal delays between actions and rewards when confounding observations and sparse rewards result in inaccurate TD errors. In the MOHQA, the DQN learns low-level features and control, while the MOHN contributes to high-level decisions by associating rewards with past states and actions. Thus, the proposed architecture combines two modules with significantly different learning algorithms, a Hebbian associative network and a classical DQN pipeline, exploiting the advantages of both. Simulations on a set of POMDPs and on the Malmo environment show that the proposed algorithm improved on DQN's results and even outperformed control tests with advantage actor-critic (A2C), quantile regression DQN with long short-term memory (QRDQN + LSTM), Monte Carlo policy gradient (REINFORCE), and aggregated memory for reinforcement learning (AMRL) algorithms on the most difficult POMDPs with confounding stimuli and sparse rewards. | - |
dc.identifier.bibliographicCitation | IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, v.33, no.5, pp.2045 - 2056 | - |
dc.identifier.doi | 10.1109/TNNLS.2021.3110281 | - |
dc.identifier.issn | 2162-237X | - |
dc.identifier.scopusid | 2-s2.0-85115699805 | - |
dc.identifier.uri | https://scholarworks.unist.ac.kr/handle/201301/62206 | - |
dc.identifier.wosid | 000732242800001 | - |
dc.language | English | - |
dc.publisher | IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC | - |
dc.title | Deep Reinforcement Learning With Modulated Hebbian Plus Q-Network Architecture | - |
dc.type | Article | - |
dc.description.isOpenAccess | FALSE | - |
dc.relation.journalWebOfScienceCategory | Computer Science, Artificial Intelligence; Computer Science, Hardware & Architecture; Computer Science, Theory & Methods; Engineering, Electrical & Electronic | - |
dc.relation.journalResearchArea | Computer Science; Engineering | - |
dc.type.docType | Article | - |
dc.description.journalRegisteredClass | scie | - |
dc.description.journalRegisteredClass | scopus | - |
dc.subject.keywordAuthor | Reinforcement learning | - |
dc.subject.keywordAuthor | History | - |
dc.subject.keywordAuthor | Markov processes | - |
dc.subject.keywordAuthor | Benchmark testing | - |
dc.subject.keywordAuthor | Delays | - |
dc.subject.keywordAuthor | Decision making | - |
dc.subject.keywordAuthor | Correlation | - |
dc.subject.keywordAuthor | Biologically inspired learning | - |
dc.subject.keywordAuthor | decision-making | - |
dc.subject.keywordAuthor | deep reinforcement learning (RL) | - |
dc.subject.keywordAuthor | partially observable Markov decision process (POMDP) | - |
dc.subject.keywordPlus | DISTAL REWARD PROBLEM | - |
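The abstract describes bridging the delay between actions and rewards (the distal reward problem) with bio-inspired neural traces whose weight changes are gated by reward. The paper's exact MOHN update is not reproduced in this record; the sketch below, in Python with NumPy, illustrates only the general mechanism it alludes to: a reward-modulated Hebbian rule in which pre/post activity correlations accumulate in a decaying eligibility trace, so that a sparse, delayed reward can still credit earlier state-action pairs. All names, sizes, and constants here are illustrative assumptions, not the MOHQA implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

n_in, n_out = 8, 4
w = np.zeros((n_in, n_out))      # Hebbian association weights
trace = np.zeros((n_in, n_out))  # eligibility traces bridging the action-reward delay

lr = 0.1           # learning rate (illustrative value)
trace_decay = 0.9  # per-step decay of the eligibility trace (illustrative value)

def step(pre, post, reward):
    """One reward-modulated Hebbian step: fold the pre/post activity
    correlation into a decaying eligibility trace, then let the
    (possibly delayed, sparse) reward gate the actual weight change."""
    global w, trace
    trace = trace_decay * trace + np.outer(pre, post)
    w += lr * reward * trace

# Activity occurs several steps before the reward; the trace preserves credit.
pre = rng.random(n_in)
post = rng.random(n_out)
step(pre, post, reward=0.0)  # activity, no reward: trace is set, weights unchanged
for _ in range(3):           # quiet delay steps: trace decays, weights unchanged
    step(np.zeros(n_in), np.zeros(n_out), reward=0.0)
step(np.zeros(n_in), np.zeros(n_out), reward=1.0)  # delayed reward gates the update
```

After the final step the weights are proportional to the decayed correlation of the earlier activity, which is how such a rule can assign credit across a temporal gap where sparse rewards would leave a TD error uninformative.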