Deep Reinforcement Learning With Modulated Hebbian Plus Q-Network Architecture

Author(s)
Ladosz, Pawel; Ben-Iwhiwhu, Eseoghene; Dick, Jeffery; Ketz, Nicholas; Kolouri, Soheil; Krichmar, Jeffrey L.; Pilly, Praveen K.; Soltoggio, Andrea
Issued Date
2022-05
DOI
10.1109/TNNLS.2021.3110281
URI
https://scholarworks.unist.ac.kr/handle/201301/62206
Citation
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, v.33, no.5, pp.2045 - 2056
Abstract
In this article, we consider a subclass of partially observable Markov decision process (POMDP) problems which we term confounding POMDPs. In these types of POMDPs, temporal difference (TD)-based reinforcement learning (RL) algorithms struggle, as the TD error cannot be easily derived from observations. We solve these types of problems using a new bio-inspired neural architecture that combines a modulated Hebbian network (MOHN) with a deep Q-network (DQN), which we call the modulated Hebbian plus Q-network architecture (MOHQA). The key idea is to use a Hebbian network with rarely correlated bio-inspired neural traces to bridge temporal delays between actions and rewards when confounding observations and sparse rewards result in inaccurate TD errors. In MOHQA, the DQN learns low-level features and control, while the MOHN contributes to high-level decisions by associating rewards with past states and actions. Thus, the proposed architecture combines two modules with significantly different learning algorithms, a Hebbian associative network and a classical DQN pipeline, exploiting the advantages of both. Simulations on a set of POMDPs and on the Malmo environment show that the proposed algorithm improved DQN's results and even outperformed control tests with advantage actor-critic (A2C), quantile regression DQN with long short-term memory (QRDQN + LSTM), Monte Carlo policy gradient (REINFORCE), and aggregated memory for reinforcement learning (AMRL) algorithms on the most difficult POMDPs with confounding stimuli and sparse rewards.
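
The abstract's central mechanism is a reward-modulated Hebbian update with slowly decaying eligibility traces, which can credit state-action pairs that occurred well before a reward arrives, without relying on a TD error. The following is a minimal Python/NumPy sketch of that trace mechanism alone, assuming illustrative sizes, names, and hyperparameters throughout; it is not the authors' implementation, and the DQN branch of MOHQA is omitted.

# Minimal sketch (not the authors' code) of a reward-modulated Hebbian
# associator with eligibility traces, the kind of mechanism the abstract
# describes for bridging delays between actions and rewards.
# All names, shapes, and hyperparameters here are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions = 16, 4          # assumed sizes of the high-level spaces
W = np.zeros((n_states, n_actions))  # Hebbian association weights
E = np.zeros_like(W)                 # eligibility traces over (state, action) pairs

eta = 0.1          # Hebbian learning rate (assumption)
trace_decay = 0.9  # how slowly traces fade, letting delayed rewards credit past pairs

def step(state_onehot: np.ndarray, action: int, reward: float) -> None:
    """One reward-modulated Hebbian update.

    Each (state, action) coincidence is stored in a decaying trace; when a
    reward eventually arrives, it modulates the accumulated traces, crediting
    pairs that occurred before the reward without computing a TD error.
    """
    global W, E
    E *= trace_decay              # fade old traces
    E[:, action] += state_onehot  # mark the current state-action pair
    W += eta * reward * E         # neuromodulated Hebbian update

def act(state_onehot: np.ndarray, epsilon: float = 0.1) -> int:
    """Epsilon-greedy choice from the learned Hebbian associations."""
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    return int(np.argmax(state_onehot @ W))

# Toy usage: the reward arrives several steps after the decisive action.
s = np.eye(n_states)[3]
a = act(s)
step(s, a, reward=0.0)  # no reward yet; the trace is laid down
for _ in range(4):
    step(np.eye(n_states)[7], act(np.eye(n_states)[7]), reward=0.0)
step(np.eye(n_states)[9], act(np.eye(n_states)[9]), reward=1.0)  # delayed reward credits earlier traces

In this toy run, the reward delivered at the final step still strengthens the association laid down at the first step, because its trace has only partially decayed; the trace_decay constant sets how long a delay the mechanism can bridge.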
Publisher
IEEE - Institute of Electrical and Electronics Engineers Inc.
ISSN
2162-237X
Keyword (Author)
Reinforcement learning; History; Markov processes; Benchmark testing; Delays; Decision making; Correlation; Biologically inspired learning; decision-making; deep reinforcement learning (RL); partially observable Markov decision process (POMDP)
Keyword
DISTAL REWARD PROBLEM

