Deep Reinforcement Learning by Parallelizing Reward and Punishment using the MaxPain Architecture.

Jiexin Wang, Stefan Elfwing, Eiji Uchibe

Published in: ICDL-EPIROB (2018)

Keyphrases

reinforcement learning
agent receives
state space
learning capabilities
function approximation
parallel processing
eligibility traces
reinforcement learning algorithms
learning algorithm
multi agent
reward function
software architecture
management system
machine learning
learning agent
model free
hardware implementation
expert systems
temporal difference
real time
learning problems
social networks
partially observable
partially observable Markov decision processes
decision problems
inverse reinforcement learning
sufficient conditions
partially observable environments
dynamic programming