Deep Reinforcement Learning by Parallelizing Reward and Punishment using the MaxPain Architecture.
Jiexin Wang, Stefan Elfwing, Eiji Uchibe
Published in: ICDL-EPIROB (2018)
Keyphrases
- reinforcement learning
- agent receives
- state space
- learning capabilities
- function approximation
- parallel processing
- eligibility traces
- reinforcement learning algorithms
- learning algorithm
- multi agent
- reward function
- software architecture
- management system
- machine learning
- learning agent
- model free
- hardware implementation
- expert systems
- temporal difference
- real time
- learning problems
- social networks
- partially observable
- partially observable Markov decision processes
- decision problems
- inverse reinforcement learning
- sufficient conditions
- partially observable environments
- dynamic programming