Maxmin Q-learning: Controlling the Estimation Bias of Q-learning.
Qingfeng Lan, Yangchen Pan, Alona Fyshe, Martha White. Published in: CoRR (2020)
Keyphrases
- reinforcement learning
- function approximation
- cooperative
- multi agent
- reinforcement learning algorithms
- learning algorithm
- state space
- model free
- action selection
- stochastic approximation
- optimal policy
- potential field
- multi agent reinforcement learning
- stochastic shortest path
- learning rate
- temporal difference
- temporal difference learning
- TD learning
- Markov decision processes
- dynamic environments
- reinforcement learning methods
- estimation algorithm
- control system
- hierarchical reinforcement learning