Maxmin Q-learning: Controlling the Estimation Bias of Q-learning.
Qingfeng Lan, Yangchen Pan, Alona Fyshe, Martha White
Published in: ICLR (2020)
Keyphrases
- reinforcement learning
- function approximation
- multi agent
- cooperative
- state space
- learning algorithm
- optimal policy
- reinforcement learning algorithms
- model free
- stochastic approximation
- action selection
- potential field
- learning rate
- multi agent reinforcement learning
- real time
- bucket brigade
- multiagent learning
- Markov decision processes
- parameter estimation
- dynamic programming
- estimation accuracy
- policy iteration
- temporal difference learning
- reinforcement learning methods
- learning process
- artificial neural networks
- multi agent systems
- relational reinforcement learning