Maxmin Q-learning: Controlling the Estimation Bias of Q-learning.

Qingfeng Lan, Yangchen Pan, Alona Fyshe, Martha White

Published in: ICLR (2020)

Keyphrases

reinforcement learning
function approximation
multi agent
cooperative
state space
learning algorithm
optimal policy
reinforcement learning algorithms
model free
stochastic approximation
action selection
potential field
learning rate
multi agent reinforcement learning
real time
bucket brigade
multiagent learning
Markov decision processes
parameter estimation
dynamic programming
estimation accuracy
policy iteration
temporal difference learning
reinforcement learning methods
learning process
artificial neural networks
multi agent systems
relational reinforcement learning