Maxmin Q-learning: Controlling the Estimation Bias of Q-learning.

Qingfeng Lan, Yangchen Pan, Alona Fyshe, Martha White

Published in: CoRR (2020)

Keyphrases

reinforcement learning
function approximation
cooperative
multi-agent
reinforcement learning algorithms
learning algorithm
state space
model free
action selection
stochastic approximation
optimal policy
potential field
multi-agent reinforcement learning
stochastic shortest path
learning rate
temporal difference
temporal difference learning
td learning
markov decision processes
dynamic environments
reinforcement learning methods
estimation algorithm
control system
hierarchical reinforcement learning