Double Q-learning.

Hado van Hasselt

Published in: NIPS (2010)

Keyphrases

reinforcement learning
function approximation
cooperative
multi-agent
model-free
state space
learning algorithm
reinforcement learning algorithms
temporal difference learning
optimal policy
action selection
stochastic approximation
learning rate
stochastic shortest path
multi-agent reinforcement learning
artificial neural networks
neural network
potential field
bucket brigade
data sets