Double Gumbel Q-Learning

David Yu-Tung Hui, Aaron C. Courville, Pierre-Luc Bacon

Published in: NeurIPS (2023)

Keyphrases

reinforcement learning
cooperative
multi agent
function approximation
learning algorithm
state space
reinforcement learning algorithms
multi agent reinforcement learning
action selection
optimal policy
temporal difference learning
stochastic approximation
model free
information retrieval
learning rate
databases
policy iteration
database