Double Gumbel Q-Learning.
David Yu-Tung Hui, Aaron C. Courville, Pierre-Luc Bacon
Published in: NeurIPS (2023)
Keyphrases
- reinforcement learning
- cooperative
- multi agent
- function approximation
- learning algorithm
- state space
- reinforcement learning algorithms
- multi agent reinforcement learning
- action selection
- optimal policy
- temporal difference learning
- stochastic approximation
- model free
- information retrieval
- learning rate
- databases
- policy iteration