Stabilizing Extreme Q-learning by Maclaurin Expansion.
Motoki Omura, Takayuki Osa, Yusuke Mukuta, Tatsuya Harada
Published in: CoRR (2024)
Keyphrases
- reinforcement learning
- cooperative
- function approximation
- learning algorithm
- multi agent
- state space
- learning rate
- stochastic approximation
- optimal policy
- model free
- temporal difference learning
- reinforcement learning algorithms
- action selection
- bucket brigade
- database
- multi agent reinforcement learning
- dynamic environments
- least squares
- dynamic programming
- expert systems
- case study