Stabilizing Q Learning Via Soft Mellowmax Operator.
Yaozhong Gan, Zhe Zhang, Xiaoyang Tan
Published in: AAAI (2021)
Keyphrases
- reinforcement learning
- function approximation
- learning algorithm
- multi agent
- cooperative
- state space
- stochastic approximation
- learning rate
- optimal policy
- nonlinear systems
- dynamic programming
- neural network
- action selection
- aggregation operators
- temporal difference learning
- relational databases
- least squares
- information systems
- genetic algorithm
- reinforcement learning algorithms
- database