Stabilizing Q Learning Via Soft Mellowmax Operator.

Yaozhong Gan, Zhe Zhang, Xiaoyang Tan

Published in: CoRR (2020)

Keyphrases

reinforcement learning
cooperative
function approximation
state space
multi agent
learning algorithm
optimal policy
reinforcement learning algorithms
learning rate
potential field
real time
case study
artificial neural networks
hierarchical reinforcement learning
bucket brigade