Stabilizing Q Learning Via Soft Mellowmax Operator.

Yaozhong Gan, Zhe Zhang, Xiaoyang Tan

Published in: AAAI (2021)

Keyphrases

reinforcement learning
function approximation
learning algorithm
multi agent
cooperative
state space
stochastic approximation
learning rate
optimal policy
nonlinear systems
dynamic programming
neural network
action selection
aggregation operators
temporal difference learning
relational databases
least squares
information systems
genetic algorithm
reinforcement learning algorithms
database