EMaQ: Expected-Max Q-Learning Operator for Simple Yet Effective Offline and Online RL.
Seyed Kamyar Seyed Ghasemipour
Dale Schuurmans
Shixiang Shane Gu
Published in:
ICML (2021)
Keyphrases
reinforcement learning
state space
real time
function approximation
multi agent
optimal policy
model free
learning algorithm
cooperative
reinforcement learning algorithms
online learning
machine learning
learning agent
action selection
policy iteration
supervised learning
batch mode
temporal difference learning