Offline Minimax Soft-Q-learning Under Realizability and Partial Coverage.
Masatoshi Uehara
Nathan Kallus
Jason D. Lee
Wen Sun
Published in:
NeurIPS (2023)
Keyphrases
reinforcement learning
multi agent
learning algorithm
state space
cooperative
function approximation
optimal policy
real time
stochastic approximation
temporal difference learning
dynamic programming
model free
learning rate
worst case
policy iteration
evaluation function
monte carlo
lower bound
data sets