Offline Minimax Soft-Q-learning Under Realizability and Partial Coverage.

Masatoshi Uehara, Nathan Kallus, Jason D. Lee, Wen Sun

Published in: NeurIPS (2023)

Keyphrases

reinforcement learning
multi-agent
learning algorithm
state space
cooperative
function approximation
optimal policy
real time
stochastic approximation
temporal difference learning
dynamic programming
model-free
learning rate
worst case
policy iteration
evaluation function
Monte Carlo
lower bound
data sets