Bellman-consistent Pessimism for Offline Reinforcement Learning.

Tengyang Xie, Ching-An Cheng, Nan Jiang, Paul Mineiro, Alekh Agarwal

Published in: CoRR (2021)

Keyphrases

reinforcement learning
function approximation
real time
machine learning
state action
temporal difference learning
state space
globally optimal
learning algorithm
multi agent
reinforcement learning algorithms
linear program
model free
piecewise linear
least squares
consistency constraints
markov decision processes
dynamic programming
control system
information retrieval
neural network