Bellman-consistent Pessimism for Offline Reinforcement Learning.

Tengyang Xie, Ching-An Cheng, Nan Jiang, Paul Mineiro, Alekh Agarwal

Published in: NeurIPS (2021)

Keyphrases

reinforcement learning
function approximation
actor-critic
multi-agent
temporal difference learning
reinforcement learning algorithms
model-free
state action
state space
learning algorithm
optimal policy
Markov decision processes
dynamic programming
linear program
piecewise linear
stochastic approximation
action space
real time
robotic control
temporal difference
action selection
learning classifier systems
globally optimal
supervised learning
hidden Markov models
website
neural network