Conservative Offline Distributional Reinforcement Learning.

Yecheng Jason Ma, Dinesh Jayaraman, Osbert Bastani

Published in: NeurIPS (2021)

Keyphrases

reinforcement learning
function approximation
learning algorithm
co-occurrence
state space
real-time
reinforcement learning algorithms
temporal difference
model-free
dynamic programming
Markov decision processes
optimal policy
policy search
information systems
artificial intelligence
machine learning
real world
action space
direct policy search
evolutionary learning
continuous state
stochastic approximation
temporal difference learning
multi-agent systems
multi-agent
neural network