Offline RL Policies Should be Trained to be Adaptive.

Dibya Ghosh, Anurag Ajay, Pulkit Agrawal, Sergey Levine

Published in: CoRR (2022)

Keyphrases

optimal policy
reinforcement learning
Markov decision processes
state space
control policies
control policy
Markov decision process
adaptive control
dynamic programming
training set
artificial neural networks
temporal difference
reinforcement learning algorithms
multi-agent
neural network
learning agents
real time