When is Offline Policy Selection Sample Efficient for Reinforcement Learning?
Vincent Liu
Prabhat Nagarajan
Andrew Patterson
Martha White
Published in:
CoRR (2023)
Keyphrases
reinforcement learning
optimal policy
dynamic programming
model free
sample size
partially observable
data sets
genetic algorithm
learning process
state space
least squares
transfer learning
reward function
sample points
policy search