Offline Reinforcement Learning with Realizability and Single-policy Concentrability.
Wenhao Zhan
Baihe Huang
Audrey Huang
Nan Jiang
Jason D. Lee
Published in:
CoRR (2022)
Keyphrases
reinforcement learning
optimal policy
policy search
action selection
neural network
machine learning
real time
Markov decision processes
partially observable environments
dynamic programming
temporal difference
partially observable
control policy
policy gradient