Optimal Estimation of Off-Policy Policy Gradient via Double Fitted Iteration.
Chengzhuo Ni
Ruiqi Zhang
Xiang Ji
Xuezhou Zhang
Mengdi Wang
Published in: CoRR (2022)
Keyphrases
policy gradient
dynamic programming
optimal control
estimation error
learning algorithm
reinforcement learning
search space