Optimal Estimation of Policy Gradient via Double Fitted Iteration.
Chengzhuo Ni
Ruiqi Zhang
Xiang Ji
Xuezhou Zhang
Mengdi Wang
Published in:
ICML (2022)
Keyphrases
policy gradient
actor critic
optimal control
optimal solution
upper bound
Markov decision processes
function approximation
estimation error