Optimal Estimation of Policy Gradient via Double Fitted Iteration.
Chengzhuo Ni
Ruiqi Zhang
Xiang Ji
Xuezhou Zhang
Mengdi Wang
Published in:
ICML (2022)
Keyphrases
policy gradient
actor critic
optimal control
optimal solution
upper bound
Markov decision processes
function approximation
estimation error