Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality.
Tengyu Xu
Zhuoran Yang
Zhaoran Wang
Yingbin Liang
Published in:
CoRR (2021)
Keyphrases
convergence proof
actor critic
reinforcement learning
optimal solution
function approximation
convergence speed
gradient method