Off-OAB: Off-Policy Policy Gradient Method with Optimal Action-Dependent Baseline.
Wenjia Meng
Qian Zheng
Long Yang
Yilong Yin
Gang Pan
Published in: CoRR (2024)
Keyphrases
gradient method
policy gradient
actor critic
convergence rate
dynamic programming
optimal solution
optimization methods
negative matrix factorization
neural network
optimal policy
optimal control
action selection
similarity measure
multiresolution
step size
convex formulation