Convergence of a Human-in-the-Loop Policy-Gradient Algorithm With Eligibility Trace Under Reward, Policy, and Advantage Feedback.
Ishaan Shah
David Halpern
Kavosh Asadi
Michael L. Littman
Published in:
CoRR (2021)
Keyphrases
policy gradient
convergence rate
natural gradient
optimal solution
gradient ascent
simulated annealing
learning algorithm
np hard
gradient method
actor critic
dynamic programming
path planning
policy search
average reward
evolutionary algorithm
optimization method
search space
computational complexity
multi agent