Actor-Critic Algorithm Based on Incremental Least-Squares Temporal Difference with Eligibility Trace.
Yuhu Cheng, Huanting Feng, Xuesong Wang
Published in: ICIC (2) (2011)
Keyphrases
- actor critic
- temporal difference
- least squares
- monte carlo
- td learning
- cost function
- dynamic programming
- function approximation
- reinforcement learning
- learning algorithm
- linear programming
- optimal solution
- approximate dynamic programming
- optimization algorithm
- evaluation function
- policy gradient
- policy evaluation
- objective function