Provably Convergent Off-Policy Actor-Critic with Function Approximation.
Shangtong Zhang, Bo Liu, Hengshuai Yao, Shimon Whiteson. Published in: CoRR (2019)
Keyphrases
- function approximation
- provably convergent
- actor critic
- temporal difference
- reinforcement learning
- policy gradient
- shape from shading
- reinforcement learning algorithms
- learning tasks
- model free
- temporal difference learning
- radial basis function
- function approximators
- markov decision processes
- natural actor critic
- approximate dynamic programming
- policy iteration
- gradient method
- neuro fuzzy
- real valued
- feature space
- multi agent
- feature extraction
- machine learning