Provably Convergent Two-Timescale Off-Policy Actor-Critic with Function Approximation.
Shangtong Zhang, Bo Liu, Hengshuai Yao, Shimon Whiteson
Published in: ICML (2020)
Keyphrases
- function approximation
- provably convergent
- actor critic
- temporal difference
- reinforcement learning
- policy gradient
- shape from shading
- reinforcement learning algorithms
- temporal difference learning
- radial basis function
- learning tasks
- function approximators
- model free
- optimal control
- natural actor critic
- multi agent
- training data
- neuro fuzzy
- markov decision processes
- search space
- approximate dynamic programming
- feature extraction