Characterizing the Gap Between Actor-Critic and Policy Gradient.
Junfeng Wen, Saurabh Kumar, Ramki Gummadi, Dale Schuurmans
Published in: ICML (2021)
Keyphrases
- policy gradient
- actor critic
- reinforcement learning
- optimal control
- gradient method
- function approximation
- neuro fuzzy
- reinforcement learning algorithms
- approximate dynamic programming
- policy gradient methods
- approximation methods
- temporal difference
- average reward
- policy iteration
- markov decision processes
- partially observable markov decision processes
- variance reduction
- natural actor critic
- single agent
- neural network