Characterizing the Gap Between Actor-Critic and Policy Gradient.
Junfeng Wen, Saurabh Kumar, Ramki Gummadi, Dale Schuurmans
Published in: CoRR (2021)
Keyphrases
- policy gradient
- actor-critic
- reinforcement learning
- optimal control
- gradient method
- function approximation
- policy gradient methods
- temporal difference
- variance reduction
- approximate dynamic programming
- neuro fuzzy
- approximation methods
- reinforcement learning algorithms
- partially observable Markov decision processes
- average reward
- policy iteration
- Markov chain
- natural actor-critic
- dynamic programming