SVRG for Policy Evaluation with Fewer Gradient Evaluations.
Zilun Peng, Ahmed Touati, Pascal Vincent, Doina Precup
Published in: CoRR (2019)
Keyphrases
- policy evaluation
- least squares
- reinforcement learning
- temporal difference
- Monte Carlo
- model-free
- variance reduction
- policy gradient
- policy iteration
- Markov decision processes
- function approximation
- matrix inversion
- optimal policy
- semi-parametric
- reinforcement learning algorithms
- partially observable Markov decision processes
- supervised learning
- statistical inference
- artificial neural networks
- gradient method
- optimal solution
- training data
- learning algorithm
- machine learning