A Temporal-Difference Approach to Policy Gradient Estimation.
Samuele Tosatto, Andrew Patterson, Martha White, A. Rupam Mahmood
Published in: CoRR (2022)
Keyphrases
- gradient estimation
- temporal difference
- actor critic
- policy evaluation
- variance reduction
- action selection
- reinforcement learning
- td learning
- function approximation
- monte carlo
- policy iteration
- evaluation function
- reinforcement learning algorithms
- temporal difference learning
- function approximators
- reinforcement learning problems
- step size
- policy gradient
- model free
- sample size
- policy search
- approximate dynamic programming
- optimal policy
- optimal control
- neuro fuzzy
- supervised learning
- text categorization
- markov decision processes
- gradient method
- partially observable markov decision processes
- least squares
- machine learning