A Temporal-Difference Approach to Policy Gradient Estimation.
Samuele Tosatto, Andrew Patterson, Martha White, Rupam Mahmood
Published in: ICML (2022)
Keyphrases
- gradient estimation
- temporal difference
- actor critic
- policy evaluation
- variance reduction
- action selection
- TD learning
- Monte Carlo
- reinforcement learning
- function approximation
- policy iteration
- evaluation function
- reinforcement learning algorithms
- model free
- step size
- policy gradient
- function approximators
- temporal difference learning
- reinforcement learning problems
- approximate dynamic programming
- supervised learning
- importance sampling
- optimal policy
- optimal control
- policy search
- artificial neural networks
- convergence speed
- model selection
- Markov chain