An Off-policy Policy Gradient Theorem Using Emphatic Weightings.
Ehsan Imani, Eric Graves, Martha White
Published in: CoRR (2018)
Keyphrases
- policy gradient
- parametric optimization
- reinforcement learning
- actor critic
- gradient method
- function approximation
- model free reinforcement learning
- optimal control
- approximation methods
- reinforcement learning algorithms
- partially observable markov decision processes
- variance reduction
- average reward
- machine learning
- state space
- state action