Mixing-Time Regularized Policy Gradient.
Tetsuro Morimura, Takayuki Osogami, Tomoyuki Shirai
Published in: AAAI (2014)
Keyphrases
- policy gradient
- actor-critic
- parametric optimization
- optimal control
- reinforcement learning
- reinforcement learning algorithms
- gradient method
- function approximation
- variance reduction
- model-free reinforcement learning
- least squares
- approximation methods
- partially observable Markov decision processes
- single agent
- Markov decision processes
- objective function
- neural network