Policy Gradient Semi-Markov Decision Process.
Ngo Anh Vien, TaeChoong Chung
Published in: ICTAI (2) (2008)
Keyphrases
- policy gradient
- reinforcement learning
- average reward
- gradient method
- function approximation
- Markov decision processes
- reinforcement learning algorithms
- optimal control
- single agent
- approximation methods
- variance reduction
- policy search
- least squares
- long run
- partially observable Markov decision processes
- multi agent