A New Natural Policy Gradient by Stationary Distribution Metric.
Tetsuro Morimura, Eiji Uchibe, Junichiro Yoshimoto, Kenji Doya
Published in: ECML/PKDD (2) (2008)
Keyphrases
- stationary distribution
- policy gradient
- Markov chain
- random walk
- queueing networks
- queue length
- sufficient conditions
- transition probabilities
- function approximation
- initial state
- reinforcement learning
- average reward
- approximation methods
- special case
- finite state
- optimal control
- state space
- artificial neural networks
- reinforcement learning algorithms
- heuristic search
- higher order
- variance reduction