An Off-Policy Natural Policy Gradient Method for a Partial Observable Markov Decision Process.
Yutaka Nakamura, Takeshi Mori, Shin Ishii
Published in: ICANN (2) (2005)
Keyphrases
- markov decision process
- gradient method
- optimal policy
- state space
- markov decision processes
- reinforcement learning
- convergence rate
- infinite horizon
- finite horizon
- step size
- initial state
- policy iteration
- optimization methods
- negative matrix factorization
- transition probabilities
- decision problems
- average cost
- state action
- reward function
- dynamic programming
- search algorithm
- partially observable
- long run
- action space