Perceptive Evaluation for the Optimal Discounted Reward in Markov Decision Processes.
Masami Kurano, Masami Yasuda, Jun-ichi Nakagami, Yuji Yoshida
Published in: MDAI (2005)
Keyphrases
- discounted reward
- Markov decision processes
- average reward
- policy iteration
- dynamic programming
- optimal policy
- state space
- average cost
- finite state
- reinforcement learning
- transition matrices
- planning under uncertainty
- state and action spaces
- long run
- hierarchical reinforcement learning
- partially observable
- optimality criterion
- reinforcement learning algorithms
- action space
- Markov chain
- finite horizon
- decision theoretic planning
- model free
- decision processes
- infinite horizon
- least squares
- reward function