Toward an optimized value iteration algorithm for average cost Markov decision processes.
Edilson F. Arruda, Fabrício Ourique, Anthony Almudevar
Published in: CDC (2010)
Keyphrases
- Markov decision processes
- average cost
- dynamic programming
- policy iteration
- model based reinforcement learning
- average reward
- optimal policy
- state space
- finite state
- infinite horizon
- real time dynamic programming
- finite horizon
- reinforcement learning
- long run
- optimal solution
- convergence rate
- transition matrices
- linear programming
- control policy
- discount factor
- decision theoretic planning
- model free
- search space
- risk sensitive
- factored MDPs
- decision making
- action sets
- approximate dynamic programming
- planning under uncertainty
- Markov decision process
- multistage
- partially observable