A structured pattern matrix algorithm for multichain Markov decision processes.
Tetsuichiro Iki, Masayuki Horiguchi, Masami Kurano
Published in: Math. Methods Oper. Res. (2007)
Keyphrases
- Markov decision processes
- dynamic programming
- average reward
- model based reinforcement learning
- policy iteration
- learning algorithm
- search space
- reinforcement learning
- path finding
- Monte Carlo
- optimal policy
- computational complexity
- state space
- objective function
- real time dynamic programming
- optimal solution
- expected reward