Topological Value Iteration Algorithm for Markov Decision Processes.
Peng Dai, Judy Goldsmith
Published in: IJCAI (2007)
Keyphrases
- Markov decision processes
- dynamic programming
- average reward
- policy iteration
- state space
- model based reinforcement learning
- optimal policy
- finite state
- optimal solution
- reinforcement learning
- total reward
- search space
- computational complexity
- learning algorithm
- heuristic search
- Monte Carlo
- NP-hard
- partially observable
- continuous state spaces
- discount factor
- infinite horizon
- transition matrices
- real time dynamic programming