A two-phase time aggregation algorithm for average cost Markov decision processes.
Edilson Fernandes de Arruda, Marcelo D. Fragoso
Published in: ACC (2012)
Keyphrases
- Markov decision processes
- average cost
- dynamic programming
- model based reinforcement learning
- finite state
- policy iteration
- average reward
- optimal policy
- objective function
- learning algorithm
- mathematical model
- action sets
- probabilistic model
- NP-hard
- optimal solution
- reinforcement learning
- linear programming
- state space
- finite number
- infinite horizon
- transition matrices
- search space
- computational complexity
- state abstraction
- real time dynamic programming
- discount factor
- approximate dynamic programming
- partially observable
- optimal control