Time aggregated Markov decision processes via standard dynamic programming.
Edilson F. Arruda, Marcelo D. Fragoso
Published in: Oper. Res. Lett. (2011)
Keyphrases
- markov decision processes
- dynamic programming
- state space
- optimal policy
- reinforcement learning
- finite state
- decision theoretic planning
- partially observable
- risk sensitive
- finite horizon
- model based reinforcement learning
- factored mdps
- infinite horizon
- reinforcement learning algorithms
- transition matrices
- average reward
- decision processes
- action sets
- average cost
- planning under uncertainty
- optimal control
- multistage
- policy iteration
- linear program
- linear programming
- reward function
- real time dynamic programming
- stochastic shortest path