Learning Optimal Policies in Markov Decision Processes with Value Function Discovery
Martijn Onderwater, Sandjai Bhulai, Rob van der Mei
Published in: SIGMETRICS Perform. Evaluation Rev. (2015)
Keyphrases
- markov decision processes
- optimal policy
- reinforcement learning
- partially observable
- finite state
- state space
- finite horizon
- average reward
- policy iteration
- average cost
- infinite horizon
- real time dynamic programming
- average reward reinforcement learning
- transition matrices
- long run
- discount factor
- decision theoretic planning
- dynamic programming
- state abstraction
- learning algorithm
- decision processes
- markov decision process
- reward function
- multistage
- sufficient conditions
- policy evaluation
- state and action spaces
- discounted reward
- decision problems
- reinforcement learning algorithms
- action space
- decision theoretic