Computing a Bias-Optimal Policy in a Discrete-Time Markov Decision Problem.
Eric V. Denardo
Published in: Oper. Res. (1970)
Keyphrases
- search algorithm
- markov decision problems
- optimal policy
- finite state
- state space
- markov decision processes
- infinite horizon
- reinforcement learning
- policy iteration
- decision problems
- linear programming
- dynamic programming
- finite horizon
- partially observable
- multistage
- long run
- state dependent
- average cost
- markov chain
- sufficient conditions
- decision theoretic
- average reward
- decision processes
- markov decision process
- transition probabilities
- partially observable markov decision processes
- reward function
- expected utility
- queueing networks
- random variables