Average, Sensitive and Blackwell Optimal Policies in Denumerable Markov Decision Chains with Unbounded Rewards.
Rommert Dekker, Arie Hordijk
Published in: Math. Oper. Res. (1988)
Keyphrases
- Markov decision chains
- average cost
- Markov decision processes
- optimal policy
- reinforcement learning
- finite state
- finite horizon
- long run
- control policy
- infinite horizon
- risk sensitive
- reward function
- dynamic programming
- state space
- decision problems
- policy iteration
- multistage
- sufficient conditions
- initial state
- Markov decision process
- total reward
- finite number
- average reward
- discounted reward
- partially observable
- action space
- expected reward
- linear program
- optimal control
- Markov decision problems
- total cost
- average reward reinforcement learning
- linear programming
- stationary policies
- control policies
- lost sales
- reinforcement learning algorithms