Blackwell optimality in the class of stationary policies in Markov decision chains with a Borel state space and unbounded rewards.
Arie Hordijk, Alexander Yushkevich
Published in: Math. Methods Oper. Res. (1999)
Keyphrases
- stationary policies
- Markov decision processes
- average cost
- state space
- Markov decision chains
- finite state
- reinforcement learning
- action sets
- optimal policy
- Markov decision process
- risk sensitive
- finite number
- dynamic programming
- long run
- reward function
- Markov chain
- reinforcement learning algorithms
- policy iteration
- infinite horizon
- average reward
- dynamical systems
- decision processes
- optimal control
- control policy
- heuristic search
- partially observable
- action space
- Markov decision problems
- planning problems
- multistage
- search algorithm
- initial state
- search space
- total cost
- belief state
- decision problems
- lot sizing
- state variables