Blackwell optimality in the class of all policies in Markov decision chains with a Borel state space and unbounded rewards.
Arie Hordijk, Alexander Yushkevich
Published in: Math. Methods Oper. Res. (1999)
Keyphrases
- markov decision processes
- average cost
- markov decision chains
- state space
- optimal policy
- stationary policies
- finite state
- reinforcement learning
- markov decision process
- risk sensitive
- reward function
- finite horizon
- markov decision problems
- heuristic search
- dynamic programming
- reinforcement learning algorithms
- infinite horizon
- action sets
- policy iteration
- average reward
- partially observable
- control policy
- decision processes
- macro actions
- action space
- finite number
- decision problems
- initial state
- markov chain
- expected reward
- linear programming
- optimality criterion
- partially observable markov decision processes
- search space
- state variables
- search algorithm
- control policies
- sufficient conditions
- dynamical systems
- planning problems
- function approximation
- long run