Strong 0-discount optimal policies in a Markov decision process with a Borel state space.
A. A. Yushkevich
Published in: Math. Methods Oper. Res. (1995)
Keyphrases
- optimal policy
- state space
- markov decision processes
- reinforcement learning
- decision problems
- finite horizon
- finite state
- heuristic search
- markov decision process
- reward function
- infinite horizon
- dynamic programming
- state dependent
- multistage
- average reward
- stationary policies
- policy iteration
- long run
- reinforcement learning algorithms
- partially observable
- markov chain
- markov decision problems
- action space
- initial state
- control policies
- particle filter
- serial inventory systems
- average reward reinforcement learning
- dynamical systems
- planning problems
- state variables
- semi markov decision processes
- partially observable markov decision processes
- probability distribution
- dynamic programming algorithms
- average cost