A further anticycling rule in multichain policy iteration for undiscounted Markov renewal programs.
Dieter Spreen
Published in: Z. Oper. Research (1981)
Keyphrases
- policy iteration
- Markov decision processes
- average reward
- Markov chain
- finite state
- optimal policy
- state space
- sample path
- model free
- reinforcement learning
- infinite horizon
- stochastic games
- policy evaluation
- fixed point
- steady state
- Markov decision problems
- dynamic programming
- Markov model
- least squares
- Markov decision process
- transition probabilities
- decision processes
- long run
- partially observable
- average cost
- decision problems
- Monte Carlo
- random walk
- temporal difference
- reinforcement learning algorithms
- model checking
- linear programming