A policy iteration heuristic for constrained discounted controlled Markov Chains.
Hyeong Soo Chang
Published in: Optim. Lett. (2012)
Keyphrases
- markov chain
- policy iteration
- markov decision processes
- sample path
- average reward
- finite state
- state space
- optimal policy
- dynamic programming
- markov decision process
- infinite horizon
- steady state
- discounted reward
- transition probabilities
- average cost
- monte carlo
- finite horizon
- reinforcement learning
- model free
- stationary distribution
- markov model
- random walk
- stochastic process
- optimal solution
- partially observable markov decision processes
- temporal difference
- confidence intervals
- fixed point
- markov processes
- search algorithm
- convergence rate
- sample size
- particle filter
- linear programming
- least squares
- search space