Q-learning and policy iteration algorithms for stochastic shortest path problems.
Huizhen Yu, Dimitri P. Bertsekas
Published in: Ann. Oper. Res. (2013)
Keyphrases
- policy iteration
- stochastic approximation
- Markov decision processes
- model free
- learning algorithm
- sample path
- least squares
- fixed point
- reinforcement learning
- temporal difference
- optimal policy
- shortest path problem
- Monte Carlo
- finite state
- optimization problems
- policy evaluation
- convergence rate
- optimal control
- linear programming
- combinatorial optimization
- traveling salesman problem
- combinatorial optimization problems
- infinite horizon
- reinforcement learning algorithms
- state space