Q-Learning and Enhanced Policy Iteration in Discounted Dynamic Programming.
Dimitri P. Bertsekas, Huizhen Yu
Published in: Math. Oper. Res. (2012)
Keyphrases
- policy iteration
- markov decision processes
- dynamic programming
- optimal policy
- infinite horizon
- average reward
- stochastic approximation
- sample path
- state space
- optimal control
- discounted reward
- reinforcement learning
- markov decision problems
- finite state
- markov decision process
- finite horizon
- linear programming
- policy evaluation
- average cost
- long run
- approximate dynamic programming
- actor critic
- reinforcement learning algorithms
- decision problems
- multistage
- temporal difference learning
- stereo matching
- markov games
- partially observable
- model free
- approximate policy iteration
- stochastic games
- action space
- linear program
- state and action spaces
- sufficient conditions
- supply chain