Q-learning and enhanced policy iteration in discounted dynamic programming.
Dimitri P. Bertsekas, Huizhen Yu
Published in: CDC (2010)
Keyphrases
- policy iteration
- dynamic programming
- Markov decision processes
- optimal policy
- infinite horizon
- stochastic approximation
- state space
- average reward
- sample path
- reinforcement learning
- discounted reward
- optimal control
- Markov decision process
- approximate dynamic programming
- finite state
- multistage
- Markov decision problems
- reinforcement learning algorithms
- finite horizon
- linear programming
- long run
- average cost
- decision problems
- policy evaluation
- action space
- temporal difference learning
- stochastic games
- fixed point
- random walk
- Markov games
- approximate policy iteration
- temporal difference
- initial state
- partially observable Markov decision processes
- reward function
- model free
- lead time
- learning tasks
- sufficient conditions
- least squares
- multi agent