Adaptive policy-iteration and policy-value-iteration for discounted Markov decision processes.
Gerhard Hübner, Manfred Schäl
Published in: ZOR Methods Model. Oper. Res. (1991)
Keyphrases
- markov decision processes
- policy iteration
- optimal policy
- average reward
- infinite horizon
- actor critic
- sample path
- state space
- markov decision process
- reinforcement learning
- policy evaluation
- approximate dynamic programming
- dynamic programming
- finite horizon
- markov decision problems
- discounted reward
- average cost
- factored mdps
- finite state
- policy iteration algorithm
- decision processes
- transition matrices
- action space
- state and action spaces
- partially observable markov decision processes
- reinforcement learning algorithms
- total reward
- markov games
- state dependent
- long run
- multistage
- planning under uncertainty
- reward function
- dynamical systems
- least squares
- optical flow