An incremental off-policy search in a model-free Markov decision process using a single sample path.
Ajin George Joseph, Shalabh Bhatnagar. Published in: Mach. Learn. (2018)
Keyphrases
- policy iteration
- sample path
- model free
- Markov decision process
- Markov decision processes
- reinforcement learning
- average reward
- state space
- optimal policy
- search algorithm
- fixed point
- policy evaluation
- infinite horizon
- temporal difference
- function approximation
- reinforcement learning algorithms
- finite horizon
- search space
- machine learning
- optimal control
- Markov decision problems
- convergence rate
- learning algorithm