An Incremental Off-policy Search in a Model-free Markov Decision Process Using a Single Sample Path.
Ajin George Joseph, Shalabh Bhatnagar
Published in: CoRR (2018)
Keyphrases
- policy iteration
- sample path
- model free
- Markov decision process
- Markov decision processes
- reinforcement learning
- optimal policy
- average reward
- policy evaluation
- reinforcement learning algorithms
- search algorithm
- function approximation
- infinite horizon
- temporal difference
- finite horizon
- fixed point
- least squares
- Markov decision problems
- search space
- state space
- average cost
- asymptotic analysis
- objective function
- support vector machine
- Bayesian networks