Policy Evaluation Using the Ω-Return.
Philip S. Thomas, Scott Niekum, Georgios Theocharous, George Dimitri Konidaris
Published in: NIPS (2015)
Keyphrases
- policy evaluation
- least squares
- reinforcement learning
- temporal difference
- Monte Carlo
- model-free
- matrix inversion
- Markov decision processes
- policy iteration
- variance reduction
- semi-parametric
- function approximation
- optimal policy
- partially observable Markov decision processes
- linear model
- multi-agent
- reinforcement learning algorithms
- Markov decision problems
- decision making
- neural network