Understanding the Pathologies of Approximate Policy Evaluation when Combined with Greedification in Reinforcement Learning.
Kenny Young, Richard S. Sutton
Published in: CoRR (2020)
Keyphrases
- policy evaluation
- reinforcement learning
- least squares
- temporal difference
- model-free
- Monte Carlo
- policy iteration
- Markov decision processes
- function approximation
- variance reduction
- semi-parametric
- TD learning
- reinforcement learning algorithms
- learning algorithm
- action selection
- optimal policy
- state space
- evaluation function
- reinforcement learning methods
- optimal control
- radial basis function
- transfer learning
- importance sampling
- Markov decision process
- linear programming
- average reward
- Markov decision problems
- multi-agent
- machine learning