Gap-Increasing Policy Evaluation for Efficient and Noise-Tolerant Reinforcement Learning.
Tadashi Kozuno, Dongqi Han, Kenji Doya
Published in: CoRR (2019)
Keyphrases
- noise tolerant
- reinforcement learning
- policy evaluation
- temporal difference
- model free
- function approximation
- least squares
- markov decision processes
- policy iteration
- optimal policy
- monte carlo
- noisy data
- variance reduction
- semi parametric
- evaluation function
- data sets
- transfer learning
- state space
- dynamic programming
- learning algorithm