Reliable Off-policy Evaluation for Reinforcement Learning.
Jie Wang, Rui Gao, Hongyuan Zha
Published in: CoRR (2020)
Keyphrases
- policy evaluation
- reinforcement learning
- temporal difference
- least squares
- model free
- function approximation
- Monte Carlo
- Markov decision processes
- policy iteration
- variance reduction
- reinforcement learning algorithms
- TD learning
- semi parametric
- state space
- multi agent
- partially observable
- reinforcement learning methods
- learning algorithm
- step size
- optimal control
- evaluation function
- optimal policy
- statistical inference
- supervised learning
- reward function
- Markov chain
- dynamic programming