A Practical Guide of Off-Policy Evaluation for Bandit Problems.
Masahiro Kato, Kenshi Abe, Kaito Ariu, Shota Yasui
Published in: CoRR (2020)
Keyphrases
- bandit problems
- policy evaluation
- least squares
- Monte Carlo
- reinforcement learning
- decision problems
- temporal difference
- model free
- Markov decision processes
- optimal policy
- policy iteration
- function approximation
- variance reduction
- semi parametric
- partially observable Markov decision processes
- evaluation function
- linear model
- statistical inference
- Markov chain
- state space
- linear regression
- influence diagrams
- Markov decision problems
- radial basis function
- decision making