Open Bandit Dataset and Pipeline: Towards Realistic and Reproducible Off-Policy Evaluation.
Yuta Saito, Shunsuke Aihara, Megumi Matsutani, Yusuke Narita
Published in: NeurIPS Datasets and Benchmarks (2021)
Keyphrases
- policy evaluation
- least squares
- temporal difference
- model free
- reinforcement learning
- variance reduction
- policy iteration
- matrix inversion
- monte carlo
- markov decision processes
- function approximation
- semi parametric
- random sampling
- action selection
- statistical inference
- markov chain
- graphical models
- artificial neural networks
- training data