SPEED: Experimental Design for Policy Evaluation in Linear Heteroscedastic Bandits.
Subhojyoti Mukherjee, Qiaomin Xie, Josiah P. Hanna, Robert D. Nowak
Published in: AISTATS (2024)
Keyphrases
- experimental design
- policy evaluation
- linear model
- least squares
- empirical studies
- reinforcement learning
- temporal difference
- active learning
- sample size
- semi-parametric
- model-free
- markov decision processes
- variance reduction
- feature selection
- monte carlo
- virtual learning environments
- function approximation
- markov chain
- support vector machine
- decision trees