SPEED: Experimental Design for Policy Evaluation in Linear Heteroscedastic Bandits.
Subhojyoti Mukherjee, Qiaomin Xie, Josiah Hanna, Robert D. Nowak
Published in: CoRR (2023)
Keyphrases
- experimental design
- policy evaluation
- least squares
- linear model
- active learning
- empirical studies
- temporal difference
- semi-parametric
- variance reduction
- sample size
- model-free
- reinforcement learning
- class imbalance
- function approximation
- policy iteration
- data sets
- Markov decision processes
- Monte Carlo
- classification accuracy
- data mining