Reward Learning as Doubly Nonparametric Bandits: Optimal Design and Scaling Laws.
Kush Bhatia, Wenshuo Guo, Jacob Steinhardt
Published in: CoRR (2023)
Keyphrases
- optimal design
- online learning
- reinforcement learning
- learning process
- learning algorithm
- supervised learning
- prior knowledge
- upper bound
- data-driven
- learning systems
- inductive inference
- multi-armed bandits
- lower bound
- neural network
- training data
- background knowledge
- feature selection
- elementary school
- bandit problems