Reward Learning as Doubly Nonparametric Bandits: Optimal Design and Scaling Laws.

Kush Bhatia, Wenshuo Guo, Jacob Steinhardt

Published in: CoRR (2023)

Keyphrases

optimal design
online learning
reinforcement learning
learning process
learning algorithm
supervised learning
prior knowledge
upper bound
data-driven
learning systems
inductive inference
multi-armed bandits
lower bound
neural network
training data
background knowledge
feature selection
elementary school
bandit problems