Reward Learning as Doubly Nonparametric Bandits: Optimal Design and Scaling Laws.

Kush Bhatia, Wenshuo Guo, Jacob Steinhardt

Published in: AISTATS (2023)

Keyphrases

optimal design
reinforcement learning
learning algorithm
learning process
learning systems
learning tasks
prior knowledge
active learning
learning community
supervised learning
knowledge acquisition
mobile learning
neural network
upper bound
online learning
multi-armed bandits