Provable Benefits of Policy Learning from Human Preferences in Contextual Bandit Problems.
Xiang Ji
Huazheng Wang
Minshuo Chen
Tuo Zhao
Mengdi Wang
Published in:
CoRR (2023)
Keyphrases
bandit problems
learning algorithm
learning process
decision making
human experts
machine learning
online learning
learning systems
action selection
multi agent
optimal solution
active learning
special case
language acquisition
human learning
preference learning