Learning Optimal Advantage from Preferences and Mistaking it for Reward.
W. Bradley Knox
Stephane Hatgis-Kessell
Sigurdur O. Adalgeirsson
Serena Booth
Anca D. Dragan
Peter Stone
Scott Niekum
Published in:
CoRR (2023)
Keyphrases
reinforcement learning
learning algorithm
online learning
learning systems
learning process
supervised learning
learning analytics
data sets
neural network
learning environment
active learning
worst case
learning preferences