Login / Signup
The Perils of Optimizing Learned Reward Functions: Low Training Error Does Not Guarantee Low Regret.
Lukas Fluri
Leon Lang
Alessandro Abate
Patrick Forré
David Krueger
Joar Skalse
Published in:
CoRR (2024)
Keyphrases
</>
reward function
data sets
decision trees
training error
data mining
lower bound
active learning
bayesian inference