Login / Signup

The Perils of Optimizing Learned Reward Functions: Low Training Error Does Not Guarantee Low Regret.

Lukas FluriLeon LangAlessandro AbatePatrick ForréDavid KruegerJoar Skalse
Published in: CoRR (2024)
Keyphrases
  • reward function
  • data sets
  • decision trees
  • training error
  • data mining
  • lower bound
  • active learning
  • bayesian inference