Scalable Ensembling For Mitigating Reward Overoptimisation.

Ahmed M. Ahmed, Rafael Rafailov, Stepan Sharkov, Xuechen Li, Sanmi Koyejo
Published in: CoRR (2024)
Keyphrases
  • reinforcement learning
  • predictive accuracy
  • web scale
  • database
  • data sets
  • long run
  • highly scalable
  • machine learning
  • social networks
  • multiscale
  • data structure
  • decision support system