Login / Signup

Preventing Reward Hacking with Occupancy Measure Regularization.

Cassidy LaidlawShivam SinghalAnca D. Dragan
Published in: CoRR (2024)
Keyphrases
  • similarity measure
  • image processing
  • reinforcement learning
  • distance measure
  • database
  • real time
  • machine learning
  • pairwise
  • information content
  • information theory
  • reward function
  • parameter selection