Login / Signup
Preventing Reward Hacking with Occupancy Measure Regularization.
Cassidy Laidlaw
Shivam Singhal
Anca D. Dragan
Published in:
CoRR (2024)
Keyphrases
</>
similarity measure
image processing
reinforcement learning
distance measure
database
real time
machine learning
pairwise
information content
information theory
reward function
parameter selection