Spontaneous Reward Hacking in Iterative Self-Refinement.

Jane Pan, He He, Samuel R. Bowman, Shi Feng
Published in: CoRR (2024)
Keyphrases
  • reinforcement learning
  • real time
  • iterative optimization
  • multi agent
  • iterative methods
  • database
  • databases
  • search algorithm
  • refinement step