Overcoming Reward Overoptimization via Adversarial Policy Optimization with Lightweight Uncertainty Estimation.

Published in: CoRR (2024)
