Overcoming Reward Overoptimization via Adversarial Policy Optimization with Lightweight Uncertainty Estimation.

Xiaoying Zhang, Jean-Francois Ton, Wei Shen, Hongning Wang, Yang Liu
Published in: CoRR (2024)