Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer.
Zhihan Liu
Miao Lu
Shenao Zhang
Boyi Liu
Hongyi Guo
Yingxiang Yang
Jose H. Blanchet
Zhaoran Wang
Published in:
CoRR (2024)
Keyphrases
total variation
multi-agent
worst case
neural network
semi-supervised
risk management
web services
data structure
image restoration
regularization framework