A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity.

Andrew Lee, Xiaoyan Bai, Itamar Pres, Martin Wattenberg, Jonathan K. Kummerfeld, Rada Mihalcea
Published in: CoRR (2024)