The Capacity for Moral Self-Correction in Large Language Models.

Deep Ganguli, Amanda Askell, Nicholas Schiefer, Thomas I. Liao, Kamile Lukosiute, Anna Chen, Anna Goldie, Azalia Mirhoseini, Catherine Olsson, Danny Hernandez, Dawn Drain, Dustin Li, Eli Tran-Johnson, Ethan Perez, Jackson Kernion, Jamie Kerr, Jared Mueller, Joshua Landau, Kamal Ndousse, Karina Nguyen, Liane Lovitt, Michael Sellitto, Nelson Elhage, Noemí Mercado, Nova DasSarma, Oliver Rausch, Robert Lasenby, Robin Larson, Sam Ringer, Sandipan Kundu, Saurav Kadavath, Scott Johnston, Shauna Kravec, Sheer El Showk, Tamera Lanham, Timothy Telleen-Lawton, Tom Henighan, Tristan Hume, Yuntao Bai, Zac Hatfield-Dodds, Ben Mann, Dario Amodei, Nicholas Joseph, Sam McCandlish, Tom Brown, Christopher Olah, Jack Clark, Samuel R. Bowman, Jared Kaplan
Published in: CoRR (2023)