The Capacity for Moral Self-Correction in Large Language Models.

Deep Ganguli, Amanda Askell, Nicholas Schiefer, Thomas I. Liao, Kamile Lukosiute, Anna Chen, Anna Goldie, Azalia Mirhoseini, Catherine Olsson, Danny Hernandez, Dawn Drain, Dustin Li, Eli Tran-Johnson, Ethan Perez, Jackson Kernion, Jamie Kerr, Jared Mueller, Joshua Landau, Kamal Ndousse, Karina Nguyen, Liane Lovitt, Michael Sellitto, Nelson Elhage, Noemí Mercado, Nova DasSarma, Oliver Rausch, Robert Lasenby, Robin Larson, Sam Ringer, Sandipan Kundu, Saurav Kadavath, Scott Johnston, Shauna Kravec, Sheer El Showk, Tamera Lanham, Timothy Telleen-Lawton, Tom Henighan, Tristan Hume, Yuntao Bai, Zac Hatfield-Dodds, Ben Mann, Dario Amodei, Nicholas Joseph, Sam McCandlish, Tom Brown, Christopher Olah, Jack Clark, Samuel R. Bowman, Jared Kaplan
Published in: CoRR (2023)