Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned.

Deep Ganguli, Liane Lovitt, Jackson Kernion, Amanda Askell, Yuntao Bai, Saurav Kadavath, Ben Mann, Ethan Perez, Nicholas Schiefer, Kamal Ndousse, Andy Jones, Sam Bowman, Anna Chen, Tom Conerly, Nova DasSarma, Dawn Drain, Nelson Elhage, Sheer El Showk, Stanislav Fort, Zac Hatfield-Dodds, Tom Henighan, Danny Hernandez, Tristan Hume, Josh Jacobson, Scott Johnston, Shauna Kravec, Catherine Olsson, Sam Ringer, Eli Tran-Johnson, Dario Amodei, Tom Brown, Nicholas Joseph, Sam McCandlish, Chris Olah, Jared Kaplan, Jack Clark
Published in: CoRR (2022)
Keyphrases
  • lessons learned
  • language model
  • case study
  • probabilistic model
  • n gram
  • speech recognition
  • future directions
  • smoothing methods
  • language modeling
  • language modelling
  • statistical language models