The Capacity for Moral Self-Correction in Large Language Models

Deep Ganguli, Amanda Askell, Nicholas Schiefer, Thomas I. Liao, Kamile Lukosiute, Anna Chen, Anna Goldie, Azalia Mirhoseini, Catherine Olsson, Danny Hernandez, Dawn Drain, Dustin Li, Eli Tran-Johnson, Ethan Perez, Jackson Kernion, Jamie Kerr, Jared Mueller, Joshua Landau, Kamal Ndousse, Karina Nguyen, Liane Lovitt, Michael Sellitto, Nelson Elhage, Noemí Mercado, Nova DasSarma, Oliver Rausch, Robert Lasenby, Robin Larson, Sam Ringer, Sandipan Kundu, Saurav Kadavath, Scott Johnston, Shauna Kravec, Sheer El Showk, Tamera Lanham, Timothy Telleen-Lawton, Tom Henighan, Tristan Hume, Yuntao Bai, Zac Hatfield-Dodds, Ben Mann, Dario Amodei, Nicholas Joseph, Sam McCandlish, Tom Brown, Christopher Olah, Jack Clark, Samuel R. Bowman, Jared Kaplan

Published in: CoRR (2023)

Keyphrases

language model
language modeling
speech recognition
probabilistic model
language modelling
n gram
document retrieval
retrieval model
information retrieval
test collection
query expansion
smoothing methods
ad hoc information retrieval
statistical language models
vector space model
context sensitive
query terms
relevance model
translation model
document length
document ranking