Learning diverse attacks on large language models for robust red-teaming and safety tuning.
Seanie Lee
Minsu Kim
Lynn Cherif
David Dobre
Juho Lee
Sung Ju Hwang
Kenji Kawaguchi
Gauthier Gidel
Yoshua Bengio
Nikolay Malkin
Moksh Jain
Published in:
CoRR (2024)
Keyphrases
language model
probabilistic model
active learning
n gram
language modeling
language modelling
text classification
text documents
smoothing methods
statistical language models