Learning diverse attacks on large language models for robust red-teaming and safety tuning.

Seanie Lee, Minsu Kim, Lynn Cherif, David Dobre, Juho Lee, Sung Ju Hwang, Kenji Kawaguchi, Gauthier Gidel, Yoshua Bengio, Nikolay Malkin, Moksh Jain
Published in: CoRR (2024)
Keyphrases
  • language model
  • probabilistic model
  • active learning
  • n-gram
  • language modeling
  • language modelling
  • text classification
  • text documents
  • smoothing methods
  • statistical language models