Scalable and Transferable Black-Box Jailbreaks for Language Models via Persona Modulation.
Rusheb Shah, Quentin Feuillade-Montixi, Soroush Pour, Arush Tagade, Stephen Casper, Javier Rando
Published in: CoRR (2023)
Keyphrases
- black box
- language model
- language modeling
- probabilistic model
- information retrieval
- black boxes
- white box
- n gram
- language modelling
- document retrieval
- query expansion
- retrieval model
- speech recognition
- ad hoc information retrieval
- statistical language models
- pseudo relevance feedback
- white box testing
- vector space model
- test cases
- translation model
- smoothing methods
- text classification
- context sensitive
- document ranking
- term dependencies
- query terms
- test collection
- language models for information retrieval
- language model for information retrieval
- retrieval effectiveness
- document length
- relevant documents
- integration testing
- error rate