Defending Large Language Models against Jailbreak Attacks via Semantic Smoothing.
Jiabao Ji, Bairu Hou, Alexander Robey, George J. Pappas, Hamed Hassani, Yang Zhang, Eric Wong, Shiyu Chang
Published in: CoRR (2024)
Keyphrases
- semantic smoothing
- language model
- ddos attacks
- context sensitive
- language modeling
- text classification
- translation model
- agglomerative clustering
- information retrieval
- probabilistic model
- document retrieval
- n-gram
- retrieval model
- query expansion
- test collection
- query terms
- vector space model
- smoothing methods
- multiword
- language models for information retrieval
- document representation
- machine learning