Break the Breakout: Reinventing LM Defense Against Jailbreak Attacks with Self-Refinement.
Heegyu Kim, Sehyun Yuk, Hyunsouk Cho
Published in: CoRR (2024)
Keyphrases
- ddos attacks
- defense mechanisms
- language model
- countermeasures
- network security
- intrusion detection
- language modeling
- advanced research projects agency
- traffic analysis
- computer virus
- malicious attacks
- dos attacks
- chosen plaintext
- denial of service attacks
- refinement process
- security protocols
- computer security
- security mechanisms
- watermarking algorithm
- terrorist attacks
- malicious users
- information retrieval
- watermarking method
- watermarking scheme
- digital image watermarking
- operating system
- information systems
- data sets