Soft Prompt Threats: Attacking Safety Alignment and Unlearning in Open-Source LLMs through the Embedding Space.

Published in: CoRR (2024)

Keyphrases