Soft Prompt Threats: Attacking Safety Alignment and Unlearning in Open-Source LLMs through the Embedding Space.

Leo Schwinn, David Dobre, Sophie Xhonneux, Gauthier Gidel, Stephan Günnemann
Published in: CoRR (2024)