Soft Prompt Threats: Attacking Safety Alignment and Unlearning in Open-Source LLMs through the Embedding Space.
Leo Schwinn, David Dobre, Sophie Xhonneux, Gauthier Gidel, Stephan Günnemann
Published in: CoRR (2024)
Keyphrases
- open source
- embedding space
- euclidean space
- manifold learning
- low dimensional
- graph embedding
- dimensionality reduction
- high dimensional
- input space
- geometric structure
- data points
- dynamic time warping
- geodesic distance
- similarity search
- data sets
- multi dimensional
- gaussian mixture
- feature representation
- subspace learning
- nonlinear dimensionality reduction
- feature space
- feature extraction