Safety Arithmetic: A Framework for Test-time Safety Alignment of Language Models by Steering Parameters and Activations.

Rima Hazra, Sayan Layek, Somnath Banerjee, Soujanya Poria
Published in: CoRR (2024)