Safety Arithmetic: A Framework for Test-time Safety Alignment of Language Models by Steering Parameters and Activations.
Rima Hazra
Sayan Layek
Somnath Banerjee
Soujanya Poria
Published in:
CoRR (2024)
Keyphrases
language model
probabilistic model
test collection
language modeling
speech recognition
query expansion
document ranking
statistical language models
information retrieval
mixture model
document retrieval
expectation maximization
n-gram
context sensitive