How Alignment and Jailbreak Work: Explain LLM Safety through Intermediate Hidden States.

Zhenhong Zhou, Haiyang Yu, Xinghua Zhang, Rongwu Xu, Fei Huang, Yongbin Li
Published in: CoRR (2024)
Keyphrases