Shadow Alignment: The Ease of Subverting Safely-Aligned Language Models.

Xianjun Yang, Xiao Wang, Qi Zhang, Linda R. Petzold, William Yang Wang, Xun Zhao, Dahua Lin
Published in: CoRR (2023)