SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal Behaviors.

Tinghao Xie, Xiangyu Qi, Yi Zeng, Yangsibo Huang, Udari Madhushani Sehwag, Kaixuan Huang, Luxi He, Boyi Wei, Dacheng Li, Ying Sheng, Ruoxi Jia, Bo Li, Kai Li, Danqi Chen, Peter Henderson, Prateek Mittal
Published in: CoRR (2024)
Keyphrases