HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal.
Mantas Mazeika, Long Phan, Xuwang Yin, Andy Zou, Zifan Wang, Norman Mu, Elham Sakhaee, Nathaniel Li, Steven Basart, Bo Li, David A. Forsyth, Dan Hendrycks
Published in: CoRR (2024)