HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal.

Mantas Mazeika, Long Phan, Xuwang Yin, Andy Zou, Zifan Wang, Norman Mu, Elham Sakhaee, Nathaniel Li, Steven Basart, Bo Li, David A. Forsyth, Dan Hendrycks
Published in: CoRR (2024)
Keyphrases
  • evaluation framework
  • evaluation process
  • evaluation methodology
  • open source
  • data sets
  • web pages
  • object oriented