The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning.
Nathaniel Li, Alexander Pan, Anjali Gopal, Summer Yue, Daniel Berrios, Alice Gatti, Justin D. Li, Ann-Kathrin Dombrowski, Shashwat Goel, Long Phan, Gabriel Mukobi, Nathan Helm-Burger, Rassin Lababidi, Lennart Justen, Andrew B. Liu, Michael Chen, Isabelle Barrass, Oliver Zhang, Xiaoyuan Zhu, Rishub Tamirisa, Bhrugu Bharathi, Adam Khoja, Zhenqi Zhao, Ariel Herbert-Voss, Cort B. Breuer, Andy Zou, Mantas Mazeika, Zifan Wang, Palash Oswal, Weiran Liu, Adam A. Hunt, Justin Tienken-Harder, Kevin Y. Shih, Kemper Talley, John Guan, Russell Kaplan, Ian Steneker, David Campbell, Brad Jokubaitis, Alex Levinson, Jean Wang, William Qian, Kallol Krishna Karmakar, Steven Basart, Stephen Fitz, Mindy Levine, Ponnurangam Kumaraguru, Uday Kiran Tupakula, Vijay Varadharajan, Yan Shoshitaishvili, Jimmy Ba, Kevin M. Esvelt, Alexandr Wang, Dan Hendrycks
Published in: CoRR (2024)