Targeted Latent Adversarial Training Improves Robustness to Persistent Harmful Behaviors in LLMs.

Abhay Sheshadri, Aidan Ewart, Phillip Guo, Aengus Lynch, Cindy Wu, Vivek Hebbar, Henry Sleight, Asa Cooper Stickland, Ethan Perez, Dylan Hadfield-Menell, Stephen Casper
Published in: CoRR (2024)