Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game.

Sam Toyer, Olivia Watkins, Ethan Adrian Mendes, Justin Svegliato, Luke Bailey, Tiffany Wang, Isaac Ong, Karim Elmaaroufi, Pieter Abbeel, Trevor Darrell, Alan Ritter, Stuart Russell
Published in: CoRR (2023)