Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game.
Sam Toyer, Olivia Watkins, Ethan Adrian Mendes, Justin Svegliato, Luke Bailey, Tiffany Wang, Isaac Ong, Karim Elmaaroufi, Pieter Abbeel, Trevor Darrell, Alan Ritter, Stuart Russell
Published in: CoRR (2023)