Unelicitable Backdoors in Language Models via Cryptographic Transformer Circuits.

Andis Draguns, Andrew Gritsevskiy, Sumeet Ramesh Motwani, Charlie Rogers-Smith, Jeffrey Ladish, Christian Schröder de Witt
Published in: CoRR (2024)