Simon Lermen
Publication Activity (10 Years)
Years Active: 2023-2023
Publications (10 Years): 4
Top Topics
Fine Tune
Instant Messaging
General Purpose
Learning Perl
Top Venues
CoRR
Publications
Simon Lermen, Charlie Rogers-Smith, Jeffrey Ladish
LoRA Fine-tuning Efficiently Undoes Safety Training in Llama 2-Chat 70B.
CoRR (2023)
Simon Lermen, Ondrej Kvapil
Exploring the Robustness of Model-Graded Evaluations and Automated Interpretability.
CoRR (2023)
Pranav Gade, Simon Lermen, Charlie Rogers-Smith, Jeffrey Ladish
BadLlama: cheaply removing safety fine-tuning from Llama 2-Chat 13B.
CoRR (2023)
Teun van der Weij, Simon Lermen, Leon Lang
Evaluating Shutdown Avoidance of Language Models in Textual Scenarios.
CoRR (2023)