Nina Rimsky
Publication Activity (10 Years)
Years Active: 2023-2024
Publications (10 Years): 4
Top Topics
Language Models For Information Retrieval
Language Model
Probabilistic Model
Document Retrieval
Top Venues
CoRR
Publications
Andy Arditi, Oscar Obeso, Aaquib Syed, Daniel Paleka, Nina Rimsky, Wes Gurnee, Neel Nanda
Refusal in Language Models Is Mediated by a Single Direction.
CoRR (2024)
Dawn Lu, Nina Rimsky
Investigating Bias Representations in Llama 2 Chat via Activation Steering.
CoRR (2024)
Sarah Ball, Frauke Kreuter, Nina Rimsky
Understanding Jailbreak Success: A Study of Latent Space Dynamics in Large Language Models.
CoRR (2024)
Nina Rimsky, Nick Gabrieli, Julian Schulz, Meg Tong, Evan Hubinger, Alexander Matt Turner
Steering Llama 2 via Contrastive Activation Addition.
CoRR (2023)