Nora Belrose
Publication Activity (10 Years)
Years Active: 2022-2024
Publications (10 Years): 8
Top Topics
Closed Form
Top Venues
CoRR
ICML
NeurIPS
Publications
Nora Belrose, Quintin Pope, Lucia Quirke, Alex Mallen, Xiaoli Fern
Neural Networks Learn Statistics of Increasing Complexity.
CoRR (2024)
Nora Belrose, David Schneider-Joseph, Shauli Ravfogel, Ryan Cotterell, Edward Raff, Stella Biderman
LEACE: Perfect linear concept erasure in closed form.
CoRR (2023)
Tony Tong Wang, Adam Gleave, Tom Tseng, Kellin Pelrine, Nora Belrose, Joseph Miller, Michael D. Dennis, Yawen Duan, Viktor Pogrebniak, Sergey Levine, Stuart Russell
Adversarial Policies Beat Superhuman Go AIs.
ICML (2023)
Nora Belrose, David Schneider-Joseph, Shauli Ravfogel, Ryan Cotterell, Edward Raff, Stella Biderman
LEACE: Perfect linear concept erasure in closed form.
NeurIPS (2023)
Alex Mallen, Nora Belrose
Eliciting Latent Knowledge from Quirky Language Models.
CoRR (2023)
Nora Belrose, Zach Furman, Logan Smith, Danny Halawi, Igor Ostrovsky, Lev McKinney, Stella Biderman, Jacob Steinhardt
Eliciting Latent Predictions from Transformers with the Tuned Lens.
CoRR (2023)
Tony Tong Wang, Adam Gleave, Nora Belrose, Tom Tseng, Joseph Miller, Michael D. Dennis, Yawen Duan, Viktor Pogrebniak, Sergey Levine, Stuart Russell
Adversarial Policies Beat Professional-Level Go AIs.
CoRR (2022)
Adam Gleave, Mohammad Taufeeque, Juan Rocamonde, Erik Jenner, Steven H. Wang, Sam Toyer, Maximilian Ernestus, Nora Belrose, Scott Emmons, Stuart Russell
imitation: Clean Imitation Learning Implementations.
CoRR (2022)