Stefan Heimersheim
Publication Activity (10 Years)
Years Active: 2023-2024
Publications (10 Years): 5
Top Topics
Semi Automated
Data Corruption
Prediction Accuracy
Analog Circuits
Top Venues
CoRR
NeurIPS
Publications
Stefan Heimersheim, Neel Nanda
How to use and interpret activation patching.
CoRR (2024)
Lucius Bushnaq, Stefan Heimersheim, Nicholas Goldowsky-Dill, Dan Braun, Jake Mendel, Kaarel Hänni, Avery Griffin, Jörn Stöhler, Magdalena Wache, Marius Hobbhahn
The Local Interaction Basis: Identifying Computationally-Relevant and Sparsely Interacting Features in Neural Networks.
CoRR (2024)
Lucius Bushnaq, Jake Mendel, Stefan Heimersheim, Dan Braun, Nicholas Goldowsky-Dill, Kaarel Hänni, Cindy Wu, Marius Hobbhahn
Using Degeneracy in the Loss Landscape for Mechanistic Interpretability.
CoRR (2024)
Arthur Conmy, Augustine N. Mavor-Parker, Aengus Lynch, Stefan Heimersheim, Adrià Garriga-Alonso
Towards Automated Circuit Discovery for Mechanistic Interpretability.
CoRR (2023)
Arthur Conmy, Augustine N. Mavor-Parker, Aengus Lynch, Stefan Heimersheim, Adrià Garriga-Alonso
Towards Automated Circuit Discovery for Mechanistic Interpretability.
NeurIPS (2023)