Arthur Conmy

Publication Activity (10 Years)

Years Active: 2021-2024
Publications (10 Years): 14

Top Topics

Dictionary Learning

Regularization Methods

Object Identification

Compressive Sensing

Publications

Connor Kissane, Robert Krzyzanowski, Joseph Isaac Bloom, Arthur Conmy, Neel Nanda
Interpreting Attention Layer Outputs with Sparse Autoencoders. CoRR (2024)
Senthooran Rajamanoharan, Tom Lieberum, Nicolas Sonnerat, Arthur Conmy, Vikrant Varma, János Kramár, Neel Nanda
Jumping Ahead: Improving Reconstruction Fidelity with JumpReLU Sparse Autoencoders. CoRR (2024)
Nicholas Carlini, Daniel Paleka, Krishnamurthy (Dj) Dvijotham, Thomas Steinke, Jonathan Hayase, A. Feder Cooper, Katherine Lee, Matthew Jagielski, Milad Nasr, Arthur Conmy, Eric Wallace, David Rolnick, Florian Tramèr
Stealing Part of a Production Language Model. CoRR (2024)
Senthooran Rajamanoharan, Arthur Conmy, Lewis Smith, Tom Lieberum, Vikrant Varma, János Kramár, Rohin Shah, Neel Nanda
Improving Dictionary Learning with Gated Sparse Autoencoders. CoRR (2024)
Rhys Gould, Euan Ong, George Ogden, Arthur Conmy
Successor Heads: Recurring, Interpretable Attention Heads In The Wild. ICLR (2024)
Rhys Gould, Euan Ong, George Ogden, Arthur Conmy
Successor Heads: Recurring, Interpretable Attention Heads In The Wild. CoRR (2023)
Kevin Ro Wang, Alexandre Variengien, Arthur Conmy, Buck Shlegeris, Jacob Steinhardt
Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 Small. ICLR (2023)
Callum McDougall, Arthur Conmy, Cody Rushing, Thomas McGrath, Neel Nanda
Copy Suppression: Comprehensively Understanding an Attention Head. CoRR (2023)
Arthur Conmy, Augustine N. Mavor-Parker, Aengus Lynch, Stefan Heimersheim, Adrià Garriga-Alonso
Towards Automated Circuit Discovery for Mechanistic Interpretability. CoRR (2023)
Arthur Conmy, Augustine N. Mavor-Parker, Aengus Lynch, Stefan Heimersheim, Adrià Garriga-Alonso
Towards Automated Circuit Discovery for Mechanistic Interpretability. NeurIPS (2023)
Aaquib Syed, Can Rager, Arthur Conmy
Attribution Patching Outperforms Automated Circuit Discovery. CoRR (2023)
Kevin Wang, Alexandre Variengien, Arthur Conmy, Buck Shlegeris, Jacob Steinhardt
Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small. CoRR (2022)
Arthur Conmy, Subhadip Mukherjee, Carola-Bibiane Schönlieb
StyleGAN-Induced Data-Driven Regularization for Inverse Problems. ICASSP (2022)
Arthur Conmy, Subhadip Mukherjee, Carola-Bibiane Schönlieb
StyleGAN-induced data-driven regularization for inverse problems. CoRR (2021)