The Local Interaction Basis: Identifying Computationally-Relevant and Sparsely Interacting Features in Neural Networks.
Lucius Bushnaq, Stefan Heimersheim, Nicholas Goldowsky-Dill, Dan Braun, Jake Mendel, Kaarel Hänni, Avery Griffin, Jörn Stöhler, Magdalena Wache, Marius Hobbhahn
Published in: CoRR (2024)