Codebook Features: Sparse and Discrete Interpretability for Neural Networks.

Alex Tamkin, Mohammad Taufeeque, Noah D. Goodman
Published in: CoRR (2023)