Dictionary Learning Improves Patch-Free Circuit Discovery in Mechanistic Interpretability: A Case Study on Othello-GPT.

Published in: CoRR (2024)

Keyphrases