Dictionary Learning Improves Patch-Free Circuit Discovery in Mechanistic Interpretability: A Case Study on Othello-GPT.
Zhengfu He, Xuyang Ge, Qiong Tang, Tianxiang Sun, Qinyuan Cheng, Xipeng Qiu
Published in: CoRR (2024)
Keyphrases
- dictionary learning
- evaluation function
- sparse representation
- sparse coding
- temporal difference learning
- image patches
- game playing
- minimax search
- game tree
- board game
- temporal difference
- data mining
- natural images
- function approximation
- monte carlo
- fixed point
- learning tasks
- image representation
- image classification
- prior knowledge
- pattern recognition
- search algorithm
- reinforcement learning
- multiscale
- feature extraction
- face recognition