Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla.

Tom Lieberum, Matthew Rahtz, János Kramár, Neel Nanda, Geoffrey Irving, Rohin Shah, Vladimir Mikulik
Published in: CoRR (2023)