Representation Engineering: A Top-Down Approach to AI Transparency.
Andy Zou, Long Phan, Sarah Chen, James Campbell, Phillip Guo, Richard Ren, Alexander Pan, Xuwang Yin, Mantas Mazeika, Ann-Kathrin Dombrowski, Shashwat Goel, Nathaniel Li, Michael J. Byun, Zifan Wang, Alex Mallen, Steven Basart, Sanmi Koyejo, Dawn Song, Matt Fredrikson, J. Zico Kolter, Dan Hendrycks
Published in: CoRR (2023)