AI Sandbagging: Language Models can Strategically Underperform on Evaluations.

Teun van der Weij, Felix Hofstätter, Ollie Jaffe, Samuel F. Brown, Francis Rhys Ward
Published in: CoRR (2024)