Cognitive Dissonance: Why Do Language Model Outputs Disagree with Internal Representations of Truthfulness?

Published in: EMNLP (2023)

Keyphrases