Cognitive Dissonance: Why Do Language Model Outputs Disagree with Internal Representations of Truthfulness?
Kevin Liu, Stephen Casper, Dylan Hadfield-Menell, Jacob Andreas
Published in: EMNLP (2023)
Keyphrases
- language model
- internal representations
- language modeling
- cognitive processing
- cognitive model
- probabilistic model
- n-gram
- information retrieval
- test collection
- retrieval model
- high level
- ad hoc information retrieval
- sensory data
- smoothing methods
- mixture model
- input data
- cognitive processes
- query expansion
- receptive fields
- translation model
- training data
- cognitive science
- computer simulation
- hidden units
- decision function