How does GPT-2 compute greater-than?: Interpreting mathematical abilities in a pre-trained language model.
Michael Hanna, Ollie Liu, Alexandre Variengien
Published in: CoRR (2023)
Keyphrases
- language model
- pre-trained
- language modeling
- n-gram
- probabilistic model
- training data
- document retrieval
- speech recognition
- retrieval model
- test collection
- mixture model
- ad hoc information retrieval
- information retrieval
- training examples
- query expansion
- language modelling
- context-sensitive
- smoothing methods
- control signals
- translation model
- generative model
- hidden Markov models
- active learning
- cross-lingual
- clustering algorithm
- multimedia