Fishing for Magikarp: Automatically Detecting Under-trained Tokens in Large Language Models.
Sander Land, Max Bartolo
Published in: CoRR (2024)
Keyphrases
- language model
- automatically detecting
- language modeling
- automatic detection
- document retrieval
- n-gram
- information retrieval
- probabilistic model
- retrieval model
- speech recognition
- language modelling
- test collection
- statistical language models
- query expansion
- training set
- smoothing methods
- language models for information retrieval
- soccer video
- active learning
- statistical language modeling