MANTa: Efficient Gradient-Based Tokenization for End-to-End Robust Language Modeling.
Nathan GodeyRoman CastagnéÉric de la ClergerieBenoît SagotPublished in: EMNLP (Findings) (2022)
Keyphrases
- end to end
- language modeling
- language model
- information retrieval
- query expansion
- retrieval model
- n gram
- admission control
- cross lingual
- congestion control
- text localization and recognition
- digital libraries
- mixture model
- named entities
- learning algorithm
- machine learning
- application layer
- scalable video
- statistical language models