MANTa: Efficient Gradient-Based Tokenization for Robust End-to-End Language Modeling.
Nathan GodeyRoman CastagnéÉric de la ClergerieBenoît SagotPublished in: CoRR (2022)
Keyphrases
- end to end
- language modeling
- language model
- n gram
- query expansion
- information retrieval
- retrieval model
- admission control
- text localization and recognition
- congestion control
- probabilistic model
- text classification
- named entities
- document retrieval
- cross lingual
- real world
- knowledge discovery
- similarity measure
- machine learning
- relevance model
- data mining