Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages.
Ayyoob ImaniGooghari, Peiqin Lin, Amir Hossein Kargaran, Silvia Severini, Masoud Jalili Sabet, Nora Kassner, Chunlan Ma, Helmut Schmid, André F. T. Martins, François Yvon, Hinrich Schütze
Published in: CoRR (2023)
Keyphrases
- language model
- language modeling
- comparable corpora
- cross lingual
- statistical machine translation
- parallel corpus
- language independent
- n gram
- chinese english
- linguistic resources
- translation model
- parallel corpora
- cross lingual information retrieval
- cross language information retrieval
- query expansion
- document retrieval
- retrieval model
- information retrieval
- probabilistic model
- language modelling
- speech recognition
- machine translation system
- context sensitive
- query terms
- language models for information retrieval
- pseudo relevance feedback
- query translation
- ad hoc information retrieval
- test collection
- machine translation
- bilingual dictionaries
- statistical language models
- news articles
- smoothing methods
- text classification
- document level
- vector space model
- term dependencies
- text mining
- okapi bm
- document ranking
- machine learning