iNLPSuite: Monolingual Corpora, Evaluation Benchmarks and Pre-trained Multilingual Language Models for Indian Languages.
Divyanshu Kakwani, Anoop Kunchukuttan, Satish Golla, Gokul N. C., Avik Bhattacharyya, Mitesh M. Khapra, Pratyush Kumar
Published in: EMNLP (Findings) (2020)
Keyphrases
- cross lingual
- language modeling
- language model
- indian languages
- parallel corpus
- cross lingual information retrieval
- pre trained
- translation model
- query expansion
- cross language
- statistical machine translation
- document retrieval
- parallel corpora
- chinese english
- retrieval model
- n gram
- probabilistic model
- speech recognition
- information retrieval
- language independent
- linguistic resources
- machine translation
- out of vocabulary
- machine translation system
- cross language information retrieval
- query translation
- natural language processing
- text retrieval
- test collection
- training examples
- relevance model
- text classification
- query terms
- training data
- document images
- bayesian networks