L3Cube-MahaCorpus and MahaBERT: Marathi Monolingual Corpus, Marathi BERT Language Models, and Resources.
Raviraj Joshi
Published in: CoRR (2022)
Keyphrases
- language model
- statistical machine translation
- source language
- query expansion
- language modeling
- document retrieval
- parallel corpora
- translation model
- cross language retrieval
- n gram
- information retrieval
- query terms
- cross lingual
- document level
- parallel corpus
- language modelling
- retrieval model
- multiword
- probabilistic model
- machine translation
- statistical language models
- passage retrieval
- speech recognition
- test collection
- target language
- ad hoc retrieval
- machine translation system
- context sensitive
- pseudo feedback
- cross language information retrieval
- vector space model
- document ranking
- text retrieval
- chinese english
- pseudo relevance feedback
- cross language
- retrieval effectiveness
- bilingual dictionaries
- language models for information retrieval
- relevance model
- query translation
- web search
- smoothing methods
- text classification
- statistical language modeling
- natural language