AI4Bharat-IndicNLP Corpus: Monolingual Corpora and Word Embeddings for Indic Languages.
Anoop Kunchukuttan, Divyanshu Kakwani, Satish Golla, Gokul N. C., Avik Bhattacharyya, Mitesh M. Khapra, Pratyush Kumar
Published in: CoRR (2020)
Keyphrases
- parallel corpus
- statistical machine translation
- cross lingual
- sentence pairs
- language independent
- machine translation system
- machine translation
- target language
- word alignment
- parallel corpora
- source language
- training corpus
- query translation
- comparable corpora
- translation model
- european languages
- cross language information retrieval
- chinese english
- word pairs
- parallel texts
- bilingual dictionaries
- language modeling
- vector space
- word recognition
- cross language
- n gram
- machine learning
- word segmentation
- linguistic resources
- language specific
- indian languages
- text classification
- word sense disambiguation
- natural language processing
- multiword
- information extraction
- out of vocabulary
- knowledge representation
- natural language