Itihasa: A large-scale corpus for Sanskrit to English translation.
Rahul Aralikatte, Miryam de Lhoneux, Anoop Kunchukuttan, Anders Søgaard
Published in: WAT@ACL/IJCNLP (2021)
Keyphrases
- machine translation
- statistical machine translation
- parallel corpus
- machine translation system
- target language
- parallel corpora
- cross lingual
- mono lingual
- Chinese English
- source language
- cross language information retrieval
- query translation
- comparable corpora
- sentence pairs
- natural language processing
- word alignment
- English words
- POS tagging
- language independent
- word sense disambiguation
- natural language
- broad coverage
- training corpus
- information extraction
- language resources
- bilingual dictionaries
- translation model
- word sense
- language model
- finite state transducers
- cross language
- language modeling
- wide coverage
- word level
- information retrieval
- cross language retrieval
- machine learning