Itihasa: A large-scale corpus for Sanskrit to English translation.
Rahul Aralikatte, Miryam de Lhoneux, Anoop Kunchukuttan, Anders Søgaard
Published in: CoRR (2021)
Keyphrases
- machine translation
- statistical machine translation
- parallel corpus
- machine translation system
- target language
- parallel corpora
- chinese english
- source language
- cross lingual
- cross language information retrieval
- mono lingual
- query translation
- sentence pairs
- comparable corpora
- language independent
- word alignment
- english words
- information extraction
- pos tagging
- natural language processing
- english chinese
- link grammar
- training corpus
- word sense disambiguation
- bilingual dictionaries
- language resources
- natural language
- wide coverage
- open domain
- word pairs
- natural language text
- cross language
- cross language retrieval