A Multilingual Parallel Corpora Collection Effort for Indian Languages.
Shashank Siripragada, Jerin Philip, Vinay P. Namboodiri, C. V. Jawahar
Published in: LREC (2020)
Keyphrases
- parallel corpora
- cross lingual information retrieval
- cross lingual
- Indian languages
- language independent
- machine translation
- comparable corpora
- cross language
- cross language information retrieval
- language modeling
- Chinese English
- multi lingual
- document collections
- query translation
- parallel corpus
- labor intensive
- language identification
- machine translation system
- statistical machine translation
- news articles
- text classification
- document images
- artificial intelligence
- word pairs
- document clustering
- bilingual dictionaries
- translation model
- sentence level
- Wikipedia articles
- transfer learning
- text retrieval
- language model
- collaborative filtering
- information extraction