ParaCotta: Synthetic Multilingual Paraphrase Corpora from the Most Diverse Translation Sample Pair.
Alham Fikri Aji, Radityo Eko Prasojo, Tirana Noor Fatyanosa, Philip Arthur, Suci Fitriany, Salma Qonitah, Nadhifa Zulfa, Tomi Santoso, Mahendra Data
Published in: PACLIC (2021)
Keyphrases
- comparable corpora
- cross language information retrieval
- parallel corpora
- Chinese-English
- parallel corpus
- machine translation
- text corpora
- news articles
- language modeling
- cross lingual
- linguistic resources
- language resources
- cross language
- query translation
- bilingual lexicon
- cross lingual information retrieval
- machine translation system
- real world
- pairwise
- word pairs
- translation model
- statistical machine translation
- wide variety
- text documents
- sentence pairs
- information retrieval
- multi lingual
- bilingual dictionaries
- language independent
- sentence level
- query expansion
- multilingual information retrieval
- natural language processing
- digital libraries