ParaCotta: Synthetic Multilingual Paraphrase Corpora from the Most Diverse Translation Sample Pair.
Alham Fikri Aji, Tirana Noor Fatyanosa, Radityo Eko Prasojo, Philip Arthur, Suci Fitriany, Salma Qonitah, Nadhifa Zulfa, Tomi Santoso, Mahendra Data
Published in: CoRR (2022)
Keyphrases
- comparable corpora
- cross language information retrieval
- parallel corpora
- Chinese English
- parallel corpus
- machine translation
- bilingual lexicon
- news articles
- language modeling
- cross language
- language resources
- cross lingual
- cross lingual information retrieval
- query translation
- text corpora
- machine translation system
- real world
- statistical machine translation
- linguistic resources
- pairwise
- translation model
- question answering
- text documents
- word pairs
- wide variety
- bilingual dictionaries
- natural language processing
- word level
- language model
- Wikipedia articles
- sample points
- sample size
- data sets
- cross language IR
- out of vocabulary
- sentence level
- similarity scores
- target language
- language independent
- digital libraries