ParaPat: The Multi-Million Sentences Parallel Corpus of Patents Abstracts.
Felipe Soares, Mark Stevenson, Diego Bartolomé, Anna Zaretskaya. Published in: LREC (2020)
Keyphrases
- parallel corpus
- word alignment
- machine translation system
- cross lingual
- target language
- machine translation
- language independent
- information retrieval
- query translation
- source language
- cross language information retrieval
- natural language
- statistical machine translation
- multi document summarization
- text mining
- document classification
- statistical model
- maximum likelihood
- parallel corpora
- natural language processing