CoVoST 2: A Massively Multilingual Speech-to-Text Translation Corpus.
Changhan Wang, Anne Wu, Juan Miguel Pino
Published in: CoRR (2020)
Keyphrases
- parallel corpus
- machine translation system
- cross language information retrieval
- comparable corpora
- chinese english
- cross lingual
- statistical machine translation
- parallel corpora
- machine translation
- language resources
- cross lingual information retrieval
- cross language
- query translation
- language independent
- monolingual
- parallel texts
- sentence pairs
- cross language ir
- linguistic resources
- english words
- bilingual dictionaries
- parallel computing
- lexical knowledge
- translation model
- massively parallel
- word alignment
- manually annotated
- language modeling
- training corpus
- metadata
- news articles
- multiword
- bilingual lexicon
- multilingual information retrieval
- text corpora
- language specific
- wide coverage
- out of vocabulary
- machine learning