CoVoST: A Diverse Multilingual Speech-To-Text Translation Corpus.
Changhan Wang, Juan Miguel Pino, Anne Wu, Jiatao Gu. Published in: CoRR (2020)
Keyphrases
- parallel corpus
- machine translation system
- cross language information retrieval
- comparable corpora
- chinese english
- statistical machine translation
- parallel corpora
- cross lingual
- language resources
- machine translation
- cross language
- query translation
- cross lingual information retrieval
- cross language ir
- language independent
- parallel texts
- english words
- translation model
- sentence pairs
- lexical knowledge
- text corpora
- word pairs
- test set
- wide variety
- monolingual
- target language
- bilingual dictionaries
- word alignment
- digital libraries
- wide coverage
- news articles
- real world
- wordnet
- linguistic resources
- open domain
- source language
- supervised machine learning
- training corpus
- multiword
- manually annotated
- language modeling