CoVoST: A Diverse Multilingual Speech-To-Text Translation Corpus.
Changhan Wang, Juan Miguel Pino, Anne Wu, Jiatao Gu
Published in: LREC (2020)
Keyphrases
- parallel corpus
- cross language information retrieval
- machine translation system
- Chinese English
- comparable corpora
- cross lingual
- language resources
- statistical machine translation
- parallel corpora
- machine translation
- cross language
- cross lingual information retrieval
- monolingual
- query translation
- language independent
- parallel texts
- cross language IR
- sentence pairs
- English words
- word alignment
- wide variety
- bilingual dictionaries
- language modeling
- translation model
- training corpus
- information retrieval
- lexical knowledge
- document retrieval
- neural network
- multilingual information retrieval
- test set
- linguistic resources
- information access
- text retrieval
- source language
- machine learning
- manually annotated