MuST-C: a Multilingual Speech Translation Corpus.
Mattia Antonino Di Gangi, Roldano Cattoni, Luisa Bentivogli, Matteo Negri, Marco Turchi
Published in: NAACL-HLT (1) (2019)
Keyphrases
- parallel corpus
- cross language information retrieval
- machine translation system
- Chinese English
- language resources
- comparable corpora
- parallel corpora
- statistical machine translation
- cross lingual
- multi lingual
- broadcast news
- machine translation
- language independent
- query translation
- spontaneous speech
- cross language
- cross lingual information retrieval
- conversational speech
- out of vocabulary
- sentence pairs
- word alignment
- translation model
- speech recognition
- spoken document retrieval
- cross language IR
- lexical features
- bilingual dictionaries
- English words
- lexical knowledge
- finite state transducers
- speaker identification
- information access
- language model
- news articles
- recognition engine
- multiword
- word pairs
- training corpus
- text to speech
- text corpora
- linguistic resources
- bilingual lexicon
- target language
- noisy environments
- language modeling
- spoken language
- question answering
- Spanish language