A High-Quality and Large-Scale Dataset for English-Vietnamese Speech Translation.
Linh The Nguyen, Nguyen Luong Tran, Long Doan, Manh Luong, Dat Quoc Nguyen
Published in: CoRR (2022)