Fused Acoustic and Text Encoding for Multimodal Bilingual Pretraining and Speech Translation.
Renjie Zheng, Junkun Chen, Mingbo Ma, Liang Huang
Published in: ICML (2021)
Keyphrases
- machine translation system
- machine translation
- text to speech
- cross language information retrieval
- chinese english
- parallel corpora
- audio visual
- query translation
- cross language
- text to speech synthesis
- language resources
- prosodic features
- english text
- text retrieval
- multimodal interaction
- acoustic features
- source language
- word alignment
- bilingual dictionaries
- comparable corpora
- statistical machine translation
- speech synthesis
- broadcast news
- speech recognition systems
- speech sounds
- multiword
- proper names
- cross language retrieval
- parallel corpus
- visual speech
- cross lingual
- information retrieval
- multi modal
- speech recognition
- speech signal
- target language
- text input
- english chinese
- acoustic models
- spontaneous speech
- spoken document retrieval
- lexical features
- text documents
- multi stream
- translation model
- acoustic signal
- statistical translation models
- out of vocabulary
- speech recognizer
- hidden markov models
- spoken language
- keywords
- text mining
- bilingual lexicon
- language independent
- automatic speech recognition
- sentence level
- word pairs
- word level
- text corpora
- document collections
- machine readable dictionaries
- sound source
- multimodal interfaces