Fused Acoustic and Text Encoding for Multimodal Bilingual Pretraining and Speech Translation.
Renjie Zheng, Junkun Chen, Mingbo Ma, Liang Huang
Published in: CoRR (2021)
Keyphrases
- machine translation system
- machine translation
- cross language information retrieval
- text to speech
- chinese english
- parallel corpora
- audio visual
- text to speech synthesis
- query translation
- cross language
- text input
- source language
- proper names
- comparable corpora
- word alignment
- multiword
- language resources
- multimodal interfaces
- english text
- text retrieval
- statistical machine translation
- english chinese
- word pairs
- multi stream
- speech recognition systems
- word level
- target language
- text documents
- multi modal
- multimodal interaction
- prosodic features
- parallel corpus
- speech recognition
- bilingual dictionaries
- cross language retrieval
- cross lingual
- broadcast news
- statistical translation models
- spontaneous speech
- speech signal
- speech sounds
- translation model
- emotional speech
- sentence pairs
- natural language processing
- lexical features
- speech synthesis
- acoustic features
- language independent
- information retrieval
- formant frequencies
- parallel texts
- music information retrieval
- video search
- bilingual lexicon
- text mining