Speech-Text Pre-training for Spoken Dialog Understanding with Explicit Cross-Modal Alignment.
Tianshu Yu, Haoyu Gao, Ting-En Lin, Min Yang, Yuchuan Wu, Wentao Ma, Chao Wang, Fei Huang, Yongbin Li
Published in: ACL (1) (2023)
Keyphrases
- cross modal
- spoken dialog
- spoken dialog systems
- multi modal
- speech processing
- english text
- text to speech
- multimedia retrieval
- text retrieval
- speech recognition
- information retrieval
- image retrieval
- visual recognition
- multimedia databases
- keywords
- word level
- signal processing
- text mining
- image search
- natural language processing
- pattern recognition