Speech-Text Dialog Pre-training for Spoken Dialog Understanding with Explicit Cross-Modal Alignment.
Tianshu Yu, Haoyu Gao, Ting-En Lin, Min Yang, Yuchuan Wu, Wentao Ma, Chao Wang, Fei Huang, Yongbin Li
Published in: CoRR (2023)
Keyphrases
- spoken dialog
- spoken dialog systems
- cross modal
- speech processing
- multi modal
- english text
- human machine
- multimedia retrieval
- speech recognition
- visual recognition
- text retrieval
- text mining
- image retrieval
- word level
- information retrieval
- signal processing
- text to speech
- natural language understanding
- multimedia systems
- visual data
- video search
- visual similarity
- web images
- text data
- multimedia databases
- natural language processing
- digital libraries
- keywords
- image processing
- computer vision