Cross-Modal Semantic Alignment before Fusion for Two-Pass End-to-End Spoken Language Understanding.
Lingyan Huang, Tao Li, Haodong Zhou, Qingyang Hong, Lin Li
Published in: INTERSPEECH (2023)
Keyphrases
- end to end
- language understanding
- cross modal
- natural language understanding
- semantic interpretation
- multi modal
- natural language
- semantic analysis
- multimedia retrieval
- dialogue system
- language processing
- knowledge representation
- visual recognition
- image retrieval
- knowledge sources
- multimedia databases
- visual similarity
- general knowledge
- natural language processing
- spoken dialogue systems
- semantic concepts
- semantic similarity
- semantic information
- multiple features
- medical domain
- visual features
- text mining
- cognitive psychology
- low level