Tie Your Embeddings Down: Cross-Modal Latent Spaces for End-to-end Spoken Language Understanding.
Bhuvan Agrawal, Markus Müller, Samridhi Choudhary, Martin Radfar, Athanasios Mouchtaris, Ross McGowan, Nathan Susanj, Siegfried Kunzmann
Published in: ICASSP (2022)
Keyphrases
- language understanding
- end to end
- cross modal
- multi modal
- natural language understanding
- language processing
- multimedia retrieval
- image retrieval
- dialogue system
- visual recognition
- semantic interpretation
- spoken dialogue systems
- dimensionality reduction
- multimedia databases
- general knowledge
- low dimensional
- natural language
- cognitive psychology
- multimedia
- speech acts
- high dimensional
- feature extraction