Look, listen, and decode: Multimodal speech recognition with images.
Felix Sun, David F. Harwath, James R. Glass
Published in: SLT (2016)
Keyphrases
- speech recognition
- hidden Markov models
- noisy environments
- image analysis
- language model
- automatic speech recognition
- speech signal
- speech recognizer
- speech synthesis
- speech processing
- multimodal
- keyword spotting
- handwriting recognition
- image classification
- speech recognition technology
- image collections
- background noise
- speaker identification
- speaker recognition
- speech retrieval
- neural network
- isolated word
- speaker independent
- visual data
- low level
- pattern recognition