Analyzing Utility of Visual Context in Multimodal Speech Recognition Under Noisy Conditions.
Tejas Srinivasan, Ramon Sanabria, Florian Metze
Published in: CoRR (2019)
Keyphrases
- speech recognition
- visual context
- noisy environments
- speech signal
- hidden markov models
- automatic speech recognition
- pattern recognition
- temporal context
- semantic context
- language model
- speaker identification
- multi modal
- scene interpretation
- object detection
- speech recognition systems
- video annotation
- audio visual
- low level
- multimedia
- neural network