Attention-based Audio-Visual Fusion for Robust Automatic Speech Recognition.
George Sterpu, Christian Saam, Naomi Harte
Published in: CoRR (2018)
Keyphrases
- audio visual
- automatic speech recognition
- noisy environments
- multi modal
- person authentication
- speech recognition
- multimodal fusion
- speech signal
- audio visual speech recognition
- visual information
- speaker verification
- multi stream
- multimedia
- conversational speech
- visual data
- hidden Markov models
- emotion recognition
- speech retrieval
- broadcast news
- neural network
- feature extraction
- image processing