Attention-based Audio-Visual Fusion for Robust Automatic Speech Recognition.
George Sterpu, Christian Saam, Naomi Harte
Published in: ICMI (2018)
Keyphrases
- audio visual
- automatic speech recognition
- noisy environments
- speech recognition
- multi modal
- person authentication
- multimodal fusion
- speech signal
- multi stream
- speaker verification
- audio visual speech recognition
- visual information
- broadcast news
- conversational speech
- multimedia
- passage retrieval
- acoustic features
- hidden Markov models
- visual data
- high robustness
- speech retrieval
- text mining
- video sequences