Attentive Fusion Enhanced Audio-Visual Encoding for Transformer Based Robust Speech Recognition.
Liangfa Wei, Jie Zhang, Junfeng Hou, Lirong Dai
Published in: APSIPA (2020)
Keyphrases
- speech recognition
- audio visual
- audio visual speech recognition
- noisy environments
- multi modal
- person authentication
- multi stream
- hidden Markov models
- speech synthesis
- multimodal fusion
- language model
- pattern recognition
- automatic speech recognition
- visual information
- speech signal
- multimedia
- emotion recognition
- speaker verification
- speech recognition systems
- visual data
- speaker identification
- search engine
- feature selection
- visual features
- natural language processing
- speech recognizer
- digit recognition
- mobile devices
- machine learning