Attentive Fusion Enhanced Audio-Visual Encoding for Transformer Based Robust Speech Recognition.
Liangfa Wei, Jie Zhang, Junfeng Hou, Lirong Dai
Published in: CoRR (2020)
Keyphrases
- speech recognition
- audio-visual
- audio-visual speech recognition
- noisy environments
- multi-stream
- multi-modal
- person authentication
- multimodal fusion
- speaker verification
- hidden Markov models
- language model
- speech recognizer
- speech synthesis
- speaker identification
- emotion recognition
- digit recognition
- pattern recognition
- visual data
- visual information
- speech signal
- probabilistic model
- multimedia
- automatic speech recognition
- speech recognition systems
- noise reduction
- action recognition