Audio-Visual Efficient Conformer for Robust Speech Recognition.
Maxime Burchi, Radu Timofte
Published in: CoRR (2023)
Keyphrases
- speech recognition
- audio-visual speech recognition
- audio-visual
- noisy environments
- multi-stream
- hidden Markov models
- multi-modal
- speaker verification
- automatic speech recognition
- speech signal
- visual information
- language model
- pattern recognition
- speech synthesis
- visual data
- speech recognizer
- neural network
- emotion recognition
- digit recognition
- speaker independent
- broadcast news
- speaker identification
- multimedia