Audio-Visual Multi-Talker Speech Recognition in a Cocktail Party.
Yifei Wu, Chenda Li, Song Yang, Zhongqin Wu, Yanmin Qian
Published in: Interspeech (2021)
Keyphrases
- audio visual
- speech recognition
- audio visual speech recognition
- multi modal
- multi stream
- hidden markov models
- visual information
- sound source
- speech signal
- automatic speech recognition
- pattern recognition
- speech synthesis
- multimedia
- language model
- visual data
- noisy environments
- emotion recognition
- speech recognizer
- digit recognition
- speech recognition systems
- audio features
- speaker verification
- machine learning
- image processing
- computer vision