Speaker-Targeted Audio-Visual Models for Speech Recognition in Cocktail-Party Environments.
Guan-Lin Chao, William Chan, Ian R. Lane
Published in: CoRR (2019)
Keyphrases
- speech recognition
- audio visual
- audio visual speech recognition
- acoustic models
- automatic speech recognition
- hidden Markov models
- multi stream
- speaker verification
- pattern recognition
- language model
- multi modal
- speaker recognition
- speech signal
- speech synthesis
- noisy environments
- emotion recognition
- visual information
- speech recognizer
- speaker diarization
- speaker dependent
- speaker identification
- digit recognition
- speaker independent
- multimedia
- video sequences