A Closer Look at Audio-Visual Multi-Person Speech Recognition and Active Speaker Selection.
Otavio Braga, Olivier Siohan
Published in: CoRR (2022)
Keyphrases
- audio visual
- speech recognition
- audio visual speech recognition
- multi modal
- multi stream
- visual information
- speaker verification
- hidden Markov models
- automatic speech recognition
- language model
- speech signal
- speech recognizer
- speech synthesis
- pattern recognition
- digit recognition
- visual data
- multimedia
- speaker dependent
- speech recognition systems
- speaker adaptation
- noisy environments
- speaker recognition
- emotion recognition
- speaker identification
- audio features
- speaker independent
- neural network
- image processing
- multimedia data