Best of Both Worlds: Multi-task Audio-Visual Automatic Speech Recognition and Active Speaker Detection.
Otavio Braga, Olivier Siohan
Published in: CoRR (2022)
Keyphrases
- audio visual
- automatic speech recognition
- multi task
- speech recognition
- multi modal
- speaker verification
- noisy environments
- speech signal
- acoustic features
- visual information
- multi class
- learning tasks
- broadcast news
- emotion recognition
- hidden markov models
- visual data
- multimedia
- transfer learning
- audio features
- learning problems
- multiscale
- decision trees
- object recognition
- e learning
- training data
- language model
- data sets
- principal component analysis