Best of Both Worlds: Multi-Task Audio-Visual Automatic Speech Recognition and Active Speaker Detection.
Otavio Braga, Olivier Siohan
Published in: ICASSP (2022)
Keyphrases
- audio visual
- automatic speech recognition
- multi task
- speech recognition
- multi modal
- speaker verification
- noisy environments
- learning tasks
- acoustic features
- speech signal
- visual information
- hidden markov models
- audio features
- visual data
- multimedia
- emotion recognition
- broadcast news
- passage retrieval
- learning problems
- multi class
- feature selection
- speech sounds
- data sets
- computer vision
- neural network
- visual features
- denoising
- feature space
- machine learning