End-to-End Multi-Person Audio/Visual Automatic Speech Recognition.
Otavio Braga, Takaki Makino, Olivier Siohan, Hank Liao
Published in: CoRR (2022)
Keyphrases
- end to end
- audio visual
- automatic speech recognition
- speech recognition
- multi modal
- speech signal
- hidden Markov models
- multi stream
- visual information
- visual data
- broadcast news
- multimedia
- emotion recognition
- noisy environments
- speaker verification
- passage retrieval
- pattern recognition
- audio features
- contextual information
- context aware
- image data
- acoustic features
- video sequences