End-to-End Multi-Person Audio/Visual Automatic Speech Recognition.
Otavio Braga, Takaki Makino, Olivier Siohan, Hank Liao
Published in: ICASSP (2020)
Keyphrases
- end to end
- audio visual
- automatic speech recognition
- speech recognition
- multi modal
- speech signal
- broadcast news
- hidden markov models
- multimedia
- visual information
- visual data
- acoustic features
- emotion recognition
- audio features
- multi stream
- passage retrieval
- speaker verification
- contextual information
- computer vision
- eye movements
- context aware
- noisy environments
- pattern recognition