Practice of the Conformer Enhanced Audio-Visual Hubert on Mandarin and English.
Xiaoming Ren, Chao Li, Shenjian Wang, Biao Li
Published in: ICASSP (2023)
Keyphrases
- audio visual
- emotion recognition
- multi modal
- broadcast news
- multi stream
- visual data
- visual information
- audio visual speech recognition
- temporal context
- audio features
- natural language
- multimedia
- speaker verification
- person authentication
- speech recognition
- co occurrence
- hidden Markov models
- spatio temporal
- text to speech
- pattern recognition
- e learning