The XMU System for Audio-Visual Diarization and Recognition in MISP Challenge 2022.
Tao Li, Haodong Zhou, Jie Wang, Qingyang Hong, Lin Li
Published in: ICASSP (2023)
Keyphrases
- audio visual
- multi modal
- visual information
- visual data
- multi stream
- pattern recognition
- audio visual speech recognition
- object recognition
- multimedia
- temporal context
- person authentication
- emotion recognition
- speaker verification
- audio features
- feature extraction
- data sets
- activity recognition
- action recognition
- image features