The WHU-Alibaba Audio-Visual Speaker Diarization System for the MISP 2022 Challenge.
Ming Cheng, Haoxu Wang, Ziteng Wang, Qiang Fu, Ming Li. Published in: ICASSP (2023)
Keyphrases
- audio visual
- speaker diarization
- speaker verification
- multimodal
- visual information
- multi-stream
- visual data
- emotion recognition
- multimedia
- broadcast news
- speech recognition
- low level
- data sets
- visual features
- audio features
- model selection
- principal component analysis
- high level
- metadata
- search engine
- machine learning