The Multimodal Information based Speech Processing (MISP) 2022 Challenge: Audio-Visual Diarization and Recognition.
Zhe Wang, Shilong Wu, Hang Chen, Mao-Kui He, Jun Du, Chin-Hui Lee, Jingdong Chen, Shinji Watanabe, Sabato Marco Siniscalchi, Odette Scharenborg, Diyuan Liu, Baocai Yin, Jia Pan, Jianqing Gao, Cong Liu
Published in: CoRR (2023)
Keyphrases
- audio visual
- speech processing
- speaker identification
- visual data
- audio features
- multi modal
- speech recognition
- visual information
- feature extraction
- natural language processing
- multimedia
- object recognition
- pattern recognition
- gaussian mixture model
- noisy environments
- action recognition
- machine learning
- multimedia data
- human activities
- high dimensional
- speech signal
- database systems