The Multimodal Information Based Speech Processing (MISP) 2022 Challenge: Audio-Visual Diarization and Recognition.
Zhe Wang, Shilong Wu, Hang Chen, Mao-Kui He, Jun Du, Chin-Hui Lee, Jingdong Chen, Shinji Watanabe, Sabato Marco Siniscalchi, Odette Scharenborg, Diyuan Liu, Baocai Yin, Jia Pan, Jianqing Gao, Cong Liu
Published in: ICASSP (2023)
Keyphrases
- audio visual
- speech processing
- speaker identification
- audio features
- speech recognition
- visual data
- multi modal
- feature extraction
- visual information
- noisy environments
- natural language processing
- object recognition
- pattern recognition
- gaussian mixture model
- speech signal
- machine learning
- multimedia
- computer vision
- information retrieval
- search engine
- knowledge base
- action recognition
- language model
- metadata
- broadcast news
- multiscale