The Multimodal Information Based Speech Processing (MISP) 2023 Challenge: Audio-Visual Target Speaker Extraction.
Shilong Wu, Chenxi Wang, Hang Chen, Yusheng Dai, Chenyue Zhang, Ruoyu Wang, Hongbo Lan, Jun Du, Chin-Hui Lee, Jingdong Chen, Shinji Watanabe, Sabato Marco Siniscalchi, Odette Scharenborg, Zhong-Qiu Wang, Jia Pan, Jianqing Gao
Published in: CoRR (2023)
Keyphrases
- audio visual
- speech processing
- visual data
- speech recognition
- multi modal
- speaker identification
- audio features
- visual information
- natural language processing
- multimedia
- machine learning
- high dimensional
- high dimensional data
- contextual information
- principal component analysis
- nearest neighbor
- image retrieval
- information retrieval