Rethinking the visual cues in audio-visual speaker extraction.
Junjie Li, Meng Ge, Zexu Pan, Rui Cao, Longbiao Wang, Jianwu Dang, Shiliang Zhang
Published in: CoRR (2023)
Keyphrases
- audio visual
- visual cues
- visual information
- visual data
- low level
- visual features
- multi modal
- speaker verification
- emotion recognition
- visual content
- information extraction
- multimedia
- multi stream
- audio features
- semantic information
- audio visual speech recognition
- image collections
- key frames
- eye movements
- image classification
- domain knowledge
- speech recognition
- image retrieval