Rethinking the Visual Cues in Audio-Visual Speaker Extraction.
Junjie Li, Meng Ge, Zexu Pan, Rui Cao, Longbiao Wang, Jianwu Dang, Shiliang Zhang
Published in: INTERSPEECH (2023)
Keyphrases
- audio visual
- visual cues
- visual information
- low level
- speaker verification
- visual features
- multi modal
- visual data
- emotion recognition
- multi stream
- visual content
- eye movements
- image collections
- audio features
- knowledge base
- information extraction
- multimedia
- computer vision
- image representation
- keywords
- similarity measure
- web pages