Audio-Visual Target Speaker Extraction with Reverse Selective Auditory Attention.
Ruijie Tao, Xinyuan Qian, Yidi Jiang, Junjie Li, Jiadong Wang, Haizhou Li
Published in: CoRR (2024)
Keyphrases
- audio visual
- visual information
- multi modal
- sound source
- visual data
- speaker verification
- temporal context
- emotion recognition
- information extraction
- visual features
- multimedia
- multi stream
- person authentication
- low level
- audio features
- audio visual speech recognition
- visual content
- image database
- feature extraction
- image sequences