Multi-speaker DoA Estimation Using Audio and Visual Modality.
Yulin Wu, Ruimin Hu, Xiaochen Wang, Shanfa Ke
Published in: Neural Process. Lett. (2023)
Keyphrases
- audio visual
- visual information
- doa estimation
- multi modal
- cross modal
- visual data
- single modality
- visual speech
- direction of arrival
- speaker identification
- visual features
- sound source
- speech recognition
- low level
- prosodic features
- automatic transcription
- acoustic features
- eye movements
- high dimensional
- multimedia
- speaker diarization