Incorporating Lip Features into Audio-Visual Multi-Speaker DOA Estimation by Gated Fusion.
Ya Jiang, Hang Chen, Jun Du, Qing Wang, Chin-Hui Lee
Published in: ICASSP (2023)
Keyphrases
- audio visual
- person authentication
- multimodal fusion
- multi modal
- audio features
- visual information
- audio visual speech recognition
- doa estimation
- speaker verification
- multi stream
- emotion recognition
- visual data
- visual speech
- multimedia
- low level
- canonical correlation analysis
- computer vision
- sound source
- visual content
- feature set
- knn
- feature vectors