Multi-speaker Direction of Arrival Estimation Using Audio and Visual Modalities with Convolutional Neural Network.
Yulin Wu, Ruimin Hu, Xiaochen Wang
Published in: ICME (2023)
Keyphrases
- DOA estimation
- convolutional neural network
- cross modal
- direction of arrival
- visual data
- audio visual
- visual information
- signal subspace
- multi modal
- single modality
- visual speech
- sound source
- canonical correlation analysis
- face detection
- visual features
- neural network
- speaker identification
- low level
- multimedia
- feature selection
- correlation matrix
- estimation error
- speech recognition
- signal processing
- human visual system
- discriminant analysis
- parameter estimation
- hidden Markov models
- video sequences
- image processing