Audio-visual Speaker Diarization: Improved Voice Activity Detection with CNN based Feature Extraction.
Konstantinos Fanaras, Antonios Tragoudaras, Charalampos Antoniadis, Yehia Massoud
Published in: MWSCAS (2022)
Keyphrases
- audio visual
- speaker diarization
- feature extraction
- speaker verification
- multi modal
- voice activity detection
- visual information
- visual data
- emotion recognition
- speech recognition
- speaker identification
- noisy environments
- multimedia
- feature set
- image processing
- feature vectors
- data sets
- visual features
- feature space
- computer vision
- machine learning
- neural network