Incorporating Visual Information in Audio Based Self-Supervised Speaker Recognition.
Danwei Cai, Weiqing Wang, Ming Li
Published in: IEEE ACM Trans. Audio Speech Lang. Process. (2022)
Keyphrases
- visual information
- speaker recognition
- audio visual
- speaker verification
- visual features
- gaussian mixture model
- visual data
- speaker identification
- vector quantization
- low level
- visual content
- eye movements
- probabilistic neural network
- neural network
- image collections
- audio features
- speech recognition
- feature extraction
- emotion recognition
- pattern classification
- image search
- computer vision
- search engine
- image compression
- face recognition