VisualVoice: Audio-Visual Speech Separation With Cross-Modal Consistency.
Ruohan Gao, Kristen Grauman
Published in: CVPR (2021)
Keyphrases
- cross modal
- visual speech
- multi modal
- hidden markov models
- multimedia retrieval
- visual data
- visual recognition
- image retrieval
- computer vision
- multimedia databases
- metadata
- audio signals
- noisy environments
- feature extraction
- multimedia
- speech recognition
- image classification
- feature space
- multiscale
- sound source
- speaker identification