Lip to Speech Synthesis with Visual Context Attentional GAN.
Minsu Kim, Joanna Hong, Yong Man Ro
Published in: NeurIPS (2021)
Keyphrases
- speech synthesis
- visual context
- speech recognition
- temporal context
- text to speech
- visual attention
- visual scene
- object detection
- scene interpretation
- semantic context
- temporal information
- visual information
- visual words
- pattern recognition
- spatio temporal
- machine learning
- scene understanding
- spatial context
- image processing