Face2Speech: Towards Multi-Speaker Text-to-Speech Synthesis Using an Embedding Vector Predicted from a Face Image.
Shunsuke Goto, Kotaro Onishi, Yuki Saito, Kentaro Tachibana, Koichiro Mori
Published in: INTERSPEECH (2020)
Keyphrases
- face images
- text to speech synthesis
- feature vectors
- human faces
- face recognition
- text to speech
- vector space
- facial features
- facial expressions
- face databases
- speech recognition
- face recognition systems
- face verification
- low resolution
- automatic face
- principal component analysis
- face matching
- face recognition algorithms
- feature extraction
- pose variations
- face representation
- training set
- high resolution
- illumination variations
- feature points
- face model
- face space
- age estimation
- human face recognition
- image set
- input image
- face representation and recognition
- recognizing faces
- feature set
- frontal view
- audio visual
- facial images
- feature space
- face detection
- video sequences
- robust face recognition
- data sets
- face pose
- AR face database