An objective evaluation of the effects of recording conditions and speaker characteristics in multi-speaker deep neural speech synthesis.
Beáta Lorincz, Adriana Stan, Mircea Giurgiu
Published in: KES (2021)
Keyphrases
- feature extraction
- speech synthesis
- prosodic features
- speech recognition
- objective evaluation
- vocal tract
- text to speech
- pattern recognition
- speaker verification
- ground truth
- hidden Markov models
- subjective evaluation
- audio visual
- automatic speech recognition
- neural network
- image data
- language model
- visual information
- 3D objects