An objective evaluation of the effects of recording conditions and speaker characteristics in multi-speaker deep neural speech synthesis.
Beáta Lorincz, Adriana Stan, Mircea Giurgiu. Published in: CoRR (2021)
Keyphrases
- speech synthesis
- prosodic features
- speech recognition
- objective evaluation
- vocal tract
- speaker verification
- ground truth
- automatic speech recognition
- text to speech
- speech signal
- audio visual
- subjective evaluation
- language model
- pattern recognition
- neural network
- machine learning
- image structure
- low level
- image segmentation
- spontaneous speech