iEmoTTS: Toward Robust Cross-Speaker Emotion Transfer and Control for Speech Synthesis Based on Disentanglement Between Prosody and Timbre.
Guangyan Zhang, Ying Qin, Wenjie Zhang, Jialun Wu, Mei Li, Yutao Gai, Feijun Jiang, Tan Lee
Published in: IEEE ACM Trans. Audio Speech Lang. Process. (2023)
Keyphrases
- speech synthesis
- speech recognition
- prosodic features
- vocal tract
- text to speech
- noisy environments
- automatic speech recognition
- image processing
- speech corpus
- computationally efficient
- language model
- hidden Markov models
- control system
- image acquisition
- speaker identification
- speaker verification
- acoustic features
- pattern recognition
- machine learning
- parameter tuning
- robust estimation
- transfer learning
- robust stability