iEmoTTS: Toward Robust Cross-Speaker Emotion Transfer and Control for Speech Synthesis based on Disentanglement between Prosody and Timbre.
Guangyan Zhang, Ying Qin, Wenjie Zhang, Jialun Wu, Mei Li, Yutao Gai, Feijun Jiang, Tan Lee
Published in: CoRR (2022)
Keyphrases
- speech synthesis
- speech recognition
- prosodic features
- vocal tract
- text to speech
- speaker verification
- noisy environments
- automatic speech recognition
- language model
- control system
- speaker identification
- robotic systems
- emotion recognition
- neural network
- acoustic features
- speech signal
- optimal control
- transfer learning
- computationally efficient
- real time
- robust estimation
- audio visual
- hidden Markov models
- pattern recognition
- image processing
- machine learning